Add Yezzey submodule to gpcontrib #1752
Yezzey is an open-source extension for Apache Cloudberry and Greenplum 6 that transparently offloads Append-Only (AO/AOCO) table data to S3-compatible object storage. Inspired by Snowflake and AnyBlob, it extends the storage manager (smgr) so that reads and writes go to S3 instead of local disk, while the user interface stays unchanged. A companion YProxy service acts as an I/O scheduler, managing connection pooling and request prioritization to prevent S3 throttling. Data is PGP-encrypted during upload. Benchmarks show only a 10–43% query slowdown versus local storage, far outperforming PXF, which makes it well suited to cost-effective cold-data tiering.
The main feature of Yezzey is that you don't need to change tables or code: just run yezzey_define_offload_policy and the data moves to S3. This frees up local disk space on the cluster.
Currently, it is widely used on Greenplum 6 instances, and the goal is to provide the same interface in Cloudberry so that users can migrate seamlessly.
We placed Yezzey as a submodule because we believe that one day we will replace all outdated solutions like AO/AOCO/Yezzey with PAX. However, that has not happened yet, and we still need Yezzey.
It's part of our roadmap; we discussed it in #868. See the item
We also need to add the
To manage the new submodule, we can introduce it the same way as discussed here: #1084 (review)
fixed |
Yes, I did as described. The issue is that no tag information is stored, and the tag is an ephemeral entity shown only in the
Here yezzey linked with
Lgtm |
Hi, you're right. It looks good now. Here is my cmd list: Never mind. We can add the version number info in the commit history when upgrading it next time. |
fi

#
# yezzey
Hi, does yezzey need any extra dependencies to build? If so, we also need to add a pre-check for them when running configure --with-yezzey.
recurse_targets += gp_stats_collector
endif
ifeq "$(with_yezzey)" "yes"
recurse_targets += yezzey
- recurse_targets += yezzey
+ recurse_targets += yezzey
services:
  # Define the MinIO service container
  minio:
    image: lazybit/minio # Use a specific MinIO image tag
Do we need to specify a tag for this image?
- name: Install MinIO Client (mc)
  run: |
    set -ex pipefail
- set -ex pipefail
+ set -exo pipefail
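For context on why the extra `o` matters: without `-o`, bash treats `pipefail` as a positional parameter rather than an option name, so pipeline failures would still go unnoticed. The sketch below is only an illustration and not part of the PR; the helper name `check_pipefail` is hypothetical, and each command runs in a fresh `bash -c` so the options don't leak into the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the given `set` command in a fresh bash,
# then report whether the pipefail option ended up enabled.
# stderr is discarded to hide the trace output produced by -x.
check_pipefail() {
  bash -c "$1"'; if [[ -o pipefail ]]; then echo on; else echo off; fi' 2>/dev/null
}

# Original line: "pipefail" is not consumed by any option flag.
without_o=$(check_pipefail 'set -ex pipefail')

# Suggested line: -o takes "pipefail" as its argument and enables it.
with_o=$(check_pipefail 'set -exo pipefail')

# Where the word went in the original form: it became $1.
positional=$(bash -c 'set -ex pipefail; printf "%s" "$1"' 2>/dev/null)

echo "set -ex pipefail   -> pipefail is $without_o, \$1 is '$positional'"
echo "set -exo pipefail  -> pipefail is $with_o"
```

With the original form, a step such as `curl ... | tar x` would report success whenever `tar` succeeds, even if `curl` failed, because only the last command's exit status counts when `pipefail` is off.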
name: Build and Test Yezzey Cloudberry
runs-on: ubuntu-latest
container:
  image: apache/incubator-cloudberry:cbdb-build-ubuntu22.04-latest
Could we add the Rocky 8/9 docker image to the test matrix?