Skip to content

Support AWS compressed storage uploads#3113

Draft
dobrac wants to merge 3 commits into
mainfrom
dobrac/aws-gcs-compression
Draft

Support AWS compressed storage uploads#3113
dobrac wants to merge 3 commits into
mainfrom
dobrac/aws-gcs-compression

Conversation

@dobrac

@dobrac dobrac commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Adds AWS support for compressed storage uploads using the shared compression pipeline and S3 multipart upload APIs. Renames the upload limiter API to storage-neutral names while preserving existing LaunchDarkly flag keys. Threads the existing limiter into AWS storage so AWS and GCP compressed uploads share the same concurrency controls. Adds shared part uploader contract tests for GCP and AWS.

@cla-bot cla-bot Bot added the cla-signed label Jun 27, 2026
@cursor

cursor Bot commented Jun 27, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes snapshot/build upload paths and S3 multipart behavior (compression, concurrency, abort-on-close); misconfiguration or multipart bugs could affect large object durability or upload throughput.

Overview
Adds compressed snapshot/template uploads on AWS S3 by wiring StoreFile through the same compressStream pipeline as GCP, backed by a new awsPartUploader (multipart initiate/part/complete/abort).

Upload throttling is renamed to storage-neutral APIs and flags (StorageUploadLimiter, StorageMaxUploadTasks, etc.) while keeping existing LaunchDarkly keys; GCP call sites are updated accordingly. The limiter is passed into AWS storage so compressed uploads share the semaphore and max-task settings with GCP. Plain S3 multipart uploads now take concurrency from the limiter instead of a fixed value.

Shared contract tests exercise the partUploader behavior for both GCP and AWS (including abort-on-close for AWS).

Reviewed by Cursor Bugbot for commit 9f9b81a. Bugbot is set up for automated code reviews on this repo. Configure here.

@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

bucketName: o.bucketName,
objectName: o.path,
metadata: putOpts.Metadata,
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing uncompressed size metadata

Medium Severity

AWS compressed uploads pass only caller metadata into CreateMultipartUpload, unlike GCP which also sets uncompressed-size from the source file. awsObject.Size() uses S3 ContentLength, so compressed objects report compressed bytes instead of logical uncompressed size.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a1a8100. Configure here.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The awsPartUploader is never closed in storeFileCompressed, which can cause unfinished multipart uploads to leak on AWS S3 if the compression or upload stream fails. Additionally, access to m.completed is not synchronized, leading to potential data races if Close is called concurrently with or shortly after Complete.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +269 to +274
uploader := &awsPartUploader{
client: o.client,
bucketName: o.bucketName,
objectName: o.path,
metadata: putOpts.Metadata,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The awsPartUploader is never closed in storeFileCompressed, which causes unfinished multipart uploads to leak on AWS S3 if the compression or upload stream fails. Deferring uploader.Close() ensures that any incomplete multipart upload is properly aborted and its associated storage resources are cleaned up.

Suggested change
uploader := &awsPartUploader{
client: o.client,
bucketName: o.bucketName,
objectName: o.path,
metadata: putOpts.Metadata,
}
uploader := &awsPartUploader{
client: o.client,
bucketName: o.bucketName,
objectName: o.path,
metadata: putOpts.Metadata,
}
defer uploader.Close()
References
  1. Focus on resource leaks as specified in the repository style guide. (link)

Comment on lines +468 to +471

m.completed = true

return nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Access to m.completed is not synchronized, which can lead to a data race if Close is called concurrently with or shortly after Complete (for example, during context cancellation or early termination of the upload pipeline). Protecting m.completed with m.mu in both Complete and Close prevents this race condition.

	m.mu.Lock()
	m.completed = true
	m.mu.Unlock()

	return nil
References
  1. Focus on race conditions as specified in the repository style guide. (link)

Comment on lines +474 to +477
func (m *awsPartUploader) Close() error {
if m.completed || m.uploadID == "" {
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Access to m.completed is not synchronized, which can lead to a data race if Close is called concurrently with or shortly after Complete (for example, during context cancellation or early termination of the upload pipeline). Protecting m.completed with m.mu in both Complete and Close prevents this race condition.

Suggested change
func (m *awsPartUploader) Close() error {
if m.completed || m.uploadID == "" {
return nil
}
func (m *awsPartUploader) Close() error {
m.mu.Lock()
completed := m.completed
m.mu.Unlock()
if completed || m.uploadID == "" {
return nil
}
References
  1. Focus on race conditions as specified in the repository style guide. (link)

"github.com/e2b-dev/infra/packages/shared/pkg/utils"
)

func (l *Limiter) StorageUploadLimiter() *utils.AdjustableSemaphore {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just moved

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0851835. Configure here.

u.PartSize = 10 * 1024 * 1024 // 10 MB
u.Concurrency = 8 // eight parts in flight
u.PartSize = awsMultipartUploadPartSize
u.Concurrency = o.maxUploadConcurrency(ctx)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-compressed AWS skips upload semaphore

Medium Severity

Uncompressed AWS StoreFile uses StorageMaxUploadTasks for SDK concurrency but never acquires StorageUploadLimiter, while GCP StoreFile holds that semaphore for every upload when a limiter is configured.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0851835. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant