Fix aggregation flow with remote initiator #1872

Open
ianton-ru wants to merge 2 commits into antalya-26.3 from bugfix/antalya-26.3/fix_aggregation_with_remote_initiator
@ianton-ru

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix aggregation flow with remote initiator

Documentation entry for user-facing changes

With object_storage_remote_initiator set but without the object_storage_cluster setting, StorageObjectStorageCluster::getQueryProcessingStage returned QueryProcessingStage::Enum::FetchColumns. As a result, the nodes sent all rows to the initiator, and aggregation was executed on the initiator.
Now the method returns QueryProcessingStage::Enum::WithMergeableState in the proper cases, and pre-aggregation is executed on the nodes.
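
The change in behavior can be illustrated with a minimal, self-contained sketch. It is not the actual ClickHouse code: `Stage`, `Settings`, `stageBeforeFix`, and `stageAfterFix` are hypothetical stand-ins, and the real getQueryProcessingStage may take other inputs into account. The "after" condition mirrors the check quoted in the review of this PR, assuming the guarded branch is the one that returns FetchColumns.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for QueryProcessingStage::Enum (only the two stages discussed here).
enum class Stage { FetchColumns, WithMergeableState };

// Hypothetical stand-in for the relevant settings; field names follow the setting names.
struct Settings
{
    std::string object_storage_cluster;                   // empty: not a cluster request
    bool object_storage_remote_initiator = false;
    std::string object_storage_remote_initiator_cluster;  // empty: no remote initiator cluster
};

// "Before" as described: without a cluster name the stage is always FetchColumns,
// so nodes return raw rows and the initiator aggregates everything.
Stage stageBeforeFix(const Settings & s)
{
    if (s.object_storage_cluster.empty())
        return Stage::FetchColumns;
    return Stage::WithMergeableState;
}

// "After" as described: a remote initiator with a non-empty cluster also counts
// as distributed execution, so nodes can pre-aggregate.
Stage stageAfterFix(const Settings & s)
{
    const bool via_remote_initiator = s.object_storage_remote_initiator
        && !s.object_storage_remote_initiator_cluster.empty();
    if (s.object_storage_cluster.empty() && !via_remote_initiator)
        return Stage::FetchColumns;
    return Stage::WithMergeableState;
}

// Small usage example for the two functions above.
void checkStageExamples()
{
    Settings s;
    s.object_storage_remote_initiator = true;
    s.object_storage_remote_initiator_cluster = "swarm";  // illustrative cluster name
    assert(stageBeforeFix(s) == Stage::FetchColumns);
    assert(stageAfterFix(s) == Stage::WithMergeableState);
}
```

In this model, a request with neither a cluster name nor a remote initiator still gets FetchColumns both before and after the change.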

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@ianton-ru
Author

@codex review

@github-actions

github-actions Bot commented Jun 4, 2026

Workflow [PR], commit [e72ebfe]


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e72ebfe177

Comment on lines +725 to +727
if (getClusterName(context).empty() // Not cluster request
    && !(context->getSettingsRef()[Setting::object_storage_remote_initiator] // Not request with remote initiator
        && !context->getSettingsRef()[Setting::object_storage_remote_initiator_cluster].value.empty()))
[P2] Keep unsupported storages at FetchColumns

When object_storage_remote_initiator is set for a configuration that does not support cluster mode, such as icebergLocal or other local data-lake storages where getClusterName is empty because isClusterSupported is false, this new exception makes getQueryProcessingStage report WithMergeableState. IStorageCluster::read still immediately falls back to pure_storage for those configurations, and StorageObjectStorage::read ignores processed_stage, so aggregate queries can be planned as if partial aggregation happened remotely even though only raw rows were read. Please only return the distributed stage when the subsequent read path will actually use the remote/cluster execution path.
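
One way to read this suggestion is as a single predicate that both the stage decision and the read path would consult. The sketch below is a simplified model under that assumption, not the actual ClickHouse implementation: `Request`, `usesDistributedRead`, and `chooseStage` are hypothetical names, and `cluster_supported` stands in for isClusterSupported.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for QueryProcessingStage::Enum (only the two stages discussed here).
enum class Stage { FetchColumns, WithMergeableState };

// Hypothetical summary of what the stage decision depends on.
struct Request
{
    bool cluster_supported = false;        // stand-in for isClusterSupported
    std::string cluster_name;              // stand-in for getClusterName(context)
    bool remote_initiator = false;         // object_storage_remote_initiator
    std::string remote_initiator_cluster;  // object_storage_remote_initiator_cluster
};

// True only when the read is assumed to go through the remote/cluster path.
// Configurations without cluster support are modeled as always reading locally.
bool usesDistributedRead(const Request & r)
{
    if (!r.cluster_supported)
        return false;
    const bool via_remote_initiator = r.remote_initiator && !r.remote_initiator_cluster.empty();
    return !r.cluster_name.empty() || via_remote_initiator;
}

// The reported stage follows the same predicate, so the planner never assumes
// partial aggregation happened remotely when only raw rows were read.
Stage chooseStage(const Request & r)
{
    return usesDistributedRead(r) ? Stage::WithMergeableState : Stage::FetchColumns;
}

// Small usage example: a local data-lake style request with the remote initiator setting on.
void checkUnsupportedStorageExample()
{
    Request local;
    local.cluster_supported = false;
    local.remote_initiator = true;
    local.remote_initiator_cluster = "swarm";  // illustrative cluster name
    assert(chooseStage(local) == Stage::FetchColumns);
}
```

Deriving the stage from the same condition that selects the read path keeps the two from drifting apart, which is the mismatch the comment describes.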

