Support SQL GROUPING SETS / ROLLUP / CUBE on both query engines#18664
Open
xiangfu0 wants to merge 1 commit into
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #18664 +/- ##
============================================
+ Coverage 56.88% 64.49% +7.60%
- Complexity 7 1291 +1284
============================================
Files 2582 3371 +789
Lines 149911 208775 +58864
Branches 24234 32652 +8418
============================================
+ Hits 85283 134653 +49370
- Misses 57415 63294 +5879
- Partials 7213 10828 +3615
Add GROUPING SETS / ROLLUP / CUBE / GROUPING() / GROUPING_ID() support.
Single-stage engine (SSE) executes natively in a single scan: a new optional PinotQuery.groupingSetsMasks Thrift field carries the per-set participation masks; CalciteSqlParser normalizes the constructs into the union of group-by columns plus masks and rewrites GROUPING()/GROUPING_ID() onto a synthetic $groupingId key column; GroupingSetsGroupKeyGenerator maps one row to one group per set; and the combine/reduce path is migrated to N+1 key columns with the NULL-bitmap path forced for rolled-up columns (so they serialize as NULL regardless of null-handling mode). Multi-valued group-by columns are rejected.
Multi-stage engine (MSE) expands a grouping-set aggregate into a UNION ALL of ordinary per-set aggregates (GROUPING values become per-branch literals), so the multi-stage runtime executes only standard Union / Aggregate / Project plans.
Validated with unit tests, in-process server+broker tests, MSE planner tests, and a cluster integration test passing grouping-set queries on both engines.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
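The single-stage flow described in this commit message (grouping constructs normalized to participation masks, then one group key per set for each row) can be sketched in plain Java. This is a minimal illustration, not the PR's code: the class `GroupingSetsMaskSketch`, its method names, and the bit layout (leftmost column is the most significant bit, a set bit means the column participates) are all assumptions made for the example. The grouping id is taken as the complement of the mask, following the usual SQL convention that GROUPING(col) is 1 when the column is rolled up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the PR's GroupingSetsGroupKeyGenerator. */
final class GroupingSetsMaskSketch {

  /** ROLLUP over n columns: every prefix of the column list, longest first. */
  static List<Integer> rollupMasks(int n) {
    List<Integer> masks = new ArrayList<>();
    int all = (1 << n) - 1;
    for (int k = n; k >= 0; k--) {
      // Keep the k leftmost columns by clearing the (n - k) lowest bits.
      masks.add(all & ~((1 << (n - k)) - 1));
    }
    return masks;
  }

  /** CUBE over n columns: every subset of the column list. */
  static List<Integer> cubeMasks(int n) {
    List<Integer> masks = new ArrayList<>();
    for (int mask = (1 << n) - 1; mask >= 0; mask--) {
      masks.add(mask);
    }
    return masks;
  }

  /** Assumed convention: a grouping-id bit is 1 where the column is rolled up. */
  static int groupingId(int mask, int n) {
    return ~mask & ((1 << n) - 1);
  }

  /** Maps one row to one key per set: n column values plus a trailing grouping id. */
  static List<Object[]> expandRow(Object[] row, List<Integer> masks) {
    int n = row.length;
    List<Object[]> keys = new ArrayList<>();
    for (int mask : masks) {
      Object[] key = new Object[n + 1];
      for (int i = 0; i < n; i++) {
        boolean participates = (mask & (1 << (n - 1 - i))) != 0;
        key[i] = participates ? row[i] : null; // rolled-up columns become NULL
      }
      key[n] = groupingId(mask, n);
      keys.add(key);
    }
    return keys;
  }

  public static void main(String[] args) {
    List<Integer> masks = rollupMasks(2); // ROLLUP(country, city)
    for (Object[] key : expandRow(new Object[] {"US", "NYC"}, masks)) {
      System.out.println(Arrays.toString(key));
    }
    // A row whose city is a real NULL: the first two keys differ only in the grouping id.
    for (Object[] key : expandRow(new Object[] {"US", null}, masks)) {
      System.out.println(Arrays.toString(key));
    }
  }
}
```

The second loop in `main` shows why the extra key column matters: for a row with a genuine NULL city, the detail key and the per-country subtotal key have identical column values and are told apart only by the trailing grouping id.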
61c2096 to 94aab3e
Summary
Adds SQL GROUPING SETS / ROLLUP / CUBE and the GROUPING() / GROUPING_ID() indicator functions, working end-to-end on both query engines.
Single-stage engine (SSE): native, single scan
A new optional Thrift field, PinotQuery.groupingSetsMasks, carries the per-set participation masks over the union of group-by columns.
CalciteSqlParser normalizes ROLLUP / CUBE / GROUPING SETS (including mixed forms like a, ROLLUP(b, c)) into the union columns + masks, and rewrites GROUPING(col) / GROUPING_ID(...) onto a synthetic internal $groupingId key column.
GroupingSetsGroupKeyGenerator maps each row to one group per set (via the existing multi-valued group-by path), with $groupingId as an extra key column that keeps a rolled-up NULL from colliding with a real-data NULL.
The combine/reduce path is migrated to N+1 key columns, with the NULL-bitmap path forced for rolled-up columns so they serialize as NULL regardless of null-handling mode. Star-tree is disabled for these queries.
Multi-stage engine (MSE): UNION ALL expansion
A grouping-set LogicalAggregate is expanded into a UNION ALL of ordinary per-set aggregates (GroupingSetsExpander), with rolled-up columns projected as NULL and GROUPING() / GROUPING_ID() computed as per-branch constant literals. The multi-stage runtime therefore executes only standard Union / Aggregate / Project plans and needs no runtime changes.
GROUPING / GROUPING_ID are registered in PinotOperatorTable; ROW-expression validation is relaxed for the parenthesized grouping lists.
Example
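As a rough illustration of the UNION ALL expansion, the sketch below renders each grouping set as one ordinary aggregate branch, with rolled-up columns projected as NULL and the grouping id emitted as a per-branch literal. It works on SQL text purely for readability; the PR's GroupingSetsExpander operates on the planner's LogicalAggregate, not on strings. The class `UnionAllExpansionSketch`, the table name `sales`, the `amount` column, the `grouping_id` alias, and the mask layout (leftmost column is the most significant bit, a set bit means the column participates) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; renders the per-set branches as SQL text. */
final class UnionAllExpansionSketch {

  static String expand(String table, List<String> cols, String agg, List<Integer> masks) {
    int n = cols.size();
    List<String> branches = new ArrayList<>();
    for (int mask : masks) {
      List<String> select = new ArrayList<>();
      List<String> groupBy = new ArrayList<>();
      int groupingId = 0;
      for (int i = 0; i < n; i++) {
        int bit = 1 << (n - 1 - i);
        if ((mask & bit) != 0) {
          // Column participates in this set: group by it and project it as-is.
          select.add(cols.get(i));
          groupBy.add(cols.get(i));
        } else {
          // Column is rolled up: project NULL and set its grouping-id bit.
          select.add("NULL AS " + cols.get(i));
          groupingId |= bit;
        }
      }
      select.add(agg);
      select.add(groupingId + " AS grouping_id"); // constant per branch
      String sql = "SELECT " + String.join(", ", select) + " FROM " + table;
      if (!groupBy.isEmpty()) {
        sql += " GROUP BY " + String.join(", ", groupBy);
      }
      branches.add(sql);
    }
    return String.join("\nUNION ALL\n", branches);
  }

  public static void main(String[] args) {
    // Masks 0b11, 0b10, 0b00 correspond to ROLLUP(country, city).
    System.out.println(
        expand("sales", List.of("country", "city"), "SUM(amount) AS total", List.of(3, 2, 0)));
  }
}
```

For the three ROLLUP masks this yields a detail branch grouped by both columns, a per-country branch with `NULL AS city`, and an ungrouped grand-total branch, each an ordinary aggregate that a runtime without grouping-set support could execute.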
A query that rolls up country and city returns the detail rows, per-country subtotals (city = NULL, GROUPING(city) = 1), and the grand total (country / city = NULL, GROUPING = 1).
Testing
Unit tests, including the GROUPING rewrite and the Thrift wire round-trip.
In-process server+broker tests (GroupingSetsQueriesTest), covering ROLLUP / CUBE / GROUPING SETS, GROUPING / GROUPING_ID, HAVING, both ORDER BY paths, the empty-server schema, and the multi-valued-column rejection.
MSE planner tests (GroupingSetsPlannerTest) + 191 existing planner tests (no regression).
Cluster integration test (GroupingSetsTest) running every query on both engines.
Limitations / follow-ups
The groupingSetsMasks Thrift field is optional and wire-compatible, but grouping-set queries require all servers to be upgraded: an un-upgraded server would ignore the field and run a plain GROUP BY. This is a new query capability (no existing-query regression) and should be noted in release notes; a broker-side min-version gate is a sensible follow-up.
🤖 Generated with Claude Code