Skip to content

Support SQL GROUPING SETS / ROLLUP / CUBE on both query engines#18664

Open
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:claude/modest-wright-82ac58
Open

Support SQL GROUPING SETS / ROLLUP / CUBE on both query engines#18664
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:claude/modest-wright-82ac58

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Jun 3, 2026

Summary

Adds SQL GROUPING SETS / ROLLUP / CUBE and the GROUPING() / GROUPING_ID() indicator functions, working end-to-end on both query engines.

Single-stage engine (SSE) — native, single scan

  • New optional Thrift field PinotQuery.groupingSetsMasks carries the per-set participation masks over the union of group-by columns.
  • CalciteSqlParser normalizes ROLLUP / CUBE / GROUPING SETS (including mixed forms like a, ROLLUP(b, c)) into the union columns + masks, and rewrites GROUPING(col) / GROUPING_ID(...) onto a synthetic internal $groupingId key column.
  • GroupingSetsGroupKeyGenerator maps each row to one group per set (via the existing multi-valued group-by path), with $groupingId as an extra key column that keeps a rolled-up NULL from colliding with a real-data NULL.
  • The combine/reduce path is migrated from N to N+1 key columns, and the NULL-bitmap serialization path is forced on for rolled-up columns so they round-trip as NULL regardless of null-handling mode. Star-tree is disabled for these queries.

Multi-stage engine (MSE) — UNION ALL expansion

  • A grouping-set LogicalAggregate is expanded into a UNION ALL of ordinary per-set aggregates (GroupingSetsExpander), with rolled-up columns projected as NULL and GROUPING() / GROUPING_ID() computed as per-branch constant literals. The multi-stage runtime therefore executes only standard Union / Aggregate / Project plans and needs no runtime changes.
  • GROUPING / GROUPING_ID registered in PinotOperatorTable; ROW-expression validation relaxed for the parenthesized grouping lists.

Example

SELECT country, city, SUM(sales), GROUPING(country), GROUPING(city)
FROM   sales
GROUP  BY ROLLUP(country, city)

returns the detail rows, per-country subtotals (city = NULL, GROUPING(city) = 1), and the grand total (country/city = NULL, GROUPING = 1).

Testing

  • Unit tests: bit conventions, scalar functions, parser normalization + GROUPING rewrite, Thrift wire round-trip.
  • In-process server-execution + broker-reduce tests (GroupingSetsQueriesTest), covering ROLLUP/CUBE/GROUPING SETS, GROUPING/GROUPING_ID, HAVING, both ORDER BY paths, the empty-server schema, and the multi-valued-column rejection.
  • MSE planner tests (GroupingSetsPlannerTest) + 191 existing planner tests (no regression).
  • Cluster integration test (GroupingSetsTest) running every query on both engines.

Limitations / follow-ups

  • Multi-valued group-by columns inside grouping sets are rejected with a clear error (single-valued only for now).
  • The new groupingSetsMasks Thrift field is optional and wire-compatible, but grouping-set queries require all servers to be upgraded: an un-upgraded server would ignore the field and run a plain GROUP BY. This is a new query capability (no existing-query regression) and should be noted in release notes; a broker-side min-version gate is a sensible follow-up.

🤖 Generated with Claude Code

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 88.33333% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.49%. Comparing base (5dee072) to head (94aab3e).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...egation/groupby/GroupingSetsGroupKeyGenerator.java 72.80% 21 Missing and 10 partials ⚠️
...org/apache/pinot/sql/parsers/CalciteSqlParser.java 89.92% 4 Missing and 9 partials ⚠️
...apache/pinot/calcite/rel/GroupingSetsExpander.java 95.08% 1 Missing and 2 partials ⚠️
...operator/combine/SortedGroupByCombineOperator.java 50.00% 0 Missing and 1 partial ⚠️
...pinot/core/query/request/context/QueryContext.java 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18664      +/-   ##
============================================
+ Coverage     56.88%   64.49%   +7.60%     
- Complexity        7     1291    +1284     
============================================
  Files          2582     3371     +789     
  Lines        149911   208775   +58864     
  Branches      24234    32652    +8418     
============================================
+ Hits          85283   134653   +49370     
- Misses        57415    63294    +5879     
- Partials       7213    10828    +3615     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.49% <88.33%> (+7.60%) ⬆️
temurin 64.49% <88.33%> (+7.60%) ⬆️
unittests 64.49% <88.33%> (+7.60%) ⬆️
unittests1 56.95% <88.33%> (+0.06%) ⬆️
unittests2 37.07% <12.85%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add GROUPING SETS / ROLLUP / CUBE / GROUPING() / GROUPING_ID() support.

Single-stage engine (SSE) executes natively in a single scan: a new optional
PinotQuery.groupingSetsMasks Thrift field carries the per-set participation
masks; CalciteSqlParser normalizes the constructs into the union of group-by
columns plus masks and rewrites GROUPING()/GROUPING_ID() onto a synthetic
$groupingId key column; GroupingSetsGroupKeyGenerator maps one row to one group
per set; and the combine/reduce path is migrated to N+1 key columns with the
NULL-bitmap path forced for rolled-up columns (so they serialize as NULL
regardless of null-handling mode). Multi-valued group-by columns are rejected.

Multi-stage engine (MSE) expands a grouping-set aggregate into a UNION ALL of
ordinary per-set aggregates (GROUPING values become per-branch literals), so the
multi-stage runtime executes only standard Union / Aggregate / Project plans.

Validated with unit tests, in-process server+broker tests, MSE planner tests,
and a cluster integration test passing grouping-set queries on both engines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 force-pushed the claude/modest-wright-82ac58 branch from 61c2096 to 94aab3e Compare June 3, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants