OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4) by tarun11Mavani · Pull Request #18643 · apache/pinot

tarun11Mavani · 2026-06-01T05:24:23Z

Summary

This is the second PR in a 4-part stack introducing DataType.OPEN_STRUCT. It adds the segment storage layer — the runtime that makes OPEN_STRUCT columns ingestible, sealable, and queryable. Without it, declaring dataType: OPEN_STRUCT in a schema does nothing.

RFC: https://docs.google.com/document/d/14kPmjDTKbO8l0ql4rrN7I5Yki5pqMw6GeGmxxc9grsU/edit?tab=t.0

PR 1 (SPI + data model): #18368

Architecture

OPEN_STRUCT columns use a two-tier columnar storage model:

Dense tier — keys with fill rate ≥ denseKeyMinFillRate (default 0.5) or explicitly listed in denseKeys are materialized as standard Pinot columns (<column>$<key>), each with its own forward index, dictionary, null vector, and optional per-key indexes (inverted, range, bloom).
Sparse tier — remaining keys are packed into a single JSON column (<column>$__sparse__).

Dense keys reuse the entire standard column infrastructure — no bespoke binary format. New index types (range, text, JSON) work on dense keys with zero custom code, and segment-level column pruning applies automatically.

Mutable (consuming) path

MutableOpenStructIndex — per-key in-memory state during consumption. Implements both MutableIndex and OpenStructIndexReader (dual-role: write-side index + read-side reader for queries against mutable segments). All observed keys are retained during consumption; dense/sparse classification is deferred to seal time where fill rates are known.
MutableKeyColumn — per-key forward index + presence bitmap + type tracking. Three-level type resolution: declared child FieldSpec → value-based inference via OpenStructTypeInference.
MutableOpenStructDataSource — exposes per-key DataSources to the query layer via the OpenStructDataSource SPI. Supports getMapValue(docId) for row-major map reconstruction during seal.

Seal / offline path

OpenStructColumnSplitter (~550 LOC) — classifies keys into dense vs sparse by fill rate (plus explicit denseKeys and maxDenseKeys), writes each dense key as a standard materialized column via the standard index creator pipeline (ForwardIndexCreator, DictionaryBasedInvertedIndexCreator, etc.), and packs the rest into a single sparse JSON column. Emits per-child and parent column metadata. Dense-key indexes are driven by a generic creator loop over IndexService.getAllIndexes() filtered to a vetted allowlist (inverted, range, bloom), with each index type's validate() run at build time against the resolved child FieldSpec.
FieldIndexConfigsUtil.fromFieldConfig — new additive method that builds FieldIndexConfigs from a single FieldConfig without requiring a TableConfig/Schema, enabling per-key index resolution for synthetic materialized children.
BaseSegmentCreator — merges the splitter's per-child metadata into the segment properties so each child loads as its own column. Registers splitter child columns in the DIMENSIONS property so the V3 converter discovers their index files.
Stats collector reuse — dense-key statistics (cardinality, min/max, distinct values, element lengths) are computed via the standard AbstractColumnStatisticsCollector family rather than hand-rolled tracking. Added a FieldSpec-based constructor to the base collector + 7 scalar collectors so they can be used without a StatsCollectorConfig/TableConfig.

Immutable (sealed) path

ImmutableSegmentImpl — single-pass post-load grouping. Scans column metadata for parentColumn, groups child columns by parent, and builds one ImmutableOpenStructDataSource per parent OPEN_STRUCT field. Child columns are removed from _dataSourceMap and only accessible via the parent's getDataSource(key).
ImmutableSegmentLoader — exempts materialized child columns from schema-vs-segment reconciliation (they exist in the segment but not in the user-facing schema).

Ingestion pipeline wiring

PinotDataType.getPinotDataTypeForIngestion — OPEN_STRUCT case returns MAP so the Map value passes through the transform pipeline unchanged.
StatsCollectorUtil.createStatsCollector — OPEN_STRUCT case uses MapColumnPreIndexStatsCollector; excluded from NoDictCollector optimization.
ForwardIndexType.shouldCreateIndex — returns false for OPEN_STRUCT parent columns (no forward index on the parent; data lives in materialized children).
PinotSegmentRecordReader — detects OPEN_STRUCT parents (no forward index), tracks their OpenStructDataSource, and reconstructs per-doc Map<String, Object> via getMapValue() for the row-major seal path.
SegmentColumnarIndexCreator — column-major build path reads from OpenStructDataSource.getMapValue() instead of attempting a PinotSegmentColumnReader on the parent.
IndexLoadingConfig.addOpenStructChildConfigs — resolves per-key FieldConfig from the parent's OpenStructIndexConfig and injects it into the config map before SegmentPreProcessor handlers run, preventing InvertedIndexHandler from stripping per-key inverted indexes.

Wiring / validation

OpenStructIndexType + OpenStructIndexPlugin — register the index via @AutoService(IndexPlugin.class). Reader factory is a no-op (children load as standard columns). Index handler returns needUpdateIndices() = false to prevent data corruption from handler-path index updates on OPEN_STRUCT parent columns.
OpenStructSupportedIndexes — vetted allowlist of per-key index types (inverted, range, bloom) with table-config-time validation.
TableConfigUtils — rejects user columns containing the reserved $ separator when any field is OPEN_STRUCT.
RealtimeSegmentStatsContainer — returns EmptyColumnStatistics for the OPEN_STRUCT parent.

`pinot-spi` changes

OpenStructTypeInference — value→DataType inference extracted from OpenStructNaming into its own single-responsibility class.
OpenStructNaming — added isMaterializedChildName and parentColumnName parser helpers for name-based child column detection.
V1Constants — HAS_SPARSE_COLUMN metadata key for parent column metadata.

Backward compatibility

Existing DataType.MAP segments and schemas are untouched. Zero behavior change for MAP.
OPEN_STRUCT is opt-in at schema authoring time. Clusters with no OPEN_STRUCT schemas see no behavior change.
The $ column name restriction applies only to schemas containing at least one OPEN_STRUCT field.

Follow-up PRs

PR 3 — Query engine (pinot-core): OpenStructFilterOperator, ItemTransformFunction typing path, AggregationPlanNode, GroupByPlanNode.
PR 4 — Integration tests and end-to-end validation.

Test plan

codecov-commenter · 2026-06-01T07:44:07Z

Codecov Report

❌ Patch coverage is 69.13265% with 242 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.49%. Comparing base (30b3b31) to head (6ccb988).
⚠️ Report is 28 commits behind head on master.

Files with missing lines	Patch %	Lines
...gment/index/openstruct/MutableOpenStructIndex.java	61.70%	28 Missing and 8 partials ⚠️
...ator/impl/openstruct/OpenStructColumnSplitter.java	89.29%	14 Missing and 15 partials ⚠️
...local/segment/index/loader/IndexLoadingConfig.java	17.24%	22 Missing and 2 partials ⚠️
...local/indexsegment/mutable/MutableSegmentImpl.java	13.04%	15 Missing and 5 partials ⚠️
...ment/creator/impl/SegmentColumnarIndexCreator.java	4.76%	19 Missing and 1 partial ⚠️
...l/indexsegment/immutable/ImmutableSegmentImpl.java	25.00%	16 Missing and 2 partials ⚠️
.../index/openstruct/MutableOpenStructDataSource.java	56.09%	16 Missing and 2 partials ⚠️
...cal/segment/index/openstruct/MutableKeyColumn.java	60.00%	16 Missing ⚠️
...ocal/segment/readers/PinotSegmentRecordReader.java	45.00%	8 Missing and 3 partials ⚠️
.../segment/index/openstruct/OpenStructIndexType.java	76.74%	6 Missing and 4 partials ⚠️
... and 13 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18643      +/-   ##
============================================
+ Coverage     64.41%   64.49%   +0.08%     
- Complexity     1282     1291       +9     
============================================
  Files          3362     3381      +19     
  Lines        207907   209304    +1397     
  Branches      32463    32714     +251     
============================================
+ Hits         133923   135000    +1077     
- Misses        63221    63450     +229     
- Partials      10763    10854      +91

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (?)`
java-21	`64.49% <69.13%> (+0.08%)`	⬆️
temurin	`64.49% <69.13%> (+0.08%)`	⬆️
unittests	`64.49% <69.13%> (+0.08%)`	⬆️
unittests1	`56.66% <14.28%> (-0.14%)`	⬇️
unittests2	`37.23% <64.03%> (+0.07%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…x (PR 2/4) Storage layer for the OPEN_STRUCT column type: a self-describing column whose keys are discovered at ingest time and stored columnar in two tiers. Mutable (consuming) path: - MutableOpenStructIndex / MutableKeyColumn: per-key dictionary-encoded forward index + presence bitmap, with 3-level type resolution (declared child FieldSpec, else value-based inference) and maxDenseKeys capping. - MutableOpenStructDataSource exposes per-key DataSources to the query layer via the OpenStructDataSource SPI; implemented as a MutableIndex. Seal / offline path: - OpenStructColumnSplitter classifies keys into dense vs sparse by fill rate (plus explicit denseKeys and maxDenseKeys), writes each dense key as a standard materialized column (col$key) with optional dictionary/inverted/ null-vector, and packs the rest into a single sparse JSON column (col$__sparse__). Emits per-child and parent column metadata. - BaseSegmentCreator merges the splitter's per-child metadata into the segment properties so each child loads as its own column. Immutable (sealed) path: - ImmutableSegmentImpl groups materialized children (parentColumn metadata) under an ImmutableOpenStructDataSource in a single pass over column metadata. - ImmutableSegmentLoader keeps materialized children that are absent from the user-facing schema. Wiring / validation: - OpenStructIndexType + OpenStructIndexPlugin register the index; reader factory is a no-op (children load as standard columns). - TableConfigUtils rejects user columns containing the reserved '$' separator when any field is OPEN_STRUCT. - RealtimeSegmentStatsContainer returns EmptyColumnStatistics for the parent. - V1Constants: PARENT_COLUMN, HAS_SPARSE_COLUMN; OpenStructNaming helpers including shared value->DataType inference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A BIG_DECIMAL-valued OPEN_STRUCT key aborted segment seal: BIG_DECIMAL is its own stored type (unlike BOOLEAN->INT, TIMESTAMP->LONG), so the splitter's type switches hit `default: throw`. It is reachable via inferDataType for any java.math.BigDecimal or an explicit child FieldSpec. Add BIG_DECIMAL as a variable-length stored type alongside STRING/BYTES in OpenStructColumnSplitter: getDefaultValue (BigDecimal.ZERO), dictionary build, raw var-byte forward index (putBigDecimal), and dictionary-vs-raw sizing. Dictionary dedup uses BigDecimal.equals (matching SegmentDictionaryCreator's equals-keyed indexOfSV), not compareTo, so scale-differing equal values (1.0 vs 1.00) stay distinct instead of silently resolving to dict id 0. The realtime mutable path and segment min/max metadata already handle BIG_DECIMAL and are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tisticsCollector Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… stats collector Treats absent docs as null docs holding the default value (standard model). For keys with absent docs, CARDINALITY now counts the default and MIN/MAX include it. Intended. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…MetadataInfo Replaces the hand-written emitVirtualColumnMetadata with the standard metadata writer plus the OPEN_STRUCT-specific keys (PARENT_COLUMN, hasNullValue, hasInvertedIndex). The shared writer additionally emits standard keys the old code omitted, producing a superset of the previous metadata. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…istinctValuesPerKey Replaces the _distinctValuesPerKey distinct-string-set tracking with the sealed stats collector's cardinality and longest-element length. _totalRawBytesPerKey is retained for the var-length raw-size estimate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dictElementSize is already 0 on the raw path (only assigned when a dictionary creator exists), so the conditional alias was a no-op. Pass it directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…reators Drives each dense materialized child's forward, dictionary-id forward, and inverted index through the standard ForwardIndexCreator / DictionaryBasedInvertedIndexCreator obtained from StandardIndexes, using an IndexCreationContext built from the sealed stats collector (no TableConfig). The dict-vs-raw decision now mirrors BaseSegmentCreator.createDictionaryForColumn (standard default flags), and absent docs store the Pinot dimension null default. Deletes writeRawForwardIndex, shouldUseDictionary, getDefaultValue, and _totalRawBytesPerKey. Behavior changes (intended, unreleased feature): absent-doc defaults are now dimension nulls (STRING "null", INT MIN_VALUE, ...); the local size-ratio dict downgrade is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…riteDenseKeyColumn Split the ~144-line writeDenseKeyColumn into a focused orchestrator plus two private helpers, mirroring how BaseSegmentCreator decomposes column creation: - resolveUseDictionary(...): the three-step dict-vs-raw decision. - writeForwardAndInvertedIndexes(...): dictionary + forward + inverted creation, the per-doc add loop, and the nested resource handling; returns the dictionary element size for metadata. Behavior-preserving — no logic, ordering, or resource-management change. Covered by the existing OpenStructColumnSplitterTest (16 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…o OpenStructTypeInference inferDataType performs value->DataType inference, an orthogonal concern to OpenStructNaming's column-name string mapping (it uses none of the naming constants). Relocate it to a dedicated OpenStructTypeInference class in the same package so each class has a single responsibility; update the two callers. Behavior-preserving — the method body is moved verbatim. The core Object classification still delegates to PinotDataType.getSingleValueType; the remaining PinotDataType->DataType switch is OPEN_STRUCT-specific policy that exists nowhere else. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…taType Data-driven test locking the value->DataType mapping policy now that it lives in its own class: each return branch (INT via all four widening inputs, LONG, FLOAT, DOUBLE, BIG_DECIMAL, BOOLEAN, TIMESTAMP, STRING via all four folding inputs, BYTES) plus the unrepresentable cases (Map, bare Object) returning null. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ldConfig, no TableConfig) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…at table-config time Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… (inverted/range/bloom) Drive per-key index creation from IndexService#getAllIndexes (filtered to the OPEN_STRUCT allowlist), building the dictionary separately and reconciling forward/dictionary encoding with the dictionary decision. Each vetted index type's validate() runs against the resolved child FieldSpec at build time so misconfigurations (e.g. range on a non-numeric raw key) fail with the canonical message instead of crashing inside the creator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…exCreator failures dictCreator.seal() was positioned after the inner try/finally for index creators. If any IndexCreator.seal() threw, the dictionary file was left unsealed on disk. Move seal into the inner finally so it runs regardless of index creator success/failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…build pipeline The OPEN_STRUCT storage/creator layer was implemented but the offline batch ingestion pipeline (SegmentIndexCreationDriverImpl) was never made OPEN_STRUCT-aware, causing segment builds to throw. Four gaps fixed: - PinotDataType.getPinotDataTypeForIngestion: add OPEN_STRUCT case returning MAP so the Map value passes through the transform pipeline unchanged (unblocks both offline and realtime ingestion). - StatsCollectorUtil.createStatsCollector: add OPEN_STRUCT case using MapColumnPreIndexStatsCollector and exclude from NoDictCollector opt. - ForwardIndexType.shouldCreateIndex: return false for OPEN_STRUCT parent — it has no forward index (mirroring MutableSegmentImpl). - BaseSegmentCreator.writeMetadata: register splitter child columns in the DIMENSIONS property so V3 converter discovers their index files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix three gaps in the realtime OPEN_STRUCT consume→seal path: 1. MutableOpenStructIndex: remove consume-time dense-key cap that incorrectly dropped keys by arrival order. All observed keys are now retained; dense/sparse classification deferred to seal time via OpenStructColumnSplitter.classify(). 2. PinotSegmentRecordReader: OPEN_STRUCT parent columns have no forward index, so PinotSegmentColumnReader crashes on them. Track OpenStructDataSource columns separately and reconstruct per-doc map values via getMapValue(docId) for the row-major seal path. 3. SegmentColumnarIndexCreator: the column-major build path (used by RealtimeSegmentConverter when columnMajorSegmentBuilder is enabled) also tried to create PinotSegmentColumnReader for OPEN_STRUCT parents. Read from OpenStructDataSource.getMapValue(docId) instead, feeding the map values to the OpenStructColumnSplitter. Add OpenStructDataSource.getMapValue() as a default SPI method (throws UnsupportedOperationException), implemented by MutableOpenStructDataSource which reconstructs the map from per-key MutableKeyColumns. Add realtime e2e integration test (OpenStructIngestionCommitRealtimeTest) that validates: Kafka consume, forceCommit, COUNT(*), dense/sparse child column presence and types, per-key forward/dictionary index_map, and parent/sparse metadata properties. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…eProcessor The SegmentPreProcessor's InvertedIndexHandler was stripping inverted indexes from OPEN_STRUCT child columns because IndexLoadingConfig only knew about schema-level columns. Add addOpenStructChildConfigs() to resolve per-key FieldConfig from the parent's OpenStructIndexConfig and inject it into the config map before handlers run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Exercises the row-major offline build path (PinotSegmentRecordReader → SegmentColumnarIndexCreator) with the same per-key index matrix and dense/sparse validation as the realtime variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tarun11Mavani · 2026-06-04T10:35:52Z

@xiangfu0 @raghavyadav01 please take a look.

tarun11Mavani force-pushed the open-struct-storage branch from 7f254e4 to d0e436f Compare June 1, 2026 05:26

tarun11Mavani force-pushed the open-struct-storage branch from c6a3db5 to 72cf061 Compare June 1, 2026 08:29

tarun11Mavani and others added 19 commits June 1, 2026 09:32

refactor(stats): add FieldSpec-based constructor to AbstractColumnSta…

8b4c0df

…tisticsCollector Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

refactor(stats): add FieldSpec-based collector factory overload

8d438bf

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

refactor(open_struct): drop redundant dictionaryElementSize alias

d8bd6d5

dictElementSize is already 0 on the raw path (only assigned when a dictionary creator exists), so the conditional alias was a no-op. Pass it directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(open_struct): add FieldIndexConfigsUtil.fromFieldConfig (per-Fie…

c895874

…ldConfig, no TableConfig) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(open_struct): validate per-key indexes against vetted allowlist …

2a37cf9

…at table-config time Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tarun11Mavani marked this pull request as ready for review June 4, 2026 08:27

style(open_struct): convert Javadoc to /// markdown style

6ccb988

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tarun11Mavani changed the title ~~[WIP] OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4)~~ OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4) Jun 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4)#18643

OPEN_STRUCT storage layer — columnar two-tier dense/sparse index (PR 2/4)#18643
tarun11Mavani wants to merge 21 commits into
apache:masterfrom
tarun11Mavani:open-struct-storage

tarun11Mavani commented Jun 1, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jun 1, 2026 •

edited

Loading

Uh oh!

tarun11Mavani commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tarun11Mavani commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

Mutable (consuming) path

Seal / offline path

Immutable (sealed) path

Ingestion pipeline wiring

Wiring / validation

pinot-spi changes

Backward compatibility

Follow-up PRs

Test plan

Uh oh!

codecov-commenter commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tarun11Mavani commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tarun11Mavani commented Jun 1, 2026 •

edited

Loading

`pinot-spi` changes

codecov-commenter commented Jun 1, 2026 •

edited

Loading