Optimize JSON index doc id mapping #18680

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:json-index-docid-fastpath

Conversation

Contributor

@xiangfu0 xiangfu0 commented Jun 4, 2026

Summary

  • skip flattened-doc-id to segment-doc-id translation when the JSON index mapping is identity
  • keep a direct doc-id bitmap path for realtime JSON index predicates whose JSON paths cannot expand through arrays
  • fall back to flattened-doc evaluation for array paths to preserve same-array-element semantics
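
As a rough illustration of the first bullet, the sketch below shows how an identity check lets the reader return matching ids without translating them. It is a minimal stand-in and not the code in this PR: `DocIdMappingSketch` and its members are hypothetical names, and `java.util.BitSet` replaces the RoaringBitmap types so the example has no dependencies.

```java
import java.util.BitSet;

/** Hypothetical sketch; BitSet stands in for the RoaringBitmap types used by the real readers. */
final class DocIdMappingSketch {
  // Index = flattened doc id, value = segment doc id (assumed layout for this sketch).
  private final int[] flattenedToDocId;
  private final int numDocs;
  private final boolean identity;

  DocIdMappingSketch(int[] flattenedToDocId, int numDocs) {
    this.flattenedToDocId = flattenedToDocId.clone();
    this.numDocs = numDocs;
    this.identity = computeIdentity(this.flattenedToDocId, numDocs);
  }

  /** The mapping is identity only if the counts agree and every entry maps to itself. */
  private static boolean computeIdentity(int[] mapping, int numDocs) {
    if (mapping.length != numDocs) {
      return false;
    }
    for (int i = 0; i < numDocs; i++) {
      if (mapping[i] != i) {
        return false;
      }
    }
    return true;
  }

  boolean isIdentity() {
    return identity;
  }

  /** Converts flattened doc ids to segment doc ids, skipping translation for identity mappings. */
  BitSet toDocIds(BitSet flattenedDocIds) {
    if (identity) {
      // Fast path: flattened ids already are segment doc ids.
      return (BitSet) flattenedDocIds.clone();
    }
    BitSet docIds = new BitSet(numDocs);
    for (int f = flattenedDocIds.nextSetBit(0); f >= 0; f = flattenedDocIds.nextSetBit(f + 1)) {
      docIds.set(flattenedToDocId[f]);
    }
    return docIds;
  }
}
```

The identity flag is computed once at construction, so the per-query cost of the check is a single boolean test.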

User Manual

No table config changes are required. Existing JSON index configurations continue to work. Queries using JSON_MATCH or jsonExtractIndex can benefit automatically when a segment has one flattened JSON record per Pinot document, or in realtime segments when scalar/object JSON paths can be evaluated without array-element correlation.

Sample table config snippet:

{
  "tableIndexConfig": {
    "jsonIndexConfigs": {
      "payload": {}
    }
  }
}

Sample queries:

SELECT COUNT(*)
FROM myTable
WHERE JSON_MATCH(payload, '"$.eventType" = ''click''');

SELECT jsonExtractIndex(payload, '$.eventType', 'STRING')
FROM myTable
WHERE JSON_MATCH(payload, '"$.country" = ''US''');

Array predicates still use flattened-doc semantics:

SELECT COUNT(*)
FROM myTable
WHERE JSON_MATCH(payload, '"$.items[*].sku" = ''abc'' AND "$.items[*].qty" > 1');
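
To see why the array case cannot take a per-document shortcut, consider a document whose `items` array holds `{sku: abc, qty: 1}` and `{sku: xyz, qty: 5}`. The sketch below is illustrative only: the class, method names, and the simplified `sku`/`qty` keys are assumptions and do not reflect the index's actual flattened key format. It contrasts evaluating the predicate per flattened record with evaluating each condition per document and intersecting.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of same-array-element semantics; keys are simplified for illustration. */
final class ArraySemanticsSketch {
  /** Flattened evaluation: both conditions must hold within the same array element. */
  static boolean matchesFlattened(List<Map<String, String>> flattenedRecords) {
    for (Map<String, String> record : flattenedRecords) {
      boolean skuHit = "abc".equals(record.get("sku"));
      boolean qtyHit = Integer.parseInt(record.getOrDefault("qty", "0")) > 1;
      if (skuHit && qtyHit) {
        return true;
      }
    }
    return false;
  }

  /** Per-document evaluation: each condition may be satisfied by a different element. */
  static boolean matchesPerDocument(List<Map<String, String>> flattenedRecords) {
    boolean skuHit = false;
    boolean qtyHit = false;
    for (Map<String, String> record : flattenedRecords) {
      skuHit |= "abc".equals(record.get("sku"));
      qtyHit |= Integer.parseInt(record.getOrDefault("qty", "0")) > 1;
    }
    return skuHit && qtyHit;
  }
}
```

For the example document, the flattened evaluation rejects it because no single element has both `sku = abc` and `qty > 1`, while the per-document evaluation would accept it. That false positive is the reason array paths keep the flattened-doc path.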

Tests

  • ./mvnw -pl pinot-segment-local -Dtest=JsonIndexTest test
  • ./mvnw spotless:apply -pl pinot-segment-local
  • ./mvnw checkstyle:check -pl pinot-segment-local
  • ./mvnw license:format -pl pinot-segment-local
  • ./mvnw license:check -pl pinot-segment-local
  • git diff --check

codecov-commenter commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 72.34927% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.48%. Comparing base (dd6520c) to head (8c4df24).
⚠️ Report is 16 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...t/index/readers/json/ImmutableJsonIndexReader.java    65.09%    46 Missing and 28 partials ⚠️
...local/realtime/impl/json/MutableJsonIndexImpl.java    69.63%    41 Missing and 17 partials ⚠️
...nt/creator/impl/inv/json/BaseJsonIndexCreator.java    98.71%    0 Missing and 1 partial ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18680      +/-   ##
============================================
+ Coverage     64.47%   64.48%   +0.01%     
  Complexity     1291     1291              
============================================
  Files          3371     3372       +1     
  Lines        208551   209047     +496     
  Branches      32569    32705     +136     
============================================
+ Hits         134455   134804     +349     
- Misses        63292    63384      +92     
- Partials      10804    10859      +55     

Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-21               64.48% <72.34%> (+0.01%) ⬆️
temurin               64.48% <72.34%> (+0.01%) ⬆️
unittests             64.48% <72.34%> (+0.01%) ⬆️
unittests1            56.84% <41.16%> (-0.06%) ⬇️
unittests2            37.17% <71.30%> (+0.08%) ⬆️

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes JSON index doc-id handling by avoiding flattened-doc-id → segment-doc-id translation when it’s unnecessary and by introducing a direct doc-id evaluation path for eligible realtime JSON predicates, while preserving correct semantics for array paths.

Changes:

  • Detect and fast-path identity doc-id mappings in ImmutableJsonIndexReader to avoid per-match doc-id translation.
  • Add a direct doc-id posting-list map and corresponding predicate evaluation path to MutableJsonIndexImpl for eligible (non-array) JSON paths.
  • Keep fallback behavior for array-path predicates to preserve same-array-element correlation semantics.
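
The second and third changes can be pictured with the following minimal sketch. It assumes a simplified model in which each document contributes scalar `path = value` pairs, and it treats any path containing `[` as an array path. All names here (`DirectDocIdPostingsSketch`, `isDirectEligible`, `matchEquals`) are hypothetical, the eligibility rule is a simplification of whatever the PR actually checks, and `java.util.BitSet` stands in for `MutableRoaringBitmap`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a doc-id posting-list map with fallback for array paths. */
final class DirectDocIdPostingsSketch {
  // Key = path + '\0' + value, value = segment doc ids containing that pair.
  private final Map<String, BitSet> docIdPostings = new HashMap<>();
  private int nextDocId = 0;

  /** Indexes one document from its scalar (non-array) path/value pairs. */
  void add(Map<String, String> scalarFields) {
    for (Map.Entry<String, String> e : scalarFields.entrySet()) {
      docIdPostings.computeIfAbsent(key(e.getKey(), e.getValue()), k -> new BitSet()).set(nextDocId);
    }
    nextDocId++;
  }

  /** Simplified eligibility rule for this sketch: the path must not reference an array. */
  static boolean isDirectEligible(String jsonPath) {
    return !jsonPath.contains("[");
  }

  /**
   * Returns matching segment doc ids for an equality predicate, or an empty Optional
   * when the caller must fall back to flattened-doc evaluation.
   */
  Optional<BitSet> matchEquals(String jsonPath, String value) {
    if (!isDirectEligible(jsonPath)) {
      return Optional.empty();
    }
    BitSet hits = docIdPostings.get(key(jsonPath, value));
    return Optional.of(hits == null ? new BitSet() : (BitSet) hits.clone());
  }

  private static String key(String path, String value) {
    return path + '\0' + value;
  }
}
```

Returning an empty `Optional` for array paths keeps the fallback decision explicit at the call site, so a direct-path miss (no matching documents) is never confused with "not eligible".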

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File / Description

pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java
  Adds identity mapping detection and a direct-doc-id evaluation path when flattened-doc IDs match segment doc IDs.

pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java
  Adds a doc-id posting list map and predicate evaluation path to bypass flattened-doc mapping for eligible realtime predicates.

Comment on lines +174 to +180
  private boolean isDocIdMappingIdentity() {
    if (_numFlattenedDocs != _numDocs) {
      return false;
    }
    for (int docId = 0; docId < _numDocs; docId++) {
      if (getDocId(docId) != docId) {
        return false;

Comment on lines +186 to +189
  private MutableRoaringBitmap toDocIdBitmap(ImmutableRoaringBitmap flattenedDocIds) {
    if (_docIdMappingIdentity) {
      return flattenedDocIds.toMutableRoaringBitmap();
    }

Comment on lines +168 to +172
    if (!_docIdMappingIdentity && directDocIdValues != null) {
      for (String value : directDocIdValues) {
        _docIdPostingListMap.computeIfAbsent(value, k -> new MutableRoaringBitmap()).add(_nextDocId);
      }
    }

Comment on lines 824 to 828
  public void convertFlattenedDocIdsToDocIds(Map<String, RoaringBitmap> valueToFlattenedDocIds) {
    if (_docIdMappingIdentity) {
      return;
    }
    _readLock.lock();

3 participants