Optimize JSON index doc id mapping #18680
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18680 +/- ##
============================================
+ Coverage 64.47% 64.48% +0.01%
Complexity 1291 1291
============================================
Files 3371 3372 +1
Lines 208551 209047 +496
Branches 32569 32705 +136
============================================
+ Hits 134455 134804 +349
- Misses 63292 63384 +92
- Partials 10804 10859 +55
Pull request overview
This PR optimizes JSON index doc-id handling by avoiding flattened-doc-id → segment-doc-id translation when it’s unnecessary and by introducing a direct doc-id evaluation path for eligible realtime JSON predicates, while preserving correct semantics for array paths.
Changes:
- Detect and fast-path identity doc-id mappings in `ImmutableJsonIndexReader` to avoid per-match doc-id translation.
- Add a direct doc-id posting-list map and corresponding predicate evaluation path to `MutableJsonIndexImpl` for eligible (non-array) JSON paths.
- Keep fallback behavior for array-path predicates to preserve same-array-element correlation semantics.
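
The identity fast path described above can be illustrated with a minimal, self-contained sketch. It is not the PR's code: the class `IdentityMappingSketch`, its `usesFastPath` and `toDocIds` methods, and the use of `java.util.BitSet` in place of Roaring bitmaps are all assumptions made for illustration. The idea is to check once, at load time, whether flattened record `i` always belongs to segment doc `i`, and to skip the per-match translation when it does.

```java
import java.util.BitSet;

/** Hypothetical sketch of an identity-mapping fast path; names are illustrative, not Pinot's. */
final class IdentityMappingSketch {
  // flattenedToDocId[i] is the segment doc id that flattened record i belongs to.
  private final int[] flattenedToDocId;
  private final boolean identity;

  IdentityMappingSketch(int[] flattenedToDocId, int numDocs) {
    this.flattenedToDocId = flattenedToDocId.clone();
    // Computed once up front so that queries only read a boolean.
    this.identity = isIdentity(this.flattenedToDocId, numDocs);
  }

  /** True only if there is exactly one flattened record per doc and record i maps to doc i. */
  static boolean isIdentity(int[] flattenedToDocId, int numDocs) {
    if (flattenedToDocId.length != numDocs) {
      return false;
    }
    for (int i = 0; i < numDocs; i++) {
      if (flattenedToDocId[i] != i) {
        return false;
      }
    }
    return true;
  }

  boolean usesFastPath() {
    return identity;
  }

  /** Converts matching flattened record ids into segment doc ids. */
  BitSet toDocIds(BitSet flattenedIds) {
    if (identity) {
      // Fast path: the ids are already doc ids, so a copy is enough.
      return (BitSet) flattenedIds.clone();
    }
    // Slow path: translate every matching flattened id through the mapping.
    BitSet docIds = new BitSet();
    for (int i = flattenedIds.nextSetBit(0); i >= 0; i = flattenedIds.nextSetBit(i + 1)) {
      docIds.set(flattenedToDocId[i]);
    }
    return docIds;
  }
}
```

In this sketch a mapping such as `{0, 1, 2}` over three docs takes the fast path, while `{0, 0, 1}` over two docs (the first doc contributes two flattened records) falls back to translation.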
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java | Adds identity mapping detection and a direct-doc-id evaluation path when flattened-doc IDs match segment doc IDs. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java | Adds a doc-id posting list map and predicate evaluation path to bypass flattened-doc mapping for eligible realtime predicates. |
    private boolean isDocIdMappingIdentity() {
      if (_numFlattenedDocs != _numDocs) {
        return false;
      }
      for (int docId = 0; docId < _numDocs; docId++) {
        if (getDocId(docId) != docId) {
          return false;

    private MutableRoaringBitmap toDocIdBitmap(ImmutableRoaringBitmap flattenedDocIds) {
      if (_docIdMappingIdentity) {
        return flattenedDocIds.toMutableRoaringBitmap();
      }

    if (!_docIdMappingIdentity && directDocIdValues != null) {
      for (String value : directDocIdValues) {
        _docIdPostingListMap.computeIfAbsent(value, k -> new MutableRoaringBitmap()).add(_nextDocId);
      }
    }

    public void convertFlattenedDocIdsToDocIds(Map<String, RoaringBitmap> valueToFlattenedDocIds) {
      if (_docIdMappingIdentity) {
        return;
      }
      _readLock.lock();
733888f to 8c4df24
Summary
User Manual
No table config changes are required. Existing JSON index configurations continue to work. Queries using JSON_MATCH or jsonExtractIndex can benefit automatically when a segment has one flattened JSON record per Pinot document, or in realtime segments when scalar/object JSON paths can be evaluated without array-element correlation.
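
To see why only scalar/object paths are eligible for direct doc-id evaluation, the following sketch contrasts the two strategies on a toy index. It is an illustration under stated assumptions, not Pinot's implementation: the class `ArrayCorrelationSketch`, the `path=value` key format, and the `BitSet`-based posting lists are all hypothetical. Each array element becomes its own flattened record, so intersecting flattened posting lists requires both predicates to hit the same element, while intersecting per-document posting lists does not.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch comparing flattened-record and direct doc-id evaluation. */
final class ArrayCorrelationSketch {
  // Posting lists keyed by "path=value" (an illustrative key format).
  private final Map<String, BitSet> flattenedPostings = new HashMap<>();
  private final Map<String, BitSet> docPostings = new HashMap<>();
  // flattenedToDocId.get(i) is the doc that flattened record i came from.
  private final List<Integer> flattenedToDocId = new ArrayList<>();
  private int nextDocId = 0;

  /** Indexes one document, given as its flattened records (one per array element). */
  void addDoc(List<Map<String, String>> flattenedRecords) {
    for (Map<String, String> record : flattenedRecords) {
      int flattenedId = flattenedToDocId.size();
      flattenedToDocId.add(nextDocId);
      for (Map.Entry<String, String> entry : record.entrySet()) {
        String key = entry.getKey() + "=" + entry.getValue();
        flattenedPostings.computeIfAbsent(key, k -> new BitSet()).set(flattenedId);
        docPostings.computeIfAbsent(key, k -> new BitSet()).set(nextDocId);
      }
    }
    nextDocId++;
  }

  /** AND over flattened records, then translate: both keys must match the same array element. */
  BitSet matchFlattened(String keyA, String keyB) {
    BitSet both = copyOf(flattenedPostings.get(keyA));
    both.and(copyOf(flattenedPostings.get(keyB)));
    BitSet docIds = new BitSet();
    for (int i = both.nextSetBit(0); i >= 0; i = both.nextSetBit(i + 1)) {
      docIds.set(flattenedToDocId.get(i));
    }
    return docIds;
  }

  /** AND directly over doc ids: no translation step, but element correlation is lost. */
  BitSet matchDirect(String keyA, String keyB) {
    BitSet both = copyOf(docPostings.get(keyA));
    both.and(copyOf(docPostings.get(keyB)));
    return both;
  }

  private static BitSet copyOf(BitSet bits) {
    return bits == null ? new BitSet() : (BitSet) bits.clone();
  }
}
```

For a document whose array holds `{name: a, score: 1}` and `{name: b, score: 2}`, asking for `name=a AND score=2` matches nothing under flattened evaluation but would wrongly match under direct evaluation, which is consistent with the PR keeping the flattened path for array predicates. For a document with a single flattened record, both strategies agree.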
Sample table config snippet:

    {
      "tableIndexConfig": {
        "jsonIndexConfigs": {
          "payload": {}
        }
      }
    }

Sample queries:
Array predicates still use flattened-doc semantics:
Tests