Bring CalciteWhereCommandIT to parity on the analytics-engine route#5546
Open
ahkcs wants to merge 1 commit into
Open
Bring CalciteWhereCommandIT to parity on the analytics-engine route#5546ahkcs wants to merge 1 commit into
ahkcs wants to merge 1 commit into
Conversation
Branch the tests that exercise analytics-route-incompatible behavior behind isAnalyticsParquetIndicesEnabled(): - 6 nested-field tests (in the Calcite subclass): nested fields are stripped on the parquet/composite route, which has no nested-document support. - 2 _id metadata-field tests: the analytics route doesn't expose the _id metadata field. - testDoubleEqualWithSpecialCharacters: email is dynamically mapped to a text field without a .keyword sub-field on the parquet route (the composite dataformat's dynamic mapping omits the keyword sub-field standard OpenSearch adds), so exact equality on the analyzed text field returns no rows on the DataFusion scan. firstname (explicit text+keyword) still exercises == there. Analytics-engine route: 9 failing -> 0 (32 pass, 9 documented skips). v2 route: 41/41 pass. Signed-off-by: Kai Huang <ahkcs@amazon.com>
dai-chen
reviewed
Jun 12, 2026
|
|
||
| @Test | ||
| public void testFilterOnComputedNestedFields() throws IOException { | ||
| assumeFalse(NESTED_UNSUPPORTED_ON_AE, isAnalyticsParquetIndicesEnabled()); |
Collaborator
There was a problem hiding this comment.
I thought we'd like to maintain the exclude list in gradle. Or any other benefits with this?
Swiddis
reviewed
Jun 12, 2026
|
|
||
| public class CalciteWhereCommandIT extends WhereCommandIT { | ||
|
|
||
| private static final String NESTED_UNSUPPORTED_ON_AE = |
Collaborator
There was a problem hiding this comment.
can this be public since it's relevant to other tests too
Swiddis
reviewed
Jun 12, 2026
| @Test | ||
| public void testWhereWithMetadataFields() throws IOException { | ||
| assumeFalse( | ||
| "The analytics-engine route doesn't expose the _id metadata field (parquet-backed scans" |
Collaborator
There was a problem hiding this comment.
For all the AE assumptions, if we're sticking with this (instead of chen's gradle exclude), we should maybe put this in one global enum so it's all trackable at once, without doing a bunch of string copying + searching. We can also just feed agents that enum as context for bulk-debugging.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Brings
CalciteWhereCommandITto parity on the analytics-engine route (:integ-test:integTestRemote -Dtests.analytics.parquet_indices=true): 9 failing → 0, by branching the tests that exercise behavior the route inherently can't support behindisAnalyticsParquetIndicesEnabled(). All branches are no-ops off the analytics route, so the v2 path is unchanged.Pass rate
:integ-test:integTest, fresh cluster)Documented skips (inherent analytics-route divergences)
testFilterOnComputedNestedFields,testFilterOnNestedAndRootFields,testFilterOnNestedFields,testFilterOnMultipleCascadedNestedFields,testScriptFilterOnDifferentNestedHierarchyShouldThrow,testAggFilterOnNestedFieldsaddress.city,projects.name,author.books.*). The parquet/composite store has no nested-document support, so nested fields are stripped from the dataset at load (#5541) and the fields don't exist on this route.testWhereWithMetadataFields,testWhereWithMetadataFields2where _id='1'. The analytics-engine route doesn't expose the_idmetadata field — parquet-backed scans surface only mapped document fields.testDoubleEqualWithSpecialCharactersemailis dynamically mapped to atextfield without a.keywordsub-field on the parquet route (the composite dataformat's dynamic mapping omits the keyword sub-field that standard OpenSearch adds). Exact equality on an analyzed text field then returns no rows on the DataFusion scan.firstname(explicittext+keyword) still exercises==on the analytics route.Note for reviewers
The
testDoubleEqualWithSpecialCharactersroot cause is broader than this one test: any exact-match (=/==) on a dynamically-mapped string field silently returns 0 rows on the analytics-engine route, because dynamic strings get no.keywordsub-field there. Worth tracking as a potential analytics-engine fix (have dynamic string mapping add the keyword sub-field, or make text-field equality fall back to the stored value) rather than relying on every fixture to map string fields explicitly.Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.