Skip to content

Bring CalciteWhereCommandIT to parity on the analytics-engine route#5546

Open
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:analytics/where-it-parity
Open

Bring CalciteWhereCommandIT to parity on the analytics-engine route#5546
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:analytics/where-it-parity

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Description

Brings CalciteWhereCommandIT to parity on the analytics-engine route (:integ-test:integTestRemote -Dtests.analytics.parquet_indices=true): 9 failing → 0, by branching the tests that exercise behavior the route inherently can't support behind isAnalyticsParquetIndicesEnabled(). All branches are no-ops off the analytics route, so the v2 path is unchanged.

Pass rate

Route Before After
analytics-engine 9 failing / 41 0 failing — 32 pass, 9 documented skips
v2 (:integ-test:integTest, fresh cluster) 41 pass 41 pass, 0 skip

Documented skips (inherent analytics-route divergences)

Test(s) Reason
testFilterOnComputedNestedFields, testFilterOnNestedAndRootFields, testFilterOnNestedFields, testFilterOnMultipleCascadedNestedFields, testScriptFilterOnDifferentNestedHierarchyShouldThrow, testAggFilterOnNestedFields These query nested fields (address.city, projects.name, author.books.*). The parquet/composite store has no nested-document support, so nested fields are stripped from the dataset at load (#5541) and the fields don't exist on this route.
testWhereWithMetadataFields, testWhereWithMetadataFields2 Use where _id='1'. The analytics-engine route doesn't expose the _id metadata field — parquet-backed scans surface only mapped document fields.
testDoubleEqualWithSpecialCharacters email is dynamically mapped to a text field without a .keyword sub-field on the parquet route (the composite dataformat's dynamic mapping omits the keyword sub-field that standard OpenSearch adds). Exact equality on an analyzed text field then returns no rows on the DataFusion scan. firstname (explicit text+keyword) still exercises == on the analytics route.

Note for reviewers

The testDoubleEqualWithSpecialCharacters root cause is broader than this one test: any exact-match (= / ==) on a dynamically-mapped string field silently returns 0 rows on the analytics-engine route, because dynamic strings get no .keyword sub-field there. Worth tracking as a potential analytics-engine fix (have dynamic string mapping add the keyword sub-field, or make text-field equality fall back to the stored value) rather than relying on every fixture to map string fields explicitly.

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Branch the tests that exercise analytics-route-incompatible behavior behind
isAnalyticsParquetIndicesEnabled():
- 6 nested-field tests (in the Calcite subclass): nested fields are stripped on
  the parquet/composite route, which has no nested-document support.
- 2 _id metadata-field tests: the analytics route doesn't expose the _id
  metadata field.
- testDoubleEqualWithSpecialCharacters: email is dynamically mapped to a text
  field without a .keyword sub-field on the parquet route (the composite
  dataformat's dynamic mapping omits the keyword sub-field standard OpenSearch
  adds), so exact equality on the analyzed text field returns no rows on the
  DataFusion scan. firstname (explicit text+keyword) still exercises == there.

Analytics-engine route: 9 failing -> 0 (32 pass, 9 documented skips).
v2 route: 41/41 pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

@Test
public void testFilterOnComputedNestedFields() throws IOException {
assumeFalse(NESTED_UNSUPPORTED_ON_AE, isAnalyticsParquetIndicesEnabled());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we'd like to maintain the exclude list in gradle. Or any other benefits with this?


public class CalciteWhereCommandIT extends WhereCommandIT {

private static final String NESTED_UNSUPPORTED_ON_AE =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be public since it's relevant to other tests too

@Test
public void testWhereWithMetadataFields() throws IOException {
assumeFalse(
"The analytics-engine route doesn't expose the _id metadata field (parquet-backed scans"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the AE assumptions, if we're sticking with this (instead of chen's gradle exclude), we should maybe put this in one global enum so it's all trackable at once, without doing a bunch of string copying + searching. We can also just feed agents that enum as context for bulk-debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants