
Decouple PPL logical plans from UDT temporal types#5547

Draft
penghuo wants to merge 1 commit into opensearch-project:main from penghuo:feat/expr_lazy_udt_v1

@penghuo penghuo commented Jun 12, 2026

Collaborator

Description

This PR makes two related but distinct architectural changes to the Calcite-side PPL planner. Together they let logical RelNode/RexNode trees produced during analysis use ordinary Calcite types end-to-end. UDTs (and ExprType) re-enter the picture only immediately before physical execution, where they are needed so that Linq4j codegen keeps seeing the existing VARCHAR-backed runtime representation.


1. Decouple ExprType from the logical plan

Calcite-side PPL planning code (RelNode/RexNode construction, coercion, type checking, UDF return-type inference, UDF implementors) now operates on RelDataType directly instead of converting to/from ExprType at every step.

  • Coercion (CoercionUtils): new RelDataType-typed common-type resolver with an internal widening DAG keyed by a small CoercionTag enum (matches v2 widening semantics — STRING → TIMESTAMP direct edge, UNDEFINED → any-concrete unit cost, etc).
  • Type checking (PPLTypeChecker): signatures expressed as List<List<RelDataType>>; new renderTypeName for error messages; new temporalKind helper that canonicalizes DATE/TIME/TIMESTAMP across UDT and standard SQL temporal forms in a single place.
  • Operand metadata (PPLOperandTypes, UDFOperandMetadata): signature constants exposed as RelDataType (e.g. INTEGER_T, STRING_T, IP_UDT).
  • UDF implementors (AddSubDate, Extract, Format, LastDay, PeriodName, TimestampAdd, TimestampDiff, Weekday, Span, WidthBucket, IPFunction, CidrMatchFunction, CompareIpFunction, geo-IP, CurrentFunction): branch on RelDataType (UDT class instances or SqlTypeName) and pass RelDataType through their lowering, instead of bouncing through ExprType.
  • CalciteRexNodeVisitor.visitCast: maps AST DataType directly to RelDataType, removing the DataType.getCoreType() round-trip.
  • ExtendedRexBuilder: type discrimination via instanceof UDT classes and SqlTypeName, no ExprType conversion.
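
The widening DAG from the Coercion bullet can be pictured with a small, dependency-free model. This is only a sketch: `WideningSketch`, the `Tag` values, the numeric and DATE edges, and the "pick whichever side the other can widen to" rule are assumptions made for illustration. Only the STRING → TIMESTAMP direct edge and the unit-cost UNDEFINED → any-concrete edge come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;

/** Illustrative model only; not the PR's CoercionUtils or CoercionTag. */
final class WideningSketch {
  /** Hypothetical tag set; the real enum may differ. */
  enum Tag { UNDEFINED, INTEGER, LONG, DOUBLE, STRING, DATE, TIMESTAMP }

  private static final Map<Tag, EnumSet<Tag>> EDGES = new EnumMap<>(Tag.class);

  static {
    for (Tag t : Tag.values()) {
      EDGES.put(t, EnumSet.noneOf(Tag.class));
    }
    // From the description: UNDEFINED widens to any concrete tag in one step.
    for (Tag t : Tag.values()) {
      if (t != Tag.UNDEFINED) {
        EDGES.get(Tag.UNDEFINED).add(t);
      }
    }
    // From the description: STRING has a direct edge to TIMESTAMP.
    EDGES.get(Tag.STRING).add(Tag.TIMESTAMP);
    // Assumed edges, added only to make the example non-trivial.
    EDGES.get(Tag.INTEGER).add(Tag.LONG);
    EDGES.get(Tag.LONG).add(Tag.DOUBLE);
    EDGES.get(Tag.DATE).add(Tag.TIMESTAMP);
  }

  /** Breadth-first search: number of widening steps from one tag to another, or -1. */
  static int distance(Tag from, Tag to) {
    Map<Tag, Integer> dist = new EnumMap<>(Tag.class);
    Deque<Tag> queue = new ArrayDeque<>();
    dist.put(from, 0);
    queue.add(from);
    while (!queue.isEmpty()) {
      Tag cur = queue.remove();
      if (cur == to) {
        return dist.get(cur);
      }
      for (Tag next : EDGES.get(cur)) {
        if (!dist.containsKey(next)) {
          dist.put(next, dist.get(cur) + 1);
          queue.add(next);
        }
      }
    }
    return -1;
  }

  /** Assumed rule: the common type is whichever side the other side can widen to. */
  static Optional<Tag> commonType(Tag a, Tag b) {
    if (distance(a, b) >= 0) {
      return Optional.of(b);
    }
    if (distance(b, a) >= 0) {
      return Optional.of(a);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(commonType(Tag.STRING, Tag.TIMESTAMP)); // Optional[TIMESTAMP]
    System.out.println(commonType(Tag.INTEGER, Tag.DATE));     // Optional.empty
  }
}
```

Keying the graph by a small enum rather than by RelDataType instances keeps the search independent of nullability and precision, which is presumably why the PR introduces a separate tag.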

UDT identity is preserved at planning time via instanceof checks (not getExprType()); subclasses survive createTypeWithNullability / createTypeWithCharsetAndCollation via a cloneWith hook on ExprSqlType/ExprJavaType.
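
The cloneWith idea can be illustrated without Calcite. In this sketch, `BaseType`, `IpType`, and `withNullability` are hypothetical stand-ins (for ExprSqlType, a UDT subclass, and createTypeWithNullability respectively); the actual signatures in the PR may differ.

```java
/** Hypothetical stand-in for ExprSqlType; not the PR's class. */
class BaseType {
  final String name;
  final boolean nullable;

  BaseType(String name, boolean nullable) {
    this.name = name;
    this.nullable = nullable;
  }

  /** Hook: subclasses override this so copies keep their concrete class. */
  protected BaseType cloneWith(boolean newNullable) {
    return new BaseType(name, newNullable);
  }

  /** Plays the role of a factory's createTypeWithNullability: it copies via the hook. */
  final BaseType withNullability(boolean newNullable) {
    return this.nullable == newNullable ? this : cloneWith(newNullable);
  }
}

/** Hypothetical UDT subclass whose identity must survive copying. */
final class IpType extends BaseType {
  IpType(boolean nullable) {
    super("IP", nullable);
  }

  @Override
  protected BaseType cloneWith(boolean newNullable) {
    return new IpType(newNullable);
  }
}

final class CloneWithSketch {
  /** Planning-time discrimination by class, as the description states. */
  static boolean isIp(BaseType t) {
    return t instanceof IpType;
  }

  public static void main(String[] args) {
    BaseType copy = new IpType(false).withNullability(true);
    System.out.println(isIp(copy)); // true
  }
}
```

Without the override, the nullable copy would be a plain `BaseType` and the `instanceof` check would silently stop matching.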

2. Remove DATE/TIME/TIMESTAMP UDTs from the logical plan

With the ExprType decoupling in place, the only thing still pinning UDT temporal types to the logical layer is convertExprTypeToRelDataType. This change pushes those UDTs out of the logical plan entirely:

  • Type factory: OpenSearchTypeFactory.convertExprTypeToRelDataType returns standard DATE/TIME(9)/TIMESTAMP(9) for the corresponding ExprCoreType. New isStandardTemporalType helper. IP and BINARY remain UDT.
  • Cast emission (CalciteRexNodeVisitor.visitCast): emits standard temporal types for AST DATE/TIME/TIMESTAMP cast targets.
  • Cast lowering (ExtendedRexBuilder.makeCast): standard temporal targets dispatch to PPLBuiltinOperators.DATE/TIME/TIMESTAMP UDFs but keep the call's RelDataType standard. IP routed separately.
  • Pushdown (PredicateAnalyzer): isTimestamp/isDate accept both standard SqlTypeName and UDT, and read literal values via getValueAs(String.class) so TimestampString/DateString round-trip without ClassCastException.
  • Constant rename: NULLABLE_*_UDT → NULLABLE_*_T, DATE_UDT/TIME_UDT/TIMESTAMP_UDT → DATE_T/TIME_T/TIMESTAMP_T.
  • DatetimeExtension / DatetimeUdtNormalizeRule removed (introduced in #5408, "Normalize datetime types for unified query API"). Their sole purpose was to rewrite UDT temporal return types to standard Calcite types as a unified-API post-analysis pass. The analysis pipeline no longer produces temporal UDTs in the first place, so the rule would be a no-op, and the extension is removed along with its registrations in UnifiedPplSpec/UnifiedSqlSpec.
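
The type-factory change in the list above amounts to a small mapping. The following is a simplified model, not the real OpenSearchTypeFactory: `CoreType`, `LogicalType`, and `convert` are hypothetical names, and only the DATE/TIME(9)/TIMESTAMP(9) targets and the "IP and BINARY remain UDT" rule are taken from the description.

```java
/** Simplified model of the mapping; not the PR's OpenSearchTypeFactory. */
final class TemporalMappingSketch {
  /** Hypothetical subset of ExprCoreType. */
  enum CoreType { DATE, TIME, TIMESTAMP, IP, BINARY }

  /** Hypothetical stand-in for a RelDataType: a name, an optional precision, a UDT flag. */
  static final class LogicalType {
    final String name;
    final int precision; // -1 means "no precision"
    final boolean udt;

    LogicalType(String name, int precision, boolean udt) {
      this.name = name;
      this.precision = precision;
      this.udt = udt;
    }

    @Override
    public String toString() {
      String base = precision >= 0 ? name + "(" + precision + ")" : name;
      return udt ? base + " [UDT]" : base;
    }
  }

  /** Temporal core types map to standard types; IP and BINARY stay UDT. */
  static LogicalType convert(CoreType t) {
    switch (t) {
      case DATE:
        return new LogicalType("DATE", -1, false);
      case TIME:
        return new LogicalType("TIME", 9, false);
      case TIMESTAMP:
        return new LogicalType("TIMESTAMP", 9, false);
      default:
        return new LogicalType(t.name(), -1, true);
    }
  }

  /** Mirrors the intent of the new isStandardTemporalType helper. */
  static boolean isStandardTemporalType(LogicalType t) {
    return !t.udt
        && (t.name.equals("DATE") || t.name.equals("TIME") || t.name.equals("TIMESTAMP"));
  }

  public static void main(String[] args) {
    System.out.println(convert(CoreType.TIMESTAMP)); // TIMESTAMP(9)
    System.out.println(convert(CoreType.IP));        // IP [UDT]
  }
}
```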

TemporalUdtRewriteShuttle (the conversion boundary)

A new RelShuttle runs once at the prepare-statement boundary (OpenSearchCalcitePreparingStmt.implement and OpenSearchRelRunners.run) and rewrites every standard temporal type in the tree back to its UDT counterpart. Implementation notes:

  • Atomic rebuild for stateful nodes. Calcite's default RelShuttleImpl.visitChild rebuilds parents via parent.copy(traits, newInputs) after visiting each child, which makes both rewrite-then-visit and visit-then-rewrite orderings briefly violate row-type consistency. The shuttle intercepts Project/Filter/Calc/Aggregate/Values and rebuilds them in one shot with rewritten inputs and rewritten RexNodes together.
  • Filter rebuild preserves variablesSet. Without this, correlated subqueries lose their CorrelationId binding.
  • RexShuttle extensions: visitSubQuery recurses into the inner plan (Calcite's default doesn't); visitLambda/visitLambdaRef rebuild the lambda body and parameter refs together so transform(arr, x -> ...) over temporal-element arrays stays consistent.
  • rewriteType recurses into ARRAY/MULTISET/MAP/struct, so collection-typed temporal columns (e.g. TAKE(@timestamp, 2)) get rewritten.
  • TemporalSchemaRewritable marker interface. AbstractCalciteIndexScan implements it and rewrites its schema in place via buildScan(...), rather than wrapping in a LogicalProject(CAST(...)). This is what keeps Linq4j codegen reading String values consistent with what OpenSearchExprValueFactory delivers at runtime.
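
The recursive behaviour of rewriteType can be sketched on a toy type tree. Everything here is an assumption made for illustration (`RewriteTypeSketch`, the `Kind` enum, the `UDT_*` names); the real shuttle operates on Calcite RelDataType and the ExprDateType/ExprTimeType/ExprTimeStampType UDTs.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a recursive standard-to-UDT type rewrite; not the PR's shuttle. */
final class RewriteTypeSketch {
  /** Hypothetical kinds: standard temporals, their UDT counterparts, and containers. */
  enum Kind { DATE, TIME, TIMESTAMP, UDT_DATE, UDT_TIME, UDT_TIMESTAMP, INTEGER, ARRAY, MAP, STRUCT }

  static final class Type {
    final Kind kind;
    /** ARRAY: one element type; MAP: key then value; STRUCT: field types. */
    final List<Type> children;

    Type(Kind kind, Type... children) {
      this.kind = kind;
      this.children = Arrays.asList(children);
    }

    @Override
    public String toString() {
      return children.isEmpty() ? kind.name() : kind + children.toString();
    }
  }

  /** Replaces standard temporal leaves with UDT leaves, recursing through containers. */
  static Type rewriteType(Type t) {
    switch (t.kind) {
      case DATE:
        return new Type(Kind.UDT_DATE);
      case TIME:
        return new Type(Kind.UDT_TIME);
      case TIMESTAMP:
        return new Type(Kind.UDT_TIMESTAMP);
      default:
        if (t.children.isEmpty()) {
          return t; // non-temporal leaf: unchanged
        }
        Type[] rewritten = new Type[t.children.size()];
        for (int i = 0; i < rewritten.length; i++) {
          rewritten[i] = rewriteType(t.children.get(i));
        }
        return new Type(t.kind, rewritten);
    }
  }

  public static void main(String[] args) {
    Type arrayOfTs = new Type(Kind.ARRAY, new Type(Kind.TIMESTAMP));
    System.out.println(rewriteType(arrayOfTs)); // ARRAY[UDT_TIMESTAMP]
  }
}
```

A rewrite that only looked at the top-level kind would leave an array of timestamps untouched, which is the case the TAKE(@timestamp, 2) example in the list guards against.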

Other adjustments

  • Calcite IT golden files (explain_appendpipe_command.json × 2 profiles): logical plans now show TIMESTAMP(9) instead of EXPR_TIMESTAMP VARCHAR; physical plans still show UDT post-shuttle.

Tests

  • New unit tests: TemporalUdtRewriteShuttleTest, CalcitePPLLogicalPlanStandardTemporalTest (proves no temporal UDT appears anywhere in the logical RelNode / RexNode tree for representative queries — table scan, eval-cast, filter, group-by, UDF call), expanded CoercionUtilsTest, expanded OpenSearchTypeFactoryTest.
  • All module unit tests pass.
  • Full Calcite IT suite (*Calcite*IT) passes: 4968 tests, 0 failures, 0 errors.

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using `--signoff` or `-s`.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@penghuo penghuo added the PPL Piped processing language label Jun 12, 2026
@penghuo penghuo self-assigned this Jun 12, 2026
Logical RelNode/RexNode trees now use standard Calcite DATE/TIME(9)/
TIMESTAMP(9) types for date columns. A new TemporalUdtRewriteShuttle
runs at the prepare-statement boundary (OpenSearchCalcitePreparingStmt
.implement and OpenSearchRelRunners.run) and rewrites the standard
temporal types back to UDTs (ExprDateType/ExprTimeType/ExprTimeStampType)
just before physical execution, so Linq4j keeps receiving the
VARCHAR-backed representation.

This commit folds in the prior "Decouple Calcite PPL planning from
ExprType" change as well: Calcite-side PPL planning (RelNode/RexNode
code, coercion, type checking, and UDF implementations) operates on
RelDataType throughout instead of bouncing through ExprType.

Type system:
- OpenSearchTypeFactory.convertExprTypeToRelDataType returns standard
  TIMESTAMP(9)/DATE/TIME(9) for the corresponding ExprCoreType. IP and
  BINARY remain UDT.
- New helper isStandardTemporalType.
- PPLOperandTypes constants renamed DATE_UDT/TIME_UDT/TIMESTAMP_UDT to
  DATE_T/TIME_T/TIMESTAMP_T.
- UserDefinedFunctionUtils NULLABLE_*_UDT renamed to NULLABLE_*_T and
  repointed to standard temporal RelDataTypes (IP and BINARY constants
  unchanged).
- UDTs are recognised via instanceof rather than getExprType(); cloneWith
  preserves UDT identity through createTypeWithNullability.

Cast emission and pushdown:
- CalciteRexNodeVisitor.visitCast emits standard temporal types for
  AST DATE/TIME/TIMESTAMP cast targets.
- ExtendedRexBuilder.makeCast routes IP separately; standard temporal
  targets dispatch to PPLBuiltinOperators.DATE/TIME/TIMESTAMP with the
  standard target type so the call's RelDataType stays standard.
- PredicateAnalyzer.isTimestamp/isDate accept both standard SqlTypeName
  and UDT, and read literal values via getValueAs(String.class) so
  TimestampString/DateString round-trip without ClassCastException.

Coercion + type-checker:
- RelDataType-typed common-type resolver with a CoercionTag widening DAG.
- CoercionUtils' DATE+TIME -> TIMESTAMP resolver emits standard
  TIMESTAMP(9).
- PPLTypeChecker gains a temporalKind helper used in typesMatch and
  isComparable so standard and UDT temporal pairs match across forms.
- getRelDataTypes(family) returns standard temporal types for
  DATETIME/TIMESTAMP/DATE/TIME families (BINARY family stays UDT).
- UDF return-type inference (AddSubDate/Weekday/LastDay/TimestampDiff
  /Format/TimestampAdd/Extract/Span/WidthBucket) recognises both
  standard and UDT temporal operand types.

Shuttle implementation:
- Atomic rebuild path for Project/Filter/Calc/Aggregate/Values nodes
  (Calcite's default copy-then-validate path doesn't work because each
  half of the rewrite would briefly violate row-type consistency).
- Filter rebuild preserves variablesSet so correlated subqueries keep
  their CorrelationId binding.
- TemporalSchemaRewritable marker interface lets OpenSearch table
  scans rewrite their schema in place rather than wrap in a
  LogicalProject(CAST), so Calcite Linq4j codegen reads String values
  matching what OpenSearchExprValueFactory delivers at runtime.

Other:
- DatetimeUdtNormalizeRule (api/datetime extension) recognises both
  UDT and standard temporal RexCalls and forces precision to MAX so
  unified-API consumers see TIMESTAMP(9).
- Calcite IT golden files updated to reflect logical plans now showing
  TIMESTAMP(9) instead of EXPR_TIMESTAMP VARCHAR (physical plans still
  show UDT post-shuttle).

Tests:
- New unit: TemporalUdtRewriteShuttleTest, CalcitePPLLogicalPlanStandardTemporalTest.
- Updated CoercionUtilsTest, OpenSearchTypeFactoryTest. All module unit
  tests pass. Full Calcite IT suite (including ExplainIT) green.

Signed-off-by: Peng Huo <penghuo@gmail.com>
@penghuo penghuo force-pushed the feat/expr_lazy_udt_v1 branch from d17aa3b to 3244bed Compare June 12, 2026 21:50
