Decouple PPL logical plans from UDT temporal types #5547
Draft
penghuo wants to merge 1 commit into
Logical RelNode/RexNode trees now use standard Calcite DATE/TIME(9)/TIMESTAMP(9) types for date columns. A new TemporalUdtRewriteShuttle runs at the prepare-statement boundary (OpenSearchCalcitePreparingStmt.implement and OpenSearchRelRunners.run) and rewrites the standard temporal types back to UDTs (ExprDateType/ExprTimeType/ExprTimeStampType) just before physical execution, so Linq4j keeps receiving the VARCHAR-backed representation.

This commit folds in the prior "Decouple Calcite PPL planning from ExprType" change as well: Calcite-side PPL planning (RelNode/RexNode code, coercion, type checking, and UDF implementations) operates on RelDataType throughout instead of bouncing through ExprType.

Type system:
- OpenSearchTypeFactory.convertExprTypeToRelDataType returns standard TIMESTAMP(9)/DATE/TIME(9) for the corresponding ExprCoreType. IP and BINARY remain UDT.
- New helper isStandardTemporalType.
- PPLOperandTypes constants renamed DATE_UDT/TIME_UDT/TIMESTAMP_UDT to DATE_T/TIME_T/TIMESTAMP_T.
- UserDefinedFunctionUtils NULLABLE_*_UDT renamed to NULLABLE_*_T and repointed to standard temporal RelDataTypes (IP and BINARY constants unchanged).
- UDTs are recognised via instanceof rather than getExprType(); cloneWith preserves UDT identity through createTypeWithNullability.

Cast emission and pushdown:
- CalciteRexNodeVisitor.visitCast emits standard temporal types for AST DATE/TIME/TIMESTAMP cast targets.
- ExtendedRexBuilder.makeCast routes IP separately; standard temporal targets dispatch to PPLBuiltinOperators.DATE/TIME/TIMESTAMP with the standard target type so the call's RelDataType stays standard.
- PredicateAnalyzer.isTimestamp/isDate accept both standard SqlTypeName and UDT, and read literal values via getValueAs(String.class) so TimestampString/DateString round-trip without ClassCastException.

Coercion + type-checker:
- RelDataType-typed common-type resolver with a CoercionTag widening DAG.
- CoercionUtils' DATE+TIME -> TIMESTAMP resolver emits standard TIMESTAMP(9).
- PPLTypeChecker gains a temporalKind helper used in typesMatch and isComparable so standard and UDT temporal pairs match across forms.
- getRelDataTypes(family) returns standard temporal types for DATETIME/TIMESTAMP/DATE/TIME families (BINARY family stays UDT).
- UDF return-type inference (AddSubDate/Weekday/LastDay/TimestampDiff/Format/TimestampAdd/Extract/Span/WidthBucket) recognise both standard and UDT temporal operand types.

Shuttle implementation:
- Atomic rebuild path for Project/Filter/Calc/Aggregate/Values nodes (Calcite's default copy-then-validate path doesn't work because each half of the rewrite would briefly violate row-type consistency).
- Filter rebuild preserves variablesSet so correlated subqueries keep their CorrelationId binding.
- TemporalSchemaRewritable marker interface lets OpenSearch table scans rewrite their schema in place rather than wrap in a LogicalProject(CAST), so Calcite Linq4j codegen reads String values matching what OpenSearchExprValueFactory delivers at runtime.

Other:
- DatetimeUdtNormalizeRule (api/datetime extension) recognises both UDT and standard temporal RexCalls and forces precision to MAX so unified-API consumers see TIMESTAMP(9).
- Calcite IT golden files updated to reflect logical plans now showing TIMESTAMP(9) instead of EXPR_TIMESTAMP VARCHAR (physical plans still show UDT post-shuttle).

Tests:
- New unit: TemporalUdtRewriteShuttleTest, CalcitePPLLogicalPlanStandardTemporalTest.
- Updated CoercionUtilsTest, OpenSearchTypeFactoryTest.

All module unit tests pass. Full Calcite IT suite (including ExplainIT) green.

Signed-off-by: Peng Huo <penghuo@gmail.com>
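The commit message says a temporalKind helper lets standard and UDT temporal pairs match across forms in typesMatch and isComparable. A minimal sketch of that idea in plain Java follows. It does not use Calcite, and ToyType, TemporalKind, and the matching rule are hypothetical stand-ins chosen for illustration, not the PR's PPLTypeChecker code.

```java
import java.util.Optional;

/**
 * Toy model only: the types below are hypothetical stand-ins, not Calcite's
 * RelDataType hierarchy and not the actual PPLTypeChecker implementation.
 */
class TemporalKindSketch {

  /** Canonical temporal category shared by both representations. */
  enum TemporalKind { DATE, TIME, TIMESTAMP }

  /** Hypothetical stand-in for a planner type. */
  enum ToyType {
    STD_DATE, STD_TIME, STD_TIMESTAMP, // standard SQL temporal types
    UDT_DATE, UDT_TIME, UDT_TIMESTAMP, // VARCHAR-backed UDT forms
    INTEGER, VARCHAR                   // non-temporal types
  }

  /** Maps a type to its temporal kind, or empty if it is not temporal. */
  static Optional<TemporalKind> temporalKind(ToyType type) {
    switch (type) {
      case STD_DATE:
      case UDT_DATE:
        return Optional.of(TemporalKind.DATE);
      case STD_TIME:
      case UDT_TIME:
        return Optional.of(TemporalKind.TIME);
      case STD_TIMESTAMP:
      case UDT_TIMESTAMP:
        return Optional.of(TemporalKind.TIMESTAMP);
      default:
        return Optional.empty();
    }
  }

  /** Types match if identical, or if both are temporal with the same kind. */
  static boolean typesMatch(ToyType expected, ToyType actual) {
    if (expected == actual) {
      return true;
    }
    Optional<TemporalKind> expectedKind = temporalKind(expected);
    Optional<TemporalKind> actualKind = temporalKind(actual);
    return expectedKind.isPresent() && expectedKind.equals(actualKind);
  }

  public static void main(String[] args) {
    System.out.println(typesMatch(ToyType.STD_TIMESTAMP, ToyType.UDT_TIMESTAMP)); // true
    System.out.println(typesMatch(ToyType.STD_DATE, ToyType.UDT_TIME));           // false
    System.out.println(typesMatch(ToyType.INTEGER, ToyType.VARCHAR));             // false
  }
}
```

Funnelling both forms through one canonical kind keeps the comparison in a single place, so callers do not need separate branches for the standard and UDT representations.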
d17aa3b to 3244bed
Description
This PR makes two related but distinct architectural changes to the Calcite-side PPL planner. Together they let logical RelNode/RexNode trees produced during analysis use ordinary Calcite types end-to-end. UDTs (and ExprType) only re-enter the picture immediately before physical execution, where they're needed to keep Linq4j codegen seeing the existing VARCHAR-backed runtime representation.

1. Decouple ExprType from the logical plan

Calcite-side PPL planning code (RelNode/RexNode construction, coercion, type checking, UDF return-type inference, UDF implementors) now operates on RelDataType directly instead of converting to/from ExprType at every step.

- CoercionUtils: new RelDataType-typed common-type resolver with an internal widening DAG keyed by a small CoercionTag enum (matches v2 widening semantics: STRING → TIMESTAMP direct edge, UNDEFINED → any-concrete unit cost, etc.).
- PPLTypeChecker: signatures expressed as List<List<RelDataType>>; new renderTypeName for error messages; new temporalKind helper that canonicalizes DATE/TIME/TIMESTAMP across UDT and standard SQL temporal forms in a single place.
- PPLOperandTypes, UDFOperandMetadata: signature constants exposed as RelDataType (e.g. INTEGER_T, STRING_T, IP_UDT).
- AddSubDate, Extract, Format, LastDay, PeriodName, TimestampAdd, TimestampDiff, Weekday, Span, WidthBucket, IPFunction, CidrMatchFunction, CompareIpFunction, geo-IP, CurrentFunction: branch on RelDataType (UDT class instances or SqlTypeName) and pass RelDataType through their lowering, instead of bouncing through ExprType.
- CalciteRexNodeVisitor.visitCast: maps AST DataType directly to RelDataType, removing the DataType.getCoreType() round-trip.
- ExtendedRexBuilder: type discrimination via instanceof UDT classes and SqlTypeName, no ExprType conversion.

UDT identity is preserved at planning time via instanceof checks (not getExprType()); subclasses survive createTypeWithNullability/createTypeWithCharsetAndCollation via a cloneWith hook on ExprSqlType/ExprJavaType.

2. Remove DATE/TIME/TIMESTAMP UDTs from the logical plan
With the ExprType decoupling in place, the only thing pinning UDT temporal types to the logical layer was convertExprTypeToRelDataType. This change pushes those UDTs out of the logical plan entirely:

- OpenSearchTypeFactory.convertExprTypeToRelDataType returns standard DATE/TIME(9)/TIMESTAMP(9) for the corresponding ExprCoreType. New isStandardTemporalType helper. IP and BINARY remain UDT.
- CalciteRexNodeVisitor.visitCast: emits standard temporal types for AST DATE/TIME/TIMESTAMP cast targets.
- ExtendedRexBuilder.makeCast: standard temporal targets dispatch to PPLBuiltinOperators.DATE/TIME/TIMESTAMP UDFs but keep the call's RelDataType standard. IP routed separately.
- PredicateAnalyzer: isTimestamp/isDate accept both standard SqlTypeName and UDT, and read literal values via getValueAs(String.class) so TimestampString/DateString round-trip without ClassCastException.
- Renamed constants: NULLABLE_*_UDT → NULLABLE_*_T, DATE_UDT/TIME_UDT/TIMESTAMP_UDT → DATE_T/TIME_T/TIMESTAMP_T.
- DatetimeExtension/DatetimeUdtNormalizeRule removed (introduced in Normalize datetime types for unified query API #5408). Their sole purpose was to rewrite UDT temporal return types to standard Calcite types as a unified-API post-analysis pass; the analysis pipeline no longer produces temporal UDTs in the first place, so the rule is a no-op and the extension is removed along with its registrations in UnifiedPplSpec/UnifiedSqlSpec.

TemporalUdtRewriteShuttle (the conversion boundary)

A new RelShuttle runs once at the prepare-statement boundary (OpenSearchCalcitePreparingStmt.implement and OpenSearchRelRunners.run) and rewrites every standard temporal type in the tree back to its UDT counterpart. Implementation notes:

- RelShuttleImpl.visitChild rebuilds parents via parent.copy(traits, newInputs) after visiting each child, which makes both rewrite-then-visit and visit-then-rewrite orderings briefly violate row-type consistency. The shuttle intercepts Project/Filter/Calc/Aggregate/Values and rebuilds them in one shot with rewritten inputs and rewritten RexNodes together.
- Filter rebuild preserves variablesSet. Without this, correlated subqueries lose their CorrelationId binding.
- RexShuttle extensions: visitSubQuery recurses into the inner plan (Calcite's default doesn't); visitLambda/visitLambdaRef rebuild the lambda body and parameter refs together so transform(arr, x -> ...) over temporal-element arrays stays consistent.
- rewriteType recurses into ARRAY/MULTISET/MAP/struct, so collection-typed temporal columns (e.g. TAKE(@timestamp, 2)) get rewritten.
- TemporalSchemaRewritable marker interface. AbstractCalciteIndexScan implements it and rewrites its schema in place via buildScan(...), rather than wrapping in a LogicalProject(CAST(...)). This is what keeps Linq4j codegen reading String values consistent with what OpenSearchExprValueFactory delivers at runtime.

Other adjustments
- explain_appendpipe_command.json (× 2 profiles): logical plans now show TIMESTAMP(9) instead of EXPR_TIMESTAMP VARCHAR; physical plans still show UDT post-shuttle.

Tests

- TemporalUdtRewriteShuttleTest, CalcitePPLLogicalPlanStandardTemporalTest (proves no temporal UDT appears anywhere in the logical RelNode/RexNode tree for representative queries: table scan, eval-cast, filter, group-by, UDF call), expanded CoercionUtilsTest, expanded OpenSearchTypeFactoryTest.
- Calcite IT suite (*Calcite*IT) passes: 4968 tests, 0 failures, 0 errors.

Related Issues
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
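The implementation notes under TemporalUdtRewriteShuttle say that rewriteType recurses into ARRAY/MULTISET/MAP/struct so that collection-typed temporal columns are rewritten too. A rough sketch of that recursion in plain Java follows, with no Calcite dependency. Every type here (ToyType, Scalar, ArrayType, MapType, StructType) is a hypothetical stand-in, MULTISET is omitted for brevity, and the EXPR_DATE/EXPR_TIME names are assumed by analogy with the EXPR_TIMESTAMP name that appears in the golden files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: hypothetical stand-ins, not Calcite's RelDataType API. */
class RewriteTypeSketch {

  interface ToyType {}

  enum Scalar implements ToyType {
    DATE, TIME, TIMESTAMP,                // standard temporal types
    EXPR_DATE, EXPR_TIME, EXPR_TIMESTAMP, // assumed names for the UDT forms
    INTEGER, VARCHAR
  }

  static final class ArrayType implements ToyType {
    final ToyType element;
    ArrayType(ToyType element) { this.element = element; }
    @Override public String toString() { return "ARRAY<" + element + ">"; }
  }

  static final class MapType implements ToyType {
    final ToyType key;
    final ToyType value;
    MapType(ToyType key, ToyType value) { this.key = key; this.value = value; }
    @Override public String toString() { return "MAP<" + key + ", " + value + ">"; }
  }

  static final class StructType implements ToyType {
    final Map<String, ToyType> fields; // insertion-ordered field name -> type
    StructType(Map<String, ToyType> fields) { this.fields = fields; }
    @Override public String toString() { return "STRUCT" + fields; }
  }

  /** Replaces standard temporal types with their UDT form, at any nesting depth. */
  static ToyType rewriteType(ToyType type) {
    if (type instanceof Scalar) {
      switch ((Scalar) type) {
        case DATE: return Scalar.EXPR_DATE;
        case TIME: return Scalar.EXPR_TIME;
        case TIMESTAMP: return Scalar.EXPR_TIMESTAMP;
        default: return type; // non-temporal scalars and UDTs pass through
      }
    }
    if (type instanceof ArrayType) {
      return new ArrayType(rewriteType(((ArrayType) type).element));
    }
    if (type instanceof MapType) {
      MapType map = (MapType) type;
      return new MapType(rewriteType(map.key), rewriteType(map.value));
    }
    if (type instanceof StructType) {
      Map<String, ToyType> rewritten = new LinkedHashMap<>();
      for (Map.Entry<String, ToyType> field : ((StructType) type).fields.entrySet()) {
        rewritten.put(field.getKey(), rewriteType(field.getValue()));
      }
      return new StructType(rewritten);
    }
    return type;
  }

  public static void main(String[] args) {
    // An array of timestamps, as a collection-typed temporal column might be typed.
    System.out.println(rewriteType(new ArrayType(Scalar.TIMESTAMP))); // ARRAY<EXPR_TIMESTAMP>
  }
}
```

Without the recursive cases, a column typed as an array of timestamps would keep its standard element type past the conversion boundary while scalar columns were rewritten, which is the inconsistency the note describes avoiding.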