[deprecation] Block field-level forward encoding in table configs#18668
[deprecation] Block field-level forward encoding in table configs#18668xiangfu0 wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18668 +/- ##
=============================================
- Coverage 64.47% 37.12% -27.36%
+ Complexity 1291 1290 -1
=============================================
Files 3372 3372
Lines 208584 208598 +14
Branches 32574 32573 -1
=============================================
- Hits 134494 77435 -57059
- Misses 63294 124116 +60822
+ Partials 10796 7047 -3749
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
3b9a7b8 to
69c7124
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Review — reviewed at current HEAD 69c7124
Thanks — the demotion of field-level encodingType in favor of indexes.forward.encodingType is a clean direction and the serialization handling is solid. One backward-compatibility concern stands out that I think needs a decision before merge; the rest are minor.
🔴 MAJOR — validate() now hard-rejects field-level encodingType; this contradicts the PR description and has no migration path
TableConfigUtils.validateIndexingConfigAndFieldConfigList now does:
Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(),
"FieldConfig.encodingType is deprecated for column: %s. Use fieldConfigList[].indexes.forward.encodingType instead.", column);hasFieldLevelEncodingType() returns _encodingType != null, i.e. it fires whenever a config explicitly sets the long-standing, documented field-level form:
{ "name": "message", "encodingType": "RAW" }I checked the controller write path: TableConfigUtils.validate(...) runs on the as-submitted config at table create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604), and there is no auto-migration before it — createTableConfigFromOldFormat / convertFromLegacyTableConfig have no controller callers (test-only). So:
- Creating a table with field-level
encodingTypeis now rejected. - Editing an existing table whose stored config has field-level
encodingType(re-submitting the full config) is rejected until the user hand-migrates every such column intoindexes.forward.
This directly contradicts the PR description, which says "Existing configs using top-level encodingType continue to work, but that field is now deprecated" and shows {"name":"message","encodingType":"RAW"} as a still-valid example. As written, the field isn't deprecated — it's rejected. (Running clusters keep working since stored configs aren't re-validated on read; the break is on the create/update/validation path.)
Recommendation — pick one and make the PR consistent with it:
- Backward-compatible (preferred): auto-migrate field-level
encodingType→indexes.forward.encodingTypeon the config-write path (reuse thewithForwardEncodinglogic), then validate. Existing tables keep working transparently and the legacy field is genuinely "deprecated". - Intentional hard break: then add the
backward-incompat/release-noteslabel + rolling-upgrade notes, correct the PR description (drop "continues to work"), and reconcile the now-mostly-dead legacy fallback inForwardIndexType.createDeserializer(fieldConfig.getEncodingType()can no longer return an explicit value for a validated config).
✅ Resolved / looks good
noDictionaryColumns+forward=DICTIONARYis now an explicit, tested validation error (Dictionary must be enabled for dictionary-encoded forward index column). The forward and dictionary resolvers disagree on precedence for that contradictory combo, but validation now catches it deterministically — good.- Serialization is safe across versions: omitting
encodingTypeno longer emits a spurious"encodingType":"DICTIONARY"; an old node defaults missing→DICTIONARY identically.FieldConfigTestcovers both directions includinghasFieldLevelEncodingType(). - The RAW direction is fully coherent — both
ForwardIndexTypeandDictionaryIndexTypehonorforward=RAW, so a RAW forward index correctly disables the dictionary. ArrayList→LinkedHashSetforconfigsToUpdateis a real robustness fix;withForwardEncoding/getForwardEncodingTypehandleNullNode/null correctly.
🟡 Minor
- Duplication of the forward-block parse/write.
DictionaryIndexType.getForwardEncodingTypere-implements theindexes.forward.encodingTypelookup thatForwardIndexType.createDeserializeralready does, plus there are now multiplewithForwardEncodinghelpers (main + test) and hardcoded"encodingType"/"forward"literals. A single shared accessor + constant would keep the precedence rule single-sourced. (Inline.) equals()/hashCode()semantics (BaseJsonConfigistoJsonNode-based): aFieldConfigwith absentencodingTypeis no longer equal to one with explicitencodingType:DICTIONARY(previously both normalized to DICTIONARY). Low risk since explicit values round-trip unchanged, but worth confirming no controller "config changed?" diff relies on the old normalization.OpenStructIndexConfig.shouldUseDictionaryForKeystill reads the deprecated field-levelgetEncodingType(); with field-level encoding now forbidden by validation, it can only ever observe the DICTIONARY default and won't see a key column configured RAW via the forward block. Pre-existing and not touched here, but the gap widens under this PR — worth a follow-up.
(Note: I reviewed the previous HEAD earlier; the branch advanced to 69c7124 mid-review, which already addressed the earlier dictionary/forward precedence concern. The above reflects the current diff.)
| String column = fieldConfig.getName(); | ||
| Preconditions.checkState(seenColumns.add(column), "Duplicate FieldConfig for column: %s", column); | ||
| Preconditions.checkState(schema.hasColumn(column), "Failed to find column: %s in schema", column); | ||
| Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(), |
There was a problem hiding this comment.
Backward-compat: this hard-rejects the legacy field-level encodingType with no migration path.
hasFieldLevelEncodingType() is true whenever encodingType is explicitly set (_encodingType != null), so a long-standing config like {"name":"col","encodingType":"RAW"} now fails validation. TableConfigUtils.validate runs on the as-submitted config at create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604) and nothing migrates field-level→forward beforehand (createTableConfigFromOldFormat/convertFromLegacyTableConfig are test-only). So editing any existing table that uses field-level encodingType is blocked until the user hand-migrates each column.
This contradicts the PR description ("Existing configs using top-level encodingType continue to work"). Suggest either auto-migrating field-level→indexes.forward.encodingType on the write path before validation (true deprecation, no user impact), or — if the hard break is intended — labeling it backward-incompat/release-notes, adding rolling-upgrade notes, and updating the description.
There was a problem hiding this comment.
Intentional hard break: table create/update should reject field-level fieldConfigList[].encodingType and direct users to fieldConfigList[].indexes.forward.encodingType. I updated the PR description and README/manual text to state that explicitly, including the validation error and sample config. The legacy field still deserializes for read/conversion paths, but submitted table configs are expected to migrate before create/update validation.
| return indexes != null && indexes.isObject() && indexes.has(StandardIndexes.DICTIONARY_ID); | ||
| } | ||
|
|
||
| private static FieldConfig.EncodingType getForwardEncodingType(FieldConfig fieldConfig) { |
There was a problem hiding this comment.
Minor — single-source the forward-encoding resolution.
This helper re-implements the indexes.forward.encodingType parse that ForwardIndexType.createDeserializer already does, and the JSON keys "forward"/"encodingType" are hardcoded here, in withForwardEncoding below, and in the test helpers. If the precedence/shape ever changes, these can drift. Consider one shared accessor (e.g. on FieldConfig or a forward-index util) plus a constant for the "encodingType" key, used by both resolvers.
Also note the fallback return fieldConfig.getEncodingType(); here is now mostly unreachable for explicit values once validate() forbids field-level encoding — worth reconciling with that validation so the legacy path isn't half-alive.
There was a problem hiding this comment.
Addressed in the amended head 3e3615f: ForwardIndexType now owns ENCODING_TYPE_CONFIG_KEY, getForwardEncodingType(FieldConfig), and withForwardEncoding(...); DictionaryIndexType uses those instead of re-parsing/writing indexes.forward.encodingType locally. The legacy fallback remains centralized there for conversion paths that still inspect configs before validation.
69c7124 to
8eb6cd7
Compare
8eb6cd7 to
3e3615f
Compare
Summary
Deprecates field-level
FieldConfig.encodingTypeand makesfieldConfig.indexes.forward.encodingTypethe canonical forward-index encoding source.This change now blocks table creation/update configs that still use
fieldConfigList[].encodingType. Validation fails with a targeted message directing users tofieldConfigList[].indexes.forward.encodingType.Compatibility retained internally:
fieldConfig.encodingTypestill deserializes for backward-compatibleFieldConfigreads.FieldConfig#getEncodingType()still returnsDICTIONARYwhen the field is absent for existing callers.noDictionaryColumns,noDictionaryConfig, and field-levelencodingTyperemain fallback signals in index-conversion paths when noindexes.forward.encodingTypeis present.indexes.forward.encodingTypeis present, it is authoritative over legacy no-dictionary and field-level encoding signals.User Manual
For new or updated table configs, configure forward-index encoding under the internal forward-index block:
{ "fieldConfigList": [ { "name": "message", "indexes": { "forward": { "encodingType": "RAW", "compressionCodec": "ZSTANDARD" } } } ] }Do not use the deprecated field-level encoding form:
{ "fieldConfigList": [ { "name": "message", "encodingType": "RAW" } ] }That form now fails table validation with:
Sample Table Config
{ "tableName": "logs_OFFLINE", "tableType": "OFFLINE", "segmentsConfig": { "schemaName": "logs" }, "tableIndexConfig": {}, "fieldConfigList": [ { "name": "logLine", "indexes": { "forward": { "encodingType": "RAW", "compressionCodec": "ZSTANDARD" }, "text": {} } }, { "name": "statusCode", "indexes": { "forward": { "encodingType": "DICTIONARY" }, "inverted": {} } } ] }Validation
./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest,TableConfigConsumingSegmentTierOverrideTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local -am -Dtest=ForwardIndexTypeTest,DictionaryIndexTypeTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-spi -Dtest=FieldConfigTest test./mvnw -pl pinot-segment-spi -Dtest=ForwardIndexConfigTest test./mvnw -pl pinot-spi -Dtest=FieldConfigTest,OpenStructIndexConfigTest test./mvnw spotless:apply -pl pinot-spi,pinot-segment-local./mvnw license:format -pl pinot-spi,pinot-segment-local./mvnw checkstyle:check -pl pinot-spi,pinot-segment-local./mvnw license:check -pl pinot-spi,pinot-segment-local./mvnw spotless:apply -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw checkstyle:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw license:format -pl pinot-spi,pinot-segment-spi,pinot-segment-local./mvnw license:check -pl pinot-spi,pinot-segment-spi,pinot-segment-locallicense:checkpasses with pre-existing unknown-extension warnings for Lucene binary test resources underpinot-segment-local/src/test/resources/data/lucene_80_index.