Skip to content

[deprecation] Block field-level forward encoding in table configs#18668

Open
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:codex/deprecate-fieldconfig-encoding-type
Open

[deprecation] Block field-level forward encoding in table configs#18668
xiangfu0 wants to merge 1 commit into
apache:masterfrom
xiangfu0:codex/deprecate-fieldconfig-encoding-type

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Jun 3, 2026

Summary

Deprecates field-level FieldConfig.encodingType and makes fieldConfig.indexes.forward.encodingType the canonical forward-index encoding source.

This change now blocks table creation/update configs that still use fieldConfigList[].encodingType. Validation fails with a targeted message directing users to fieldConfigList[].indexes.forward.encodingType.

Compatibility retained internally:

  • fieldConfig.encodingType still deserializes for backward-compatible FieldConfig reads.
  • FieldConfig#getEncodingType() still returns DICTIONARY when the field is absent for existing callers.
  • Legacy inputs such as noDictionaryColumns, noDictionaryConfig, and field-level encodingType remain fallback signals in index-conversion paths when no indexes.forward.encodingType is present.
  • When indexes.forward.encodingType is present, it is authoritative over legacy no-dictionary and field-level encoding signals.

User Manual

For new or updated table configs, configure forward-index encoding under the internal forward-index block:

{
  "fieldConfigList": [
    {
      "name": "message",
      "indexes": {
        "forward": {
          "encodingType": "RAW",
          "compressionCodec": "ZSTANDARD"
        }
      }
    }
  ]
}

Do not use the deprecated field-level encoding form:

{
  "fieldConfigList": [
    {
      "name": "message",
      "encodingType": "RAW"
    }
  ]
}

That form now fails table validation with:

FieldConfig.encodingType is deprecated for column: message. Use fieldConfigList[].indexes.forward.encodingType instead.

Sample Table Config

{
  "tableName": "logs_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "logs"
  },
  "tableIndexConfig": {},
  "fieldConfigList": [
    {
      "name": "logLine",
      "indexes": {
        "forward": {
          "encodingType": "RAW",
          "compressionCodec": "ZSTANDARD"
        },
        "text": {}
      }
    },
    {
      "name": "statusCode",
      "indexes": {
        "forward": {
          "encodingType": "DICTIONARY"
        },
        "inverted": {}
      }
    }
  ]
}

Validation

  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest,TableConfigConsumingSegmentTierOverrideTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local -am -Dtest=ForwardIndexTypeTest,DictionaryIndexTypeTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-spi -Dtest=FieldConfigTest test
  • ./mvnw -pl pinot-segment-spi -Dtest=ForwardIndexConfigTest test
  • ./mvnw -pl pinot-spi -Dtest=FieldConfigTest,OpenStructIndexConfigTest test
  • ./mvnw spotless:apply -pl pinot-spi,pinot-segment-local
  • ./mvnw license:format -pl pinot-spi,pinot-segment-local
  • ./mvnw checkstyle:check -pl pinot-spi,pinot-segment-local
  • ./mvnw license:check -pl pinot-spi,pinot-segment-local
  • ./mvnw spotless:apply -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw checkstyle:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw license:format -pl pinot-spi,pinot-segment-spi,pinot-segment-local
  • ./mvnw license:check -pl pinot-spi,pinot-segment-spi,pinot-segment-local

license:check passes with pre-existing unknown-extension warnings for Lucene binary test resources under pinot-segment-local/src/test/resources/data/lucene_80_index.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 76.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.12%. Comparing base (9717ba6) to head (3e3615f).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
.../local/segment/index/forward/ForwardIndexType.java 66.66% 2 Missing and 5 partials ⚠️

❗ There is a different number of reports uploaded between BASE (9717ba6) and HEAD (3e3615f). Click for more details.

HEAD has 3 uploads less than BASE
Flag BASE (9717ba6) HEAD (3e3615f)
unittests1 1 0
unittests 2 1
custom-integration1 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18668       +/-   ##
=============================================
- Coverage     64.47%   37.12%   -27.36%     
+ Complexity     1291     1290        -1     
=============================================
  Files          3372     3372               
  Lines        208584   208598       +14     
  Branches      32574    32573        -1     
=============================================
- Hits         134494    77435    -57059     
- Misses        63294   124116    +60822     
+ Partials      10796     7047     -3749     
Flag Coverage Δ
custom-integration1 ?
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 37.12% <76.66%> (-27.36%) ⬇️
temurin 37.12% <76.66%> (-27.36%) ⬇️
unittests 37.11% <76.66%> (-27.36%) ⬇️
unittests1 ?
unittests2 37.11% <76.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 added configuration Config changes (addition/deletion/change in behavior) deprecation Marks deprecated APIs, configs, or features labels Jun 3, 2026
@xiangfu0 xiangfu0 changed the title [codex] Deprecate field-level forward encoding [deprecation] Deprecate field-level forward encoding Jun 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch from 3b9a7b8 to 69c7124 Compare June 3, 2026 21:39
@xiangfu0 xiangfu0 changed the title [deprecation] Deprecate field-level forward encoding [deprecation] Block field-level forward encoding in table configs Jun 3, 2026
@xiangfu0 xiangfu0 marked this pull request as ready for review June 5, 2026 22:34
Copy link
Copy Markdown
Contributor Author

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — reviewed at current HEAD 69c7124

Thanks — the demotion of field-level encodingType in favor of indexes.forward.encodingType is a clean direction and the serialization handling is solid. One backward-compatibility concern stands out that I think needs a decision before merge; the rest are minor.

🔴 MAJOR — validate() now hard-rejects field-level encodingType; this contradicts the PR description and has no migration path

TableConfigUtils.validateIndexingConfigAndFieldConfigList now does:

Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(),
    "FieldConfig.encodingType is deprecated for column: %s. Use fieldConfigList[].indexes.forward.encodingType instead.", column);

hasFieldLevelEncodingType() returns _encodingType != null, i.e. it fires whenever a config explicitly sets the long-standing, documented field-level form:

{ "name": "message", "encodingType": "RAW" }

I checked the controller write path: TableConfigUtils.validate(...) runs on the as-submitted config at table create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604), and there is no auto-migration before it — createTableConfigFromOldFormat / convertFromLegacyTableConfig have no controller callers (test-only). So:

  • Creating a table with field-level encodingType is now rejected.
  • Editing an existing table whose stored config has field-level encodingType (re-submitting the full config) is rejected until the user hand-migrates every such column into indexes.forward.

This directly contradicts the PR description, which says "Existing configs using top-level encodingType continue to work, but that field is now deprecated" and shows {"name":"message","encodingType":"RAW"} as a still-valid example. As written, the field isn't deprecated — it's rejected. (Running clusters keep working since stored configs aren't re-validated on read; the break is on the create/update/validation path.)

Recommendation — pick one and make the PR consistent with it:

  • Backward-compatible (preferred): auto-migrate field-level encodingTypeindexes.forward.encodingType on the config-write path (reuse the withForwardEncoding logic), then validate. Existing tables keep working transparently and the legacy field is genuinely "deprecated".
  • Intentional hard break: then add the backward-incompat/release-notes label + rolling-upgrade notes, correct the PR description (drop "continues to work"), and reconcile the now-mostly-dead legacy fallback in ForwardIndexType.createDeserializer (fieldConfig.getEncodingType() can no longer return an explicit value for a validated config).

✅ Resolved / looks good

  • noDictionaryColumns + forward=DICTIONARY is now an explicit, tested validation error (Dictionary must be enabled for dictionary-encoded forward index column). The forward and dictionary resolvers disagree on precedence for that contradictory combo, but validation now catches it deterministically — good.
  • Serialization is safe across versions: omitting encodingType no longer emits a spurious "encodingType":"DICTIONARY"; an old node defaults missing→DICTIONARY identically. FieldConfigTest covers both directions including hasFieldLevelEncodingType().
  • The RAW direction is fully coherent — both ForwardIndexType and DictionaryIndexType honor forward=RAW, so a RAW forward index correctly disables the dictionary.
  • ArrayListLinkedHashSet for configsToUpdate is a real robustness fix; withForwardEncoding/getForwardEncodingType handle NullNode/null correctly.

🟡 Minor

  1. Duplication of the forward-block parse/write. DictionaryIndexType.getForwardEncodingType re-implements the indexes.forward.encodingType lookup that ForwardIndexType.createDeserializer already does, plus there are now multiple withForwardEncoding helpers (main + test) and hardcoded "encodingType"/"forward" literals. A single shared accessor + constant would keep the precedence rule single-sourced. (Inline.)
  2. equals()/hashCode() semantics (BaseJsonConfig is toJsonNode-based): a FieldConfig with absent encodingType is no longer equal to one with explicit encodingType:DICTIONARY (previously both normalized to DICTIONARY). Low risk since explicit values round-trip unchanged, but worth confirming no controller "config changed?" diff relies on the old normalization.
  3. OpenStructIndexConfig.shouldUseDictionaryForKey still reads the deprecated field-level getEncodingType(); with field-level encoding now forbidden by validation, it can only ever observe the DICTIONARY default and won't see a key column configured RAW via the forward block. Pre-existing and not touched here, but the gap widens under this PR — worth a follow-up.

(Note: I reviewed the previous HEAD earlier; the branch advanced to 69c7124 mid-review, which already addressed the earlier dictionary/forward precedence concern. The above reflects the current diff.)

String column = fieldConfig.getName();
Preconditions.checkState(seenColumns.add(column), "Duplicate FieldConfig for column: %s", column);
Preconditions.checkState(schema.hasColumn(column), "Failed to find column: %s in schema", column);
Preconditions.checkState(!fieldConfig.hasFieldLevelEncodingType(),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backward-compat: this hard-rejects the legacy field-level encodingType with no migration path.

hasFieldLevelEncodingType() is true whenever encodingType is explicitly set (_encodingType != null), so a long-standing config like {"name":"col","encodingType":"RAW"} now fails validation. TableConfigUtils.validate runs on the as-submitted config at create/update (PinotTableRestletResource#868, TableConfigsRestletResource#594/604) and nothing migrates field-level→forward beforehand (createTableConfigFromOldFormat/convertFromLegacyTableConfig are test-only). So editing any existing table that uses field-level encodingType is blocked until the user hand-migrates each column.

This contradicts the PR description ("Existing configs using top-level encodingType continue to work"). Suggest either auto-migrating field-level→indexes.forward.encodingType on the write path before validation (true deprecation, no user impact), or — if the hard break is intended — labeling it backward-incompat/release-notes, adding rolling-upgrade notes, and updating the description.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional hard break: table create/update should reject field-level fieldConfigList[].encodingType and direct users to fieldConfigList[].indexes.forward.encodingType. I updated the PR description and README/manual text to state that explicitly, including the validation error and sample config. The legacy field still deserializes for read/conversion paths, but submitted table configs are expected to migrate before create/update validation.

return indexes != null && indexes.isObject() && indexes.has(StandardIndexes.DICTIONARY_ID);
}

private static FieldConfig.EncodingType getForwardEncodingType(FieldConfig fieldConfig) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor — single-source the forward-encoding resolution.

This helper re-implements the indexes.forward.encodingType parse that ForwardIndexType.createDeserializer already does, and the JSON keys "forward"/"encodingType" are hardcoded here, in withForwardEncoding below, and in the test helpers. If the precedence/shape ever changes, these can drift. Consider one shared accessor (e.g. on FieldConfig or a forward-index util) plus a constant for the "encodingType" key, used by both resolvers.

Also note the fallback return fieldConfig.getEncodingType(); here is now mostly unreachable for explicit values once validate() forbids field-level encoding — worth reconciling with that validation so the legacy path isn't half-alive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the amended head 3e3615f: ForwardIndexType now owns ENCODING_TYPE_CONFIG_KEY, getForwardEncodingType(FieldConfig), and withForwardEncoding(...); DictionaryIndexType uses those instead of re-parsing/writing indexes.forward.encodingType locally. The legacy fallback remains centralized there for conversion paths that still inspect configs before validation.

@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch from 69c7124 to 8eb6cd7 Compare June 5, 2026 23:49
@xiangfu0 xiangfu0 force-pushed the codex/deprecate-fieldconfig-encoding-type branch from 8eb6cd7 to 3e3615f Compare June 6, 2026 02:29
@xiangfu0 xiangfu0 added backward-incompat Introduces a backward-incompatible API or behavior change release-notes Referenced by PRs that need attention when compiling the next release notes labels Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Introduces a backward-incompatible API or behavior change configuration Config changes (addition/deletion/change in behavior) deprecation Marks deprecated APIs, configs, or features release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants