feat: support Iceberg v3 unknown type #662

Draft
manuzhang wants to merge 1 commit into apache:main from manuzhang:codex/support-unknown-v3-type

@manuzhang
Member

@manuzhang manuzhang commented May 20, 2026

Closes #665


Summary

  • Add Iceberg v3 unknown primitive type and JSON serialization/deserialization support.
  • Support unknown as null-only data across Arrow, Avro, Parquet, schema projection, and nested fields.
  • Enforce required-field invariants for unknown/null-only projections and Arrow null imports.
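
For illustration only, the required-field invariant in the last bullet can be sketched as below. `TypeId`, `Field`, and `CheckProjection` are hypothetical, simplified stand-ins rather than the types or functions in this PR; the sketch only assumes that an unknown column holds nothing but nulls.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins; not the iceberg-cpp API.
enum class TypeId { kInt, kString, kUnknown };

struct Field {
  std::string name;
  TypeId type;
  bool required;
};

// Returns an error message if `expected` cannot be read from `source`,
// or std::nullopt if this particular invariant holds.
// Other projection checks (type compatibility, field IDs) are omitted.
std::optional<std::string> CheckProjection(const Field& expected,
                                           const Field& source) {
  // An unknown source column is null-only, so it can never satisfy
  // a field that the reader declares as required.
  if (source.type == TypeId::kUnknown && expected.required) {
    return "Cannot project required field '" + expected.name +
           "' from an unknown (null-only) field";
  }
  return std::nullopt;
}
```

With these stand-ins, projecting an optional field from an unknown source passes, while projecting a required field from the same source yields an error.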

Validation

  • ctest --test-dir build --output-on-failure

Co-authored-by: @codex

@manuzhang manuzhang marked this pull request as ready for review May 20, 2026 11:10
Comment thread src/iceberg/schema_util.cc
Comment thread src/iceberg/test/update_schema_test.cc Outdated
Comment thread src/iceberg/parquet/parquet_writer.cc Outdated
@wgtmac
Member

wgtmac commented May 22, 2026

Please rebase on the latest main as well.

@manuzhang manuzhang force-pushed the codex/support-unknown-v3-type branch 2 times, most recently from eb195ac to d2a34b5 May 22, 2026 09:50
@wgtmac
Member

wgtmac commented May 23, 2026

Could you please rebase to resolve conflicts? PR for v3 nano precision timestamp types has been merged.

Comment thread src/iceberg/test/schema_json_test.cc Outdated
constexpr std::string_view json =
R"({"fields":[{"id":1,"name":"mysteries","required":false,"type":{"key":"unknown","key-id":2,"type":"map","value":"string","value-id":3,"value-required":false}}],"schema-id":1,"type":"struct"})";

auto schema_result = SchemaFromJson(nlohmann::json::parse(json));
Collaborator

@zhjwpku zhjwpku May 24, 2026

From the spec: Map keys are required and map values can be either optional or required. So I'm wondering whether we should report something like "Map 'key' cannot be unknown type" rather than the current error message.
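
A rough sketch of what such a dedicated check could look like. `ValidateMapKeyType` is a hypothetical helper that works on the JSON type name; it is not a function from this PR or from iceberg-cpp, and the message text is only the wording suggested above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns an error message when the map key type
// is not allowed, or std::nullopt when this check passes.
std::optional<std::string> ValidateMapKeyType(std::string_view key_type) {
  // Map keys are always required, while unknown is null-only,
  // so an unknown key type can never hold a valid key.
  if (key_type == "unknown") {
    return std::string("Map 'key' cannot be unknown type");
  }
  return std::nullopt;
}
```

Running the check before the generic required/optional validation would let the JSON in the test above fail with a message that names the map key directly.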

Comment thread src/iceberg/json_serde.cc Outdated
@wgtmac wgtmac added the "awaiting author action" label May 25, 2026
@wgtmac
Member

wgtmac commented May 28, 2026

@manuzhang Do you have time to revive this?

@manuzhang
Member Author

manuzhang commented May 28, 2026

@wgtmac Yes, I'm on a trip this week and will update a bit later.

@manuzhang manuzhang force-pushed the codex/support-unknown-v3-type branch 8 times, most recently from 87bfae0 to 34651d7 May 28, 2026 11:59
@manuzhang manuzhang marked this pull request as draft May 28, 2026 12:03
Add an Iceberg unknown primitive type and JSON, Arrow, Avro, Parquet, projection, and data path support for null-only unknown fields. Enforce optionality invariants so required projections cannot be materialized from unknown/null-only fields.

Co-authored-by: Codex <codex@openai.com>

test: cover forbidden nested type promotions

Assert that promotion helpers reject nested type targets for unknown and regular primitive source types.

Co-authored-by: Codex <codex@openai.com>
@manuzhang manuzhang force-pushed the codex/support-unknown-v3-type branch from 34651d7 to 2757796 May 28, 2026 12:14
Labels

awaiting author action Author needs to address comments, resolve conflicts, answer questions, etc.

Development

Successfully merging this pull request may close these issues.

Support v3 unknown data type

3 participants