Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: add variant type support #11831

Merged
merged 11 commits into from
Feb 18, 2025
Merged

Core: add variant type support #11831

merged 11 commits into from
Feb 18, 2025

Conversation

aihuaxu
Copy link
Contributor

@aihuaxu aihuaxu commented Dec 19, 2024

This is to add some required changes in API and core module for Variant support, including:

  • Add isVariantType() method for variant type
  • Add variant support in schema visitors such as AssignIdFreshIds, ReassignIds, PruneColumns, etc. and TypeUtil.
  • Add test coverage to test out schema visitors, TypeUtil.

Part of: #10392

@aihuaxu aihuaxu marked this pull request as ready for review December 19, 2024 23:00
@aihuaxu
Copy link
Contributor Author

aihuaxu commented Dec 19, 2024

cc @rdblue, @RussellSpitzer, @flyrain and @JonasJ-ap. This is to add the changes in core to support variant type.

@aihuaxu aihuaxu requested a review from rdblue December 21, 2024 04:45
@aihuaxu aihuaxu force-pushed the variant-type-core branch 2 times, most recently from b276d3f to fe6038a Compare December 21, 2024 16:42
@shohamyamin
Copy link

@aihuaxu very important feature that will allow a lot more options for iceberg, thank you for your contribution

@aihuaxu aihuaxu force-pushed the variant-type-core branch 3 times, most recently from e0a430f to 1b56bf5 Compare January 30, 2025 05:12
@github-actions github-actions bot added the spark label Feb 17, 2025
@aihuaxu aihuaxu force-pushed the variant-type-core branch 2 times, most recently from 27df15b to 4ed1a35 Compare February 17, 2025 17:15
@@ -33,6 +33,6 @@ class PrimitiveHolder implements Serializable {
}

Object readResolve() throws ObjectStreamException {
return Types.fromPrimitiveString(typeAsString);
return Types.fromTypeName(typeAsString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tested by TestSerializableTypes. Can you add new cases for variant (and the other new types, while you're updating it)?

Copy link
Contributor Author

@aihuaxu aihuaxu Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type tests have been added in TestSerializableTypes. But let me refactor a little bit for testUnknown() which should be tested with isSameAs(), same as other primitive types. testVariant() is testing with isEqualTo().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test for this is missing variant.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few minor things to fix, but overall I think this looks good.

@@ -46,6 +46,7 @@ public void testIdentityTypes() throws Exception {
Types.StringType.get(),
Types.UUIDType.get(),
Types.BinaryType.get(),
Types.UnknownType.get()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aihuaxu, why not add variant as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant test is already added below. Since for variant, we need to test with isEqualTo() instead of isSameAs().

public void testVariant() throws Exception {

@rdblue rdblue merged commit 0b47faa into apache:main Feb 18, 2025
47 checks passed
@Fokko
Copy link
Contributor

Fokko commented Feb 18, 2025

@aihuaxu It looks like we have a sad CI on main: https://github.com/apache/iceberg/actions/runs/13395385725/job/37413381567

rdblue added a commit to rdblue/iceberg that referenced this pull request Feb 18, 2025
@Fokko
Copy link
Contributor

Fokko commented Feb 18, 2025

@aihuaxu @rdblue An unfortunate collision, created a fix here: #12316

rdblue added a commit to rdblue/iceberg that referenced this pull request Feb 18, 2025
ankurbansal-tradedoubler added a commit to ankurbansal-tradedoubler/iceberg that referenced this pull request Feb 19, 2025
* Site: Learn More to point to Spark QuickStart Doc (apache#12272)

* Build: Bump datamodel-code-generator from 0.27.2 to 0.28.1 (apache#12290)

* Spark 3.5: Fix job description of RewriteTablePathSparkAction (apache#12282)

* Build: Bump io.netty:netty-buffer from 4.1.117.Final to 4.1.118.Final (apache#12287)

Bumps [io.netty:netty-buffer](https://github.com/netty/netty) from 4.1.117.Final to 4.1.118.Final.
- [Commits](netty/netty@netty-4.1.117.Final...netty-4.1.118.Final)

---
updated-dependencies:
- dependency-name: io.netty:netty-buffer
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Build: Bump software.amazon.awssdk:bom from 2.30.16 to 2.30.21 (apache#12286)

Bumps software.amazon.awssdk:bom from 2.30.16 to 2.30.21.

---
updated-dependencies:
- dependency-name: software.amazon.awssdk:bom
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* OpenAPI: Add overwrite option when registering a table (apache#12239)

* OpenAPI: Add optional overwrite when registering table

* simplify to overwrite

* Add the article to the description

Co-authored-by: Eduard Tudenhoefner <[email protected]>

* Update generated python as well

Signed-off-by: Hongyue Zhang <[email protected]>

* Fix import order

---------

Signed-off-by: Hongyue Zhang <[email protected]>
Co-authored-by: Eduard Tudenhoefner <[email protected]>

* Build: Bump mkdocs-material from 9.6.3 to 9.6.4 (apache#12284)

Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.6.3 to 9.6.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.6.3...9.6.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Core: Fix Enabling row-lineage during Create Table (apache#12307)

* API: Reject unknown type for required fields and validate defaults (apache#12302)

* API: Fix TestInclusiveMetricsEvaluator notStartsWith tests. (apache#12303)

* Core: Add variant type support to utils and visitors (apache#11831)

* Core: Fix CI: Update tests with UnknownType from required to optional (apache#12316)

* Docs: Refactor site navigation bar (apache#12289)

* Parquet: Implement Variant readers (apache#12139)

* Docs: Add rewrite_table_path Spark Procedure (apache#12115)

* Parquet: Fix errorprone warning (apache#12324)

* Docs: Add Apache Amoro docs (apache#11966)

* Parquet: Fix performance regression in reader init (apache#12305)

* Core: Fallback to GET requests for namespace/table/view exists checks (apache#12314)

Co-authored-by: Daniel Weeks <[email protected]>

* Docs: Fix refs in Apache Amoro docs (apache#12332)

* Revert "Core: Serialize `null` when there is no current snapshot (apache#11560)" (apache#12312)

This reverts commit bf8d25f.

* Parquet: Fix performance regression in reader init (apache#12305) (apache#12329)

Co-authored-by: Bryan Keller <[email protected]>

* Checkstyle: Apply the same generic type naming rules to interfaces and classes (apache#12333)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Hongyue Zhang <[email protected]>
Co-authored-by: Danica Fine <[email protected]>
Co-authored-by: Manu Zhang <[email protected]>
Co-authored-by: Yuya Ebihara <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hongyue/Steve Zhang <[email protected]>
Co-authored-by: Eduard Tudenhoefner <[email protected]>
Co-authored-by: Tom Tanaka <[email protected]>
Co-authored-by: Ryan Blue <[email protected]>
Co-authored-by: Aihua Xu <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: ConradJam <[email protected]>
Co-authored-by: Bryan Keller <[email protected]>
Co-authored-by: Daniel Weeks <[email protected]>
Co-authored-by: pvary <[email protected]>
@aihuaxu
Copy link
Contributor Author

aihuaxu commented Feb 20, 2025

@aihuaxu It looks like we have a sad CI on main: https://github.com/apache/iceberg/actions/runs/13395385725/job/37413381567

Oh. Seems it conflicts with #12302. @Fokko Thanks for fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants