Parquet: Implement Variant writers #12323
base: main
@@ -85,6 +85,16 @@ public static ShreddedObject object(VariantMetadata metadata) {
     return new ShreddedObject(metadata);
   }

+  public static ShreddedObject object(VariantObject object) {
Used to create a shredded object from an existing object when writing. It uses the object's metadata. This avoids exposing VariantObject.metadata because metadata is carried by Variant instead of values.
@@ -62,7 +71,7 @@ protected ParquetValueWriter<?> timestampWriter(ColumnDescriptor desc, boolean i
     }
   }

-  private class WriteBuilder extends ParquetTypeVisitor<ParquetValueWriter<?>> {
+  private class WriteBuilder extends TypeWithSchemaVisitor<ParquetValueWriter<?>> {
To detect a variant type, this needs to use the original Iceberg schema. When Parquet exposes the VARIANT logical type annotation, we can update this to no longer require the schema.
@@ -192,6 +194,11 @@ public WriteBuilder schema(Schema newSchema) {
       return this;
     }

+    public WriteBuilder variantShreddingFunc(BiFunction<Integer, String, Type> func) {
This function is passed to schema conversion. For each variant, conversion calls it with the field ID and name to determine the shredded type (typed_value). I'm using a callback function to avoid exposing a way to set the Parquet schema directly here.
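The callback shape described above can be sketched in isolation. This is only an illustration under stated assumptions: a plain String stands in for the Parquet Type that the real callback returns, `convert` is a hypothetical stand-in for schema conversion, and treating a null result as "do not shred" is an assumed convention, not something this PR states.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class ShreddingCallbackSketch {
  // Hypothetical stand-in for schema conversion: for each variant field it asks
  // the callback (field ID, field name) which typed_value type to use.
  // A String replaces the Parquet Type returned by the real callback.
  static Map<String, String> convert(
      Map<Integer, String> variantFields,
      BiFunction<Integer, String, String> shreddingFunc) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<Integer, String> field : variantFields.entrySet()) {
      String typedValue = shreddingFunc.apply(field.getKey(), field.getValue());
      // Assumption: a null result means the variant is left unshredded.
      result.put(field.getValue(), typedValue == null ? "unshredded" : typedValue);
    }
    return result;
  }

  // Example callback: shred only the variant field named "event".
  static String shredEventOnly(Integer fieldId, String name) {
    return "event".equals(name) ? "typed_value: int64" : null;
  }
}
```

The caller only decides the shredded type per variant field; it never hands over a full Parquet schema, which is the point of using a callback here.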
     private String name = "table";
     private WriteSupport<?> writeSupport = null;
-    private Function<MessageType, ParquetValueWriter<?>> createWriterFunc = null;
+    private BiFunction<Schema, MessageType, ParquetValueWriter<?>> createWriterFunc = null;
This change supports passing the schema to the write builder function.
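For illustration, a one-argument factory can be adapted to the two-argument shape by ignoring the schema. This is a generic sketch: `ignoringSchema` is a hypothetical helper, not part of this PR, and the type parameters stand in for Schema, MessageType, and ParquetValueWriter<?>.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

final class WriterFuncAdapterSketch {
  // Hypothetical helper: wraps a Function<M, W> so it can be used where a
  // BiFunction<S, M, W> is expected. The first argument (the schema) is ignored.
  static <S, M, W> BiFunction<S, M, W> ignoringSchema(Function<M, W> legacy) {
    return (schema, messageType) -> legacy.apply(messageType);
  }
}
```

Writer factories that need to detect variant types would use the schema argument instead of discarding it.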
@@ -71,6 +71,10 @@ public static UnboxedWriter<Short> shorts(ColumnDescriptor desc) {
     return new ShortWriter(desc);
   }

+  public static <T> ParquetValueWriter<T> unboxed(ColumnDescriptor desc) {
Needed to expose this to get float and double working. Those writers currently require a metrics builder that expects a non-null field ID.
  private final VariantMetadata metadata;
  private final VariantValue value;

  VariantData(VariantMetadata metadata, VariantValue value) {
This was originally in #12304, but I moved it here after reverting changes to variant classes in that PR. This PR uses it to create the objects that are passed to writers and created by readers.
Rebased after moving variants to API in #12374.
This PR implements Variant writers for Parquet based on a Parquet schema passed into the writer builder. It works basically the same as #12139.