Support for Identity Columns in Apache Iceberg #12297
Labels
improvement
PR that improves existing functionality
Comments
I'm not quite sure how the "proposed implementation" would actually work. We
probably need some actual details here. I would recommend starting a new
design doc. The proposal should also integrate with the already existing
concept of https://iceberg.apache.org/spec/#identifier-field-ids.
The concept of auto-incrementing or generating values for rows probably
also needs considerably more discussion, and probably its own design doc.
On Mon, Feb 17, 2025 at 1:48 AM Nguyễn Quốc Vương ***@***.***> wrote:
Feature Request / Improvement
*Summary*:
Apache Iceberg should support identity columns similar to Delta Lake. This
feature would allow users to define identity columns in Iceberg tables,
where unique values are automatically generated when not explicitly
provided during writes.
*Motivation*:
Currently, Apache Iceberg does not provide built-in support for identity
columns. In contrast, Delta Lake allows defining identity columns that
generate unique values when users do not explicitly provide them. This
feature simplifies the handling of primary keys and auto-incrementing IDs
in use cases such as:
- Maintaining unique row identifiers in tables without requiring
  external sequence management.
- Enabling better support for incremental ingestion scenarios where
  records require unique IDs.
- Reducing complexity for users transitioning from traditional databases
  that support auto-incrementing primary keys.
*Proposed Implementation*:
- Introduce a new table property (e.g., identity.column=true) to enable
  identity columns on specific fields.
- Define syntax for identity column declaration during table creation
  (e.g., CREATE TABLE ... (id BIGINT IDENTITY, name STRING)).
- Implement automatic value generation for identity columns when an
  explicit value is not provided.
- Ensure compatibility with Iceberg’s partitioning, snapshot isolation,
  and metadata management.
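
To make the value-generation step above more concrete, here is a minimal sketch of how missing identity values might be filled in from a high-water mark. Everything in it is an assumption for illustration: `IdentityAllocator`, `high_water_mark`, and `fill` are hypothetical names, not part of any Iceberg API, and the sketch ignores the hard part (coordinating the mark across concurrent writers at commit time), which would still need a design doc.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IdentityAllocator:
    """Hypothetical allocator for illustration; not an Iceberg API."""

    # Last value handed out. In a real design this would have to be
    # persisted somewhere durable (for example in table metadata).
    high_water_mark: int = 0

    def fill(self, rows: List[Dict], column: str = "id") -> List[Dict]:
        # First pass: raise the mark above any explicitly provided value,
        # so generated values cannot collide with them.
        explicit = [r[column] for r in rows if r.get(column) is not None]
        if explicit:
            self.high_water_mark = max(self.high_water_mark, max(explicit))

        # Second pass: generate a value only where none was provided.
        filled = []
        for row in rows:
            value = row.get(column)
            if value is None:
                self.high_water_mark += 1
                value = self.high_water_mark
            filled.append({**row, column: value})
        return filled


allocator = IdentityAllocator()
batch = allocator.fill([{"name": "a"}, {"id": 10, "name": "b"}, {"name": "c"}])
print([row["id"] for row in batch])
```

Generated values in this sketch are unique and increasing but not necessarily contiguous, since an explicit value can push the mark forward.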
*Alternatives Considered*:
- Using externally managed sequences or UUIDs, but these approaches
  introduce additional complexity and overhead.
- Leveraging application-side logic to generate unique values, which is
  not as efficient as native support.
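
For comparison, the application-side UUID alternative could look roughly like the sketch below. The helper name `with_uuid_ids` and the row shape (a list of dicts with an `id` key) are assumptions made for illustration only; the point is that every writer has to carry this logic itself, and the resulting IDs are neither ordered nor compact.

```python
import uuid
from typing import Dict, List


def with_uuid_ids(rows: List[Dict], column: str = "id") -> List[Dict]:
    """Hypothetical helper: assign a random UUID where no ID was provided."""
    filled = []
    for row in rows:
        value = row.get(column)
        if value is None:
            # uuid4 is random, so no coordination between writers is needed,
            # but the values carry no ordering.
            value = str(uuid.uuid4())
        filled.append({**row, column: value})
    return filled


rows = with_uuid_ids([{"name": "a"}, {"id": "fixed-id", "name": "b"}])
print(rows[1]["id"])
```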
Query engine
None
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time
Additional Context:
Delta Lake’s identity column feature is described here. A similar implementation in Iceberg would improve usability and adoption.