Support FIRST, AFTER and LAST clause when adding a new column in engine and Iceberg connector #20914

Merged
ebyhr merged 2 commits into trinodb:master from ebi/add-column-after on Jan 17, 2025


@ebyhr (Member) commented Mar 4, 2024

Description

I propose adding FIRST, AFTER and LAST options to the ALTER TABLE ... ADD COLUMN statement so that new columns and fields can be added at specific positions. LAST is the default when no position option is provided.

    | ALTER TABLE (IF EXISTS)? tableName=qualifiedName
        ADD COLUMN (IF NOT EXISTS)? column=columnDefinition
        (FIRST | LAST | AFTER after=identifier)? 

Column case:

CREATE TABLE catalog.schema.table(b INTEGER)
ALTER TABLE catalog.schema.table ADD COLUMN a INTEGER COMMENT 'x' WITH (p = v) FIRST
(b INTEGER)
↓
(a INTEGER, b INTEGER)

CREATE TABLE catalog.schema.table(a INTEGER, c INTEGER)
ALTER TABLE catalog.schema.table ADD COLUMN b INTEGER COMMENT 'x' WITH (p = v) AFTER a
(a INTEGER, c INTEGER)
↓
(a INTEGER, b INTEGER, c INTEGER)
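The examples above cover FIRST and AFTER. For completeness, here is a sketch of the remaining cases under the grammar above, reusing the same placeholder table name: an explicit LAST, a statement with no position clause (which falls back to the LAST default), and a position combined with IF NOT EXISTS, which the grammar also allows. The column names are illustrative only, and no connector behavior is assumed beyond what this description states.

```sql
CREATE TABLE catalog.schema.table(a INTEGER, b INTEGER)

-- Explicit LAST: c is appended at the end
-- (a INTEGER, b INTEGER) becomes (a INTEGER, b INTEGER, c INTEGER)
ALTER TABLE catalog.schema.table ADD COLUMN c INTEGER LAST

-- No position clause: LAST is the default, so d is appended too
-- (a INTEGER, b INTEGER, c INTEGER) becomes (a INTEGER, b INTEGER, c INTEGER, d INTEGER)
ALTER TABLE catalog.schema.table ADD COLUMN d INTEGER

-- IF NOT EXISTS combined with AFTER: e is placed right after a
-- (a, b, c, d) becomes (a, e, b, c, d), all INTEGER
ALTER TABLE catalog.schema.table ADD COLUMN IF NOT EXISTS e INTEGER AFTER a
```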

Datasources supporting these options:

Fixes #20091

Release notes

(x) Release notes are required, with the following suggested text:

# General, Iceberg
* Add support for `FIRST`, `AFTER`, and `LAST` clauses when adding a new column. ({issue}`20091`)

@cla-bot added the cla-signed label Mar 4, 2024
@github-actions added the docs, tests:hive, iceberg, delta-lake, hive, and mongodb labels Mar 4, 2024
@ebyhr self-assigned this Mar 4, 2024
@ebyhr force-pushed the ebi/add-column-after branch 2 times, most recently from 4cd3d46 to a111e6d on March 4, 2024 06:48
@findinpath (Contributor) commented Mar 4, 2024

From a Zoom Meeting with @martint , @electrum

Support adding a field with ADD COLUMN in Iceberg
#16321

It would be nice to support adding nested fields in ROW types.

Positioning the fields (FIRST / AFTER column) also seems attractive, but it is not in the spec.
Reference https://iceberg.apache.org/docs/latest/spark-ddl/
Alternative scenario in Trino for specifying the exact position of the new nested field in the row: ALTER COLUMN ... SET DATA TYPE

FIRST/AFTER: let’s wait for a request from the community before starting any work in this area.
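The ALTER COLUMN ... SET DATA TYPE alternative mentioned in these notes would look roughly like the sketch below: the whole ROW type is restated with the new field at the desired position. The table `catalog.schema.t`, the column `r`, and its fields are hypothetical, and whether a given connector accepts a new field in the middle of a row this way is an assumption, not something stated in this thread.

```sql
-- Hypothetical table with a ROW column r
CREATE TABLE catalog.schema.t(r ROW(x INTEGER, z INTEGER))

-- Restate the row type with the new field y placed between x and z.
-- Assumption: the connector supports adding a nested field at this position.
ALTER TABLE catalog.schema.t ALTER COLUMN r SET DATA TYPE ROW(x INTEGER, y INTEGER, z INTEGER)
```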

@ebyhr force-pushed the ebi/add-column-after branch from a111e6d to 1c8d5d8 on March 14, 2024 01:29
@ebyhr force-pushed the ebi/add-column-after branch from 1c8d5d8 to 4966026 on March 24, 2024 23:21
@ebyhr force-pushed the ebi/add-column-after branch from 4966026 to 6308a34 on April 15, 2024 07:24
@ebyhr marked this pull request as ready for review April 16, 2024 01:40
@ebyhr force-pushed the ebi/add-column-after branch from 6308a34 to b3f2425 on April 17, 2024 22:18
@ebyhr force-pushed the ebi/add-column-after branch from b3f2425 to f2ab14b on April 21, 2024 23:37
@ebyhr added the stale-ignore label (used on PRs that the stale bot should not flag or close) May 13, 2024
@ebyhr force-pushed the ebi/add-column-after branch 4 times, most recently from 952d1a0 to 3685a36 on May 20, 2024 08:01
@ebyhr requested a review from martint May 21, 2024 01:36
@ebyhr force-pushed the ebi/add-column-after branch 4 times, most recently from 2114dec to 8b60661 on June 1, 2024 00:58
@ebyhr force-pushed the ebi/add-column-after branch from 8b60661 to 7bc3e73 on June 23, 2024 23:50
@ebyhr force-pushed the ebi/add-column-after branch 2 times, most recently from f9cb8b0 to 21fe7b3 on October 3, 2024 07:45
@ebyhr force-pushed the ebi/add-column-after branch from 21fe7b3 to 1476046 on November 21, 2024 03:46
@mosabua (Member) commented Nov 21, 2024

If we add FIRST .. why not also add LAST?

@mosabua (Member) commented Nov 21, 2024

Also .. is there any chance we implement this later for JDBC connectors and others across the board?

@martint (Member) left a comment

The single-level scenario syntax looks good.

The nested case is somewhat confusing. At first glance, it's not clear what y refers to in this statement:

ALTER TABLE t ADD COLUMN a.b.c BIGINT AFTER y

It's meant to be a field within a.b, but that's not immediately obvious. y could just as well be a top-level column, and the statement could be shorthand for creating all the required nested fields, with the root a placed after y.

The following structure would be more natural, but it only works with AFTER and it's incompatible with how nested columns are handled in general:

ALTER TABLE t ADD COLUMN c BIGINT AFTER a.b.y

All this is to say that I need to think more about the nested case.

@mosabua (Member) commented Nov 21, 2024

> If we add FIRST .. why not also add LAST?

I guess that already exists.

@ebyhr (Member, Author) commented Nov 21, 2024

> Also .. is there any chance we implement this later for JDBC connectors and others across the board?

Yes, we can support this option in JDBC and other connectors if the datasource supports specifying column positions.

@ebyhr force-pushed the ebi/add-column-after branch from 1476046 to 14dc059 on November 22, 2024 04:11
@ebyhr (Member, Author) commented Nov 22, 2024

Addressed comments.

@ebyhr force-pushed the ebi/add-column-after branch from 14dc059 to 3b94481 on November 22, 2024 04:49
@ebyhr changed the title from "Support FIRST and AFTER clause when adding a new column in engine and Iceberg connector" to "Support FIRST, AFTER and LAST clause when adding a new column in engine and Iceberg connector" Nov 22, 2024
@ebyhr (Member, Author) commented Nov 27, 2024

FYI, putting a fully qualified identifier after ADD COLUMN matches Spark's syntax. I'm not attached to this approach, though.
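For comparison, the Iceberg Spark DDL page referenced earlier in this thread uses this style: a nested field is named by its full path after ADD COLUMN, and FIRST or AFTER positions it relative to its sibling fields. The sketch below is adapted from that page; it is Spark SQL with the Iceberg SQL extensions, not Trino syntax, and `prod.db.sample`, `new_column`, `other_column`, and `nested` are placeholder names.

```sql
-- Top-level column placed right after an existing column
ALTER TABLE prod.db.sample
ADD COLUMN new_column bigint AFTER other_column;

-- Nested field: the qualified name adds new_column inside the struct column `nested`,
-- and FIRST makes it the first field of that struct
ALTER TABLE prod.db.sample
ADD COLUMN nested.new_column bigint FIRST;
```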

@findinpath requested a review from martint November 29, 2024 05:37
@ebyhr (Member, Author) commented Dec 17, 2024

We decided to exclude support for nested fields from this PR. I will file an issue and update this PR today or tomorrow.

@ebyhr (Member, Author) commented Dec 18, 2024

(Rebased on master without any changes)

@ebyhr force-pushed the ebi/add-column-after branch from 2a810fc to de97c9f on December 18, 2024 07:23
@ebyhr force-pushed the ebi/add-column-after branch from de97c9f to 0c5c12c on January 17, 2025 00:28
@ebyhr merged commit d949877 into trinodb:master Jan 17, 2025
97 checks passed
@ebyhr deleted the ebi/add-column-after branch January 17, 2025 01:44
@github-actions added this to the 469 milestone Jan 17, 2025
Labels: cla-signed, delta-lake, docs, hive, iceberg, mongodb, stale-ignore
Development

Successfully merging this pull request may close these issues.

[Iceberg] Support ADD Column at a particular index
5 participants