[CT-1842] [Feature] Support on_schema_change='sync_all_columns' for Delta tables #509

Open
3 tasks done
jeremyyeo opened this issue Jan 17, 2023 · 5 comments · May be fixed by dbt-labs/dbt-spark#1088
Labels
help_wanted Extra attention is needed pkg:dbt-spark Issue affects dbt-spark type:enhancement New feature request

@jeremyyeo

jeremyyeo commented Jan 17, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Delta Lake 2.0 supports dropping columns from Delta tables when the table has certain tblproperties set (https://delta.io/blog/2022-08-29-delta-lake-drop-column/), so we may want to support the on_schema_change = 'sync_all_columns' config when columns are removed from the source (compared to the target).
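
For reference, here is a rough sketch of the statements involved, written as small Python helpers that only build SQL strings. The property names and values are the ones I understand the linked blog post to require; the helper names (`enable_column_mapping_sql`, `drop_columns_sql`) are made up for illustration and are not part of dbt-spark.

```python
from typing import Iterable

# Table properties that, to my understanding of the Delta Lake blog post,
# must be set before DROP COLUMN is allowed (assumption, please verify).
REQUIRED_TBLPROPERTIES = {
    "delta.columnMapping.mode": "name",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
}


def enable_column_mapping_sql(table: str) -> str:
    """Build the ALTER TABLE statement that sets the required tblproperties."""
    props = ", ".join(f"'{k}' = '{v}'" for k, v in REQUIRED_TBLPROPERTIES.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"


def drop_columns_sql(table: str, columns: Iterable[str]) -> str:
    """Build the ALTER TABLE statement that drops the given columns."""
    quoted = [f"`{c}`" for c in columns]
    if not quoted:
        raise ValueError("no columns to drop")
    return f"ALTER TABLE {table} DROP COLUMNS ({', '.join(quoted)})"


if __name__ == "__main__":
    print(enable_column_mapping_sql("my_schema.my_table"))
    print(drop_columns_sql("my_schema.my_table", ["legacy_flag"]))
```

Note that upgrading the reader/writer protocol versions is not something a table can easily undo, which is one reason the adapter may prefer to check the properties rather than set them on the user's behalf.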

Describe alternatives you've considered

A user could probably achieve this today by rewriting some of the relevant built-in macros.

Who will this benefit?

Currently, a schema change (specifically, a column removed from the source but still present in the target) results in an exception:

https://github.com/dbt-labs/dbt-spark/blob/5ca20be56ec2d557b4fff5e42c320949040650d3/dbt/include/spark/macros/adapters.sql#L280

Primarily, this would bring dbt-spark up to par with other adapters' behaviour for on_schema_change, without requiring users to implement their own overrides for the relevant helper macros (e.g. alter_relation_add_remove_columns() and family).
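
To make the "column removed from source vs target" case concrete, here is a minimal Python sketch of the comparison that sync_all_columns implies. The real logic lives in dbt's Jinja macros; `diff_columns` is a hypothetical name, and the case-insensitive comparison is my assumption.

```python
from typing import List, Sequence, Tuple


def diff_columns(
    source_columns: Sequence[str], target_columns: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (columns to add to the target, columns to remove from the target).

    Names are compared case-insensitively (an assumption for this sketch).
    """
    source_names = {c.lower() for c in source_columns}
    target_names = {c.lower() for c in target_columns}
    to_add = [c for c in source_columns if c.lower() not in target_names]
    to_remove = [c for c in target_columns if c.lower() not in source_names]
    return to_add, to_remove


if __name__ == "__main__":
    # The model no longer selects legacy_flag and now selects email.
    add, remove = diff_columns(
        ["id", "name", "email"], ["id", "name", "legacy_flag"]
    )
    print("add:", add, "remove:", remove)
```

The non-empty "remove" list is the part that currently leads to the exception on Delta tables.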

Are you interested in contributing this feature?

Sure

Anything else?

We did not support this behaviour back in dbt-labs/dbt-spark#229 because delta could not drop columns - this is now supported (albeit with the necessary tblproperties applied).

Probably the way to do this is to retrieve the tblproperties of the target and then decide whether to raise or not (or perhaps warn with a suggestion such as "in order to drop columns, please alter the tblproperties, ...").
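
A minimal sketch of that decision, assuming the tblproperties have already been fetched into a dict (for example from SHOW TBLPROPERTIES). The function names, the `strict` flag and the message wording are hypothetical, and the required values are the ones I understand the Delta Lake blog post to list.

```python
import warnings
from typing import List, Mapping


def missing_drop_column_requirements(tblproperties: Mapping[str, str]) -> List[str]:
    """List the requirements for DROP COLUMN that the table does not meet."""
    problems = []
    if tblproperties.get("delta.columnMapping.mode", "none") != "name":
        problems.append("delta.columnMapping.mode = 'name'")
    # Property values usually arrive as strings, so convert before comparing.
    if int(tblproperties.get("delta.minReaderVersion", 0)) < 2:
        problems.append("delta.minReaderVersion >= 2")
    if int(tblproperties.get("delta.minWriterVersion", 0)) < 5:
        problems.append("delta.minWriterVersion >= 5")
    return problems


def check_can_drop_columns(tblproperties: Mapping[str, str], strict: bool = True) -> bool:
    """Return True if columns can be dropped; otherwise raise (strict) or warn."""
    problems = missing_drop_column_requirements(tblproperties)
    if not problems:
        return True
    message = (
        "In order to drop columns, please alter the tblproperties: "
        + ", ".join(problems)
    )
    if strict:
        raise RuntimeError(message)
    warnings.warn(message)
    return False


if __name__ == "__main__":
    ready = {
        "delta.columnMapping.mode": "name",
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
    }
    print(check_can_drop_columns(ready))
    print(check_can_drop_columns({}, strict=False))
```

Whether the default should be to raise or to warn is the open design question here; the `strict` flag just makes both options visible.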

We probably want this in dbt-databricks too. The behaviour in the adapter is the same as it is here.

@jeremyyeo jeremyyeo added type:enhancement New feature request triage:product In Product's queue labels Jan 17, 2023
@github-actions github-actions bot changed the title [Feature] Support on_schema_change='sync_all_columns' for Delta tables [CT-1842] [Feature] Support on_schema_change='sync_all_columns' for Delta tables Jan 17, 2023
@jtcohen6
Contributor

jtcohen6 commented Jan 18, 2023

Delta Lake 2.0 supports dropping of columns on delta tables if the table has certain tblproperties (https://delta.io/blog/2022-08-29-delta-lake-drop-column/)

Good find!

I was just converting these tests (dbt-labs/dbt-spark#593), and remembering the hard way that Delta doesn't (by default) support removing columns.

I'll queue up for discussion with the relevant team. Depending on other commitments, we may want to mark this one as help_wanted.

@jtcohen6 jtcohen6 removed the triage:product In Product's queue label Jan 18, 2023
@nathaniel-may nathaniel-may added the help_wanted Extra attention is needed label Jan 19, 2023
@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale Mark an issue or PR as stale, to be closed label Jul 19, 2023
@data-blade

still relevant for us

@github-actions github-actions bot removed the Stale Mark an issue or PR as stale, to be closed label Jul 20, 2023
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale Mark an issue or PR as stale, to be closed label Jul 14, 2024
@data-blade

still relevant for us

@github-actions github-actions bot removed the Stale Mark an issue or PR as stale, to be closed label Jul 18, 2024
@mikealfare mikealfare added the pkg:dbt-spark Issue affects dbt-spark label Jan 13, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-spark Jan 13, 2025
mikealfare pushed a commit that referenced this issue Jan 14, 2025
* convert test_store_test_failures to functional test

* temp update dev reqs

* remove dev requirements override
5 participants