Add DDL statements to drop branches and tags #23614

Open
wants to merge 4 commits into
base: master

Member

@agrawalreetika agrawalreetika commented Sep 10, 2024

Description

Add DDL statements to drop branches and tags

Motivation and Context

Resolves #22028

Impact

Resolves #22028

SQL support for dropping a branch from a table:

ALTER TABLE users DROP BRANCH 'branch1';

SQL support for dropping a tag from a table:

ALTER TABLE users DROP TAG 'tag1';
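
To make the shape of the two statements concrete, here is a toy recognizer in Java. It is only a sketch: Presto parses SQL through its ANTLR grammar (SqlBase.g4), not a regex, and the class `DropRefStatement`, its `RefType` enum, and its fields are hypothetical names introduced purely for illustration. It covers only the two forms shown above.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of Presto and not how its parser works.
final class DropRefStatement
{
    enum RefType { BRANCH, TAG }

    // ALTER TABLE <table> DROP (BRANCH | TAG) '<name>' with an optional trailing semicolon.
    private static final Pattern PATTERN = Pattern.compile(
            "\\s*ALTER\\s+TABLE\\s+([A-Za-z_][\\w.]*)\\s+DROP\\s+(BRANCH|TAG)\\s+'([^']+)'\\s*;?\\s*",
            Pattern.CASE_INSENSITIVE);

    final String table;
    final RefType type;
    final String name;

    private DropRefStatement(String table, RefType type, String name)
    {
        this.table = table;
        this.type = type;
        this.name = name;
    }

    // Returns the parsed statement, or empty if the text is not one of the two forms.
    static Optional<DropRefStatement> parse(String sql)
    {
        Matcher matcher = PATTERN.matcher(sql);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        RefType type = RefType.valueOf(matcher.group(2).toUpperCase(Locale.ROOT));
        return Optional.of(new DropRefStatement(matcher.group(1), type, matcher.group(3)));
    }
}
```

Calling `DropRefStatement.parse` on either statement above yields the table name, the reference type, and the quoted branch or tag name; any other text yields an empty result.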

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* DDL support for dropping a branch from a table. 
* DDL support for dropping a tag from a table.

Iceberg Connector Changes
* Support for dropping a branch from an Iceberg table.
* Support for dropping a tag from an Iceberg table.

github-actions bot commented Sep 10, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 003d86a...260b490.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

steveburnett previously approved these changes Sep 10, 2024
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local docs build, looks good. Thanks!

Contributor

@ZacBlanco ZacBlanco left a comment

Some high-level feedback. Most of the code looks good; I will take a closer look on a second pass.

Also, I wanted to bring up a thought I had about parser extensions. Since many connectors will not support branching and tagging, I was thinking that maybe we ought to consider designing a SQL syntax plugin extension interface. Spark allows custom syntax extensions through implementing a set of interfaces or bringing your own parser. The upstream Iceberg project now maintains its own Spark SQL syntax extensions.

I'm not proposing we need that for this PR, but maybe it's something we should start thinking about if connectors start adding more radically different features that would be best left to some syntax extensions/optional plugins, especially for things outside the SQL specification.

}

@Override
public void checkCanDropTag(ConnectorTransactionHandle transaction, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName)
Contributor

@ZacBlanco ZacBlanco Sep 11, 2024

I wonder about the granularity of these methods. In other implementations (e.g. Spark?), at what granularity do they enforce the ability to do CRUD operations on tags and branches?

I'm thinking about a few cases:

  1. A group or user(s) can only access a certain set of branches or tags
  2. A group or user(s) can only create branches starting from a specific branch
  3. A group or user(s) can create tags

I know we're only implementing DROP but I want to understand the whole story for access control around branches and tags. Would we ever need to pass the branch/tag to these methods?
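
To make the question concrete, here is a minimal sketch of what passing the branch or tag name could enable. Everything in it is an assumption for illustration: `RefAccessControl`, `TableThenRefAccessControl`, and the two policy maps are hypothetical and are not Presto SPI types. The point is only that a per-branch rule cannot be evaluated unless the name reaches the check.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: these types are NOT Presto SPI interfaces.
interface RefAccessControl
{
    // refName is empty for a table-level check; otherwise it names the branch or tag.
    boolean canDropRef(String user, String table, Optional<String> refName);
}

final class TableThenRefAccessControl
        implements RefAccessControl
{
    private final Map<String, Set<String>> droppableTablesByUser;
    private final Map<String, Set<String>> protectedRefsByTable;

    TableThenRefAccessControl(
            Map<String, Set<String>> droppableTablesByUser,
            Map<String, Set<String>> protectedRefsByTable)
    {
        this.droppableTablesByUser = droppableTablesByUser;
        this.protectedRefsByTable = protectedRefsByTable;
    }

    @Override
    public boolean canDropRef(String user, String table, Optional<String> refName)
    {
        // Table-level rule: all that a signature without the ref name can express.
        boolean tableAllowed = droppableTablesByUser.getOrDefault(user, Set.of()).contains(table);
        if (!tableAllowed) {
            return false;
        }
        // Ref-level rule: only expressible when the branch/tag name is passed in.
        Set<String> protectedRefs = protectedRefsByTable.getOrDefault(table, Set.of());
        return refName.map(name -> !protectedRefs.contains(name)).orElse(true);
    }
}
```

With an empty `refName` the check degrades to a purely table-based decision, which is roughly what a method taking only a `SchemaTableName` can offer.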

Member Author

I think the granularity of access control in Spark for CRUD operations on Iceberg tags and branches is limited by Spark's integration with external systems (such as file systems, catalogs, and security frameworks like Apache Ranger).
For example, Ranger policies can define access controls at the table level, which could be extended to manage access to specific branches or tags.
Similarly, for cloud-based catalogs such as AWS Glue, you can control access to Iceberg metadata (branches and tags) via IAM policies that grant or restrict specific operations.

Contributor

@ZacBlanco ZacBlanco Sep 11, 2024

Thanks for the info. Would the parameters passed here as context have enough information for us to act at a similar granularity? I don't see anything in the method parameters that contains the branch name which I assume we would need to perform access control at a similar level.

Member Author

@agrawalreetika agrawalreetika Sep 11, 2024

I would like to discuss this further. Systems like Ranger can define access controls at the table level and column level, so in this case I think access to drop branches and tags could be table-based. As far as I can tell, if we introduce the branch name or tag name here, branch- and tag-level policies would then have to be maintained on the engine side?

Member Author

@agrawalreetika agrawalreetika Sep 13, 2024

@tdcmeehan What do you think about access control for branches and tags? Should it be based on the parent table itself, or on the tags/branches?
My thinking was that since security frameworks do not enforce policies per branch or tag, this should honor the same access policies as the table. Or do we not need access control exposed for dropTag and dropBranch at all?

@agrawalreetika agrawalreetika force-pushed the iceberg-tag-branch-drop branch 2 times, most recently from 8f4d3fa to 1e47c05 Compare September 11, 2024 21:18
@agrawalreetika
Member Author

@ZacBlanco Can you please take another pass?

Member

@hantangwangd hantangwangd left a comment

The change overall looks good to me. A few small nits, and one issue to discuss about the behavior of IF EXISTS on branches and tags.

@agrawalreetika agrawalreetika force-pushed the iceberg-tag-branch-drop branch 3 times, most recently from 0abecd5 to 5af0a0d Compare September 23, 2024 18:54
@agrawalreetika
Member Author

@hantangwangd Thanks for your review. I have addressed your comments; please review.

hantangwangd previously approved these changes Sep 25, 2024
Member

@hantangwangd hantangwangd left a comment

Thanks for the fix. LGTM!

ZacBlanco previously approved these changes Sep 25, 2024
@agrawalreetika
Member Author

Hi @tdcmeehan, this is now waiting for the final committer review. Please review whenever you have time.

Contributor

@pratyakshsharma pratyakshsharma left a comment

Thank you for the PR. A few minor comments; the rest looks good.


Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName);
if (optionalMaterializedView.isPresent()) {
    if (!statement.isTableExists()) {
Contributor

Should we throw an error when IF EXISTS is present in the query and the target is a materialized view? If drop branch is simply not supported for MVs, ideally the error should be thrown irrespective of the exists check.

Contributor

An alternative way of handling this would be to introduce an enum Type in DropBranch and have TABLE as the only supported type.
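
A minimal sketch of the behavior being asked for, under stated assumptions: `RelationKind` and `DropBranchPrecheck` are hypothetical names, not classes in this PR, and the `ifExists` flag stands in for what the snippet above reads through `statement.isTableExists()`. The idea is that a materialized view target fails unconditionally, while IF EXISTS only silences the "table is missing" case.

```java
// Hypothetical sketch: the names below are illustrative, not Presto classes.
enum RelationKind { TABLE, MATERIALIZED_VIEW, MISSING }

final class DropBranchPrecheck
{
    private DropBranchPrecheck() {}

    // Returns true if the drop should proceed, false if it is a silent no-op.
    static boolean shouldProceed(RelationKind kind, boolean ifExists)
    {
        switch (kind) {
            case TABLE:
                return true;
            case MATERIALIZED_VIEW:
                // Unsupported target: fail even when IF EXISTS was given.
                throw new UnsupportedOperationException("DROP BRANCH is not supported for materialized views");
            case MISSING:
                if (ifExists) {
                    return false;
                }
                throw new IllegalStateException("Table does not exist");
            default:
                throw new AssertionError("unexpected kind: " + kind);
        }
    }
}
```

Modeling the target as an enum with TABLE as the only accepted value keeps the "unsupported" decision in one place, which is close in spirit to the enum suggestion above.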

return immediateFuture(null);
}

ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName())
Contributor

nit: the variable connectorId is not used and can be removed. Also, I have refactored and extracted this piece of code into the MetadataUtil class in this commit: dec2952.
So we can probably make a similar change here as well, so that all the Task classes use the same methods?

return immediateFuture(null);
}

ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName())
Contributor

nit: same as above.


Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName);
if (optionalMaterializedView.isPresent()) {
    if (!statement.isTableExists()) {
Contributor

nit: same as above. What about the case when IF EXISTS is present and it is an MV?

@agrawalreetika agrawalreetika dismissed stale reviews from ZacBlanco and hantangwangd via 260b490 February 7, 2025 19:34
@agrawalreetika agrawalreetika force-pushed the iceberg-tag-branch-drop branch from 5af0a0d to 260b490 Compare February 7, 2025 19:34