Add option to provide partition spec in spark ADD_FILES procedure #12325

Open
2 of 3 tasks
bharos opened this issue Feb 19, 2025 · 4 comments · May be fixed by #12327
Labels
improvement PR that improves existing functionality

Comments

@bharos

bharos commented Feb 19, 2025

Feature Request / Improvement

Currently, the ADD_FILES procedure in Apache Iceberg does not accept a partition spec, so it always uses the table's latest spec when adding files, as shown in the implementation.

This becomes a problem when the table's partition spec has evolved over time. For instance, an archival and restore tool may have archived data before the partition spec changed; it would be beneficial to restore that data using the older partition spec rather than the current one.
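To make the mismatch concrete, here is a minimal, self-contained Python sketch. It is a toy model, not Iceberg code: the table history, the `dt` and `region` column names, the file path, and the helper `partition_columns_from_path` are all hypothetical, and it assumes Hive-style `col=value` directory segments.

```python
# Toy model only: none of these names come from Iceberg's API.

def partition_columns_from_path(path: str) -> list[str]:
    """Return the column names of Hive-style `col=value` path segments, in order."""
    return [seg.split("=", 1)[0] for seg in path.split("/") if "=" in seg]

# Hypothetical history: spec 0 partitioned by `dt`; the current spec 1 adds `region`.
specs = {0: ["dt"], 1: ["dt", "region"]}
current_spec_id = 1

# A file archived while spec 0 was still the table's current spec.
archived_file = "archive/events/dt=2024-01-15/part-00000.parquet"
file_columns = partition_columns_from_path(archived_file)

# The archived layout lines up with the old spec, not the current one.
matches_current = file_columns == specs[current_spec_id]
matches_old = file_columns == specs[0]
```

In this toy setup the archived file only lines up with spec 0, which is the situation where always using the latest spec gets in the way.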

I am working on a PR to address this and would appreciate any specific suggestions or concerns from the community on making this change.

Query engine

Spark

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
@bharos bharos added the improvement PR that improves existing functionality label Feb 19, 2025
@RussellSpitzer
Member

Please see #12319

@bharos
Author

bharos commented Feb 19, 2025

@RussellSpitzer Thanks. I see your change addresses finding the spec in the SparkTableUtil.importSparkTable method.

I just opened PR #12327, which adds the spec as an argument to add_files.

Could you review my PR? I think this change is still needed to allow passing a spec for the FileTable use case.
I am adding a partition_spec_version option that is used only for the FileTable case; it will be a no-op for the SourceTable case.

@RussellSpitzer
Member

I think I want to do something similar here. Instead of passing in the partition spec, can we just search the Iceberg table to see if a valid spec exists that matches the FileTable?
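As a rough illustration of that idea (not the implementation in either PR), the search could look something like the Python sketch below. It assumes the table's historical specs are available keyed by spec ID, models each spec as a plain ordered list of partition column names (ignoring transforms), and prefers the newest matching spec. The function name and the data are hypothetical.

```python
from typing import Optional

def find_matching_spec_id(
    specs: dict[int, list[str]], file_columns: list[str]
) -> Optional[int]:
    """Return the ID of a spec whose partition columns equal `file_columns`.

    Assumption: when several specs match, the highest (newest) ID wins.
    Returns None when no spec matches.
    """
    for spec_id in sorted(specs, reverse=True):
        if specs[spec_id] == file_columns:
            return spec_id
    return None

# Hypothetical spec history for a table whose partitioning evolved once.
table_specs = {0: ["dt"], 1: ["dt", "region"]}

old_layout_id = find_matching_spec_id(table_specs, ["dt"])
new_layout_id = find_matching_spec_id(table_specs, ["dt", "region"])
unknown_layout_id = find_matching_spec_id(table_specs, ["country"])
```

With this approach the caller never names a spec: the file layout selects it, and a missing match (`None`) can be surfaced as an error instead of silently falling back to the current spec.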

@bharos
Author

bharos commented Feb 19, 2025

Makes sense to avoid passing the argument if possible. I'll update the PR.
