Add option to provide partition spec in spark ADD_FILES procedure #12325

bharos · 2025-02-19T05:41:27Z

Feature Request / Improvement

Currently, the ADD_FILES procedure in Apache Iceberg does not accept a partition spec, so it always uses the table's latest spec when adding files, as shown in the implementation.

This can become problematic when the table or partition spec has evolved over time. For instance, in an archival and restore tool where data was archived before the partition spec changed, it would be beneficial to restore archived data using the older partition spec, rather than the current one.

I am working on a PR to address this and would appreciate any specific suggestions or concerns from the community on making this change.

Query engine

Spark

Willingness to contribute

I can contribute this improvement/feature independently
I would be willing to contribute this improvement/feature with guidance from the Iceberg community
I cannot contribute this improvement/feature at this time

RussellSpitzer · 2025-02-19T07:10:48Z

Please see #12319

bharos · 2025-02-19T07:44:01Z

@RussellSpitzer thanks, I see that your change addresses the spec lookup in the SparkTableUtil.importSparkTable method.

I just created a PR that adds the spec as an argument to add_files: #12327

Can you review my PR? I find that this change is still needed to allow passing a spec for the FileTable use case.
I am adding the partition_spec_version option so that it is used only for the FileTable case; it will be a no-op for the SourceTable case.

RussellSpitzer · 2025-02-19T16:49:32Z

I think I want to do something similar here. Instead of passing in the partition spec, can we just search the Iceberg table to see if a valid spec exists that matches the FileTable?
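
For readers following the thread, the lookup suggested above could be sketched roughly as follows. This is only an illustration in plain Python, not Iceberg's implementation: the `Spec` class and the `find_matching_spec` function are hypothetical stand-ins, only identity partition columns are modeled, and the choices to compare columns without regard to order and to prefer the highest spec ID are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a table partition spec (identity columns only)."""
    spec_id: int
    source_columns: Tuple[str, ...]


def find_matching_spec(
    table_specs: Sequence[Spec], file_partition_columns: Sequence[str]
) -> Optional[Spec]:
    """Return a spec whose partition columns match those found in the file layout.

    Assumption: columns are compared as sets, so ordering is ignored.
    Assumption: if several specs match, the one with the highest ID wins.
    """
    wanted = set(file_partition_columns)
    for spec in sorted(table_specs, key=lambda s: s.spec_id, reverse=True):
        if set(spec.source_columns) == wanted:
            return spec
    # No spec in the table's history matches the file layout.
    return None


# Example: the table was first partitioned by dt, then evolved to (dt, region).
specs = [Spec(0, ("dt",)), Spec(1, ("dt", "region"))]
archived = find_matching_spec(specs, ["dt"])
print(archived.spec_id if archived else "no matching spec")
```

In the archival scenario from the issue description, files written before the spec evolved carry only the `dt` partition column, so the search resolves to the older spec rather than the current one, and no extra argument is needed.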

bharos · 2025-02-19T17:03:03Z

Makes sense to avoid passing the argument if possible. I'll update the PR

bharos added the improvement label (PR that improves existing functionality) on Feb 19, 2025

bharos added a commit to bharos/iceberg that referenced this issue Feb 19, 2025

Spark: Add option to provide partition spec in ADD_FILES procedure (a…

c55382b

…pache#12325)

bharos added a commit to bharos/iceberg that referenced this issue Feb 19, 2025

Spark: Add option to provide partition spec in ADD_FILES procedure (a…

156dca8

…pache#12325)

bharos added a commit to bharos/iceberg that referenced this issue Feb 19, 2025

Spark: Add option to provide partition spec in ADD_FILES procedure (a…

185bd0d

…pache#12325)

bharos added a commit to bharos/iceberg that referenced this issue Feb 19, 2025

Spark: Add option to provide partition spec in ADD_FILES procedure (a…

48a9696

…pache#12325)

bharos added a commit to bharos/iceberg that referenced this issue Feb 19, 2025

Spark: Add option to provide partition spec in ADD_FILES procedure (a…

fc4c5cc

…pache#12325)

bharos linked a pull request Feb 19, 2025 that will close this issue

Spark: Infer partition spec in ADD_FILES procedure for FileTables than taking latest table spec #12327

Open
