Cherrypick the data rows [deleted or old values] from a past snapshot #12271
Comments
You do need to make a new snapshot. I would use the Table Append API to re-add the files that were removed.
Thanks @RussellSpitzer for sharing! Will re-adding the files update the manifest and make the data queryable? Could you please share your solution or the relevant APIs?
I think Russell is referring to https://iceberg.apache.org/docs/nightly/api/#update-operations
Thanks, but I am still not clear: how can we identify the data files from the old snapshot for specific rows or partitions and add them to the latest snapshot?
You have a lot of options: you can read the files while time traveling, check the metadata tables, or read the manifests directly. Say I want to revert the files removed in snapshot A. I'd scan the entries metadata table for all data files that were removed in that snapshot and collect all the DataFile info. Then do something like (pseudo-code) table.newAppend().appendFile(file1).appendFile(file2)....
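The selection step described above can be sketched in plain Java. This is a minimal sketch that does not call Iceberg at all: `Entry`, `removedFiles`, the string partition value, and the sample paths are hypothetical stand-ins for rows of the entries metadata table. It assumes the status codes from the Iceberg spec (0 = existing, 1 = added, 2 = deleted) and that a deleted entry's snapshot id is the snapshot that removed the file. Depending on the table's history, the deleted entries may only be visible when reading the entries table as of snapshot A or through `all_entries`.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class RemovedFileSelector {
    // Status codes of a manifest entry as defined by the Iceberg spec.
    static final int STATUS_EXISTING = 0;
    static final int STATUS_ADDED = 1;
    static final int STATUS_DELETED = 2;

    // Hypothetical, simplified stand-in for one row of the entries metadata table.
    static final class Entry {
        final int status;
        final long snapshotId;
        final String partition;
        final String filePath;

        Entry(int status, long snapshotId, String partition, String filePath) {
            this.status = status;
            this.snapshotId = snapshotId;
            this.partition = partition;
            this.filePath = filePath;
        }
    }

    // Returns the paths of files removed in the given snapshot whose
    // partition value passes the filter.
    static List<String> removedFiles(
            List<Entry> entries, long snapshotId, Predicate<String> partitionFilter) {
        return entries.stream()
            .filter(e -> e.status == STATUS_DELETED && e.snapshotId == snapshotId)
            .filter(e -> partitionFilter.test(e.partition))
            .map(e -> e.filePath)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long snapshotA = 42L; // hypothetical id of the snapshot that deleted partition x
        List<Entry> entries = List.of(
            new Entry(STATUS_DELETED, snapshotA, "x", "data/x/file-1.parquet"),
            new Entry(STATUS_DELETED, snapshotA, "y", "data/y/file-2.parquet"),
            new Entry(STATUS_ADDED, snapshotA, "z", "data/z/file-3.parquet"),
            new Entry(STATUS_DELETED, 41L, "x", "data/x/file-0.parquet"));

        // Only partition x is restored; other deleted files stay deleted.
        List<String> toRestore = removedFiles(entries, snapshotA, "x"::equals);
        System.out.println(toRestore);

        // In a real job, each selected row would be turned back into a DataFile
        // and passed to table.newAppend().appendFile(...), followed by commit().
    }
}
```

The partition predicate is kept separate from the snapshot check so that the same selection can restore a single partition or everything removed in snapshot A.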
Thanks @RussellSpitzer, let me try. In the meantime, if you already have an example, please share it. Also, I would like to add back data files only for specific partition timestamps; not all deleted data files need to be reverted.
Can we leverage the available cherrypick operation: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/CherryPickOperation.java ?
I don’t think you can cherry-pick a past snapshot, but you can try it. My guess is that the validation would conflict.
Feature Request / Improvement
Hello team,
Is there any way to pick a specific partition or specific data rows from an old snapshot into the main snapshot?
Example:
When we delete a partition x from the main branch, a new commit is made and a new snapshot is created. Adding new partitions afterwards creates further snapshots. But if we want to get the old partition x back, what APIs do we have? I could not find a way to mark deleted data files as live again in the manifest so that they are available in a new snapshot.
If we do not have such APIs, let's discuss the design for a future version.
Query engine
None
Willingness to contribute