Docs: Add warning about snapshot_ids arg in expired_snapshots procedure #12291

Open · wants to merge 3 commits into base: main
7 changes: 4 additions & 3 deletions docs/docs/spark-procedures.md
@@ -255,7 +255,7 @@ Many [maintenance actions](maintenance.md) can be performed using Iceberg stored
### `expire_snapshots`

Each write/update/delete/upsert/compaction in Iceberg produces a new snapshot while keeping the old data and metadata
around for snapshot isolation and time travel. The `expire_snapshots` procedure can be used to remove older snapshots
around for snapshot isolation and time travel. The [`expire_snapshots`](https://github.com/apache/iceberg/blob/0b47faaada2aafa42c118be78445f6c40fc0ead6/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ExpireSnapshotsProcedure.java) procedure can be used to remove older snapshots
and their files which are no longer needed.

This procedure will remove old snapshots and data files which are uniquely required by those old snapshots. This means
@@ -270,7 +270,8 @@ the `expire_snapshots` procedure will never remove files which are still required
| `retain_last` | | int | Number of ancestor snapshots to preserve regardless of `older_than` (defaults to 1) |
| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (by default, no thread pool is used) |
| `stream_results` | | boolean | When true, files to delete are sent to the Spark driver by RDD partition (by default, all files are sent to the driver at once). Setting this to `true` is recommended to prevent Spark driver OOM when many files are expired |
- | `snapshot_ids` | | array of long | Array of snapshot IDs to expire. |
+ | `snapshot_ids` | | array of long | Array of [additional snapshot IDs to remove](https://github.com/apache/iceberg/blob/0b47faaada2aafa42c118be78445f6c40fc0ead6/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L149-L151), along with those expired by the `older_than` and `retain_last` arguments and the table's [expiration properties](configuration.md#table-behavior-properties). |
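
As a sketch of how the arguments in the table combine, a call using named arguments might look like the following (values are illustrative; the catalog and table names follow the doc's `hive_prod`/`db.sample` examples):

```sql
-- Illustrative: expire snapshots older than the given timestamp, keep the
-- last 50 ancestor snapshots, and stream file deletions to the driver by
-- RDD partition to avoid driver OOM on tables with many files.
CALL hive_prod.system.expire_snapshots(
  table => 'db.sample',
  older_than => TIMESTAMP '2021-06-30 00:00:00.000',
  retain_last => 50,
  stream_results => true
);
```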

If `older_than` and `retain_last` are omitted, the table's [expiration properties](configuration.md#table-behavior-properties) will be used.
Snapshots that are still referenced by branches or tags won't be removed. By default, branches and tags never expire, but their retention policy can be changed with the table property `history.expire.max-ref-age-ms`. The `main` branch never expires.
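
A minimal sketch of changing that retention policy, assuming a standard Spark SQL `ALTER TABLE ... SET TBLPROPERTIES` statement against an Iceberg table (table name illustrative):

```sql
-- Illustrative: allow non-main branches and tags to expire after 7 days
-- (604800000 ms). The `main` branch never expires regardless of this value.
ALTER TABLE hive_prod.db.sample SET TBLPROPERTIES (
  'history.expire.max-ref-age-ms' = '604800000'
);
```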
@@ -294,7 +295,7 @@ Remove snapshots older than a specific day and time, but retain the last 100 snapshots:
CALL hive_prod.system.expire_snapshots('db.sample', TIMESTAMP '2021-06-30 00:00:00.000', 100);
```

- Remove snapshots with snapshot ID `123` (note that this snapshot ID should not be the current snapshot):
+ Remove expired snapshots, as well as snapshots with snapshot ID `123` (note that this snapshot ID should not be the current snapshot):

```sql
CALL hive_prod.system.expire_snapshots(table => 'db.sample', snapshot_ids => ARRAY(123));
```