Core,Api: Add overwrite option when register external table to catalog #12228
base: main
The Java CI failure is a timeout in concurrent fast append and seems unrelated to this change. @rdblue @RussellSpitzer @danielcweeks do you want to take a look?
Thanks for working on this, @dramaticlly !
… catalog Update REST RegisterTableRequest model and parser to support overwrite
9f48d2e to 259ed96
Signed-off-by: Hongyue Zhang <[email protected]>
@@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() {
    assertThat(catalog.dropTable(identifier)).isTrue();
  }

  @Test
  public void testRegisterAndOverwriteExistingTable() {
    C catalog = catalog();
Are we just adding a bucket to test the change? Why not just use the table UUID? I feel like we should be able to just:
1. Make Table 1
2. Make Table 2
3. Register-overwrite Table 1 with Table 2
4. Check that the metadata of Table 1 matches Table 2?
My initial thinking is that register with overwrite helps revert an existing table to a previous healthy state. If we want to support overwriting with another table's metadata, that seems better suited to drop + register, which reflects the table UUID change.
The table spec asks implementations to throw an exception if a table's UUID does not match the expected UUID when refreshing metadata. What do you think?
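To make the UUID concern concrete, here is a minimal, self-contained sketch of the refresh-time check described above. It does not use Iceberg classes: `Metadata`, `ToyTable`, and the choice of `IllegalStateException` are hypothetical stand-ins picked for illustration only.

```java
final class UuidCheckSketch {
  /** Hypothetical stand-in for table metadata: only the UUID and the file location. */
  static final class Metadata {
    final String uuid;
    final String location;

    Metadata(String uuid, String location) {
      this.uuid = uuid;
      this.location = location;
    }
  }

  /** Hypothetical table handle that remembers the metadata it was loaded with. */
  static final class ToyTable {
    private Metadata current;

    ToyTable(Metadata initial) {
      this.current = initial;
    }

    Metadata current() {
      return current;
    }

    /** Accepts the latest metadata only if it belongs to the same table (same UUID). */
    void refresh(Metadata latest) {
      if (!current.uuid.equals(latest.uuid)) {
        throw new IllegalStateException(
            "Table UUID does not match: expected " + current.uuid + " but found " + latest.uuid);
      }
      current = latest;
    }
  }

  public static void main(String[] args) {
    ToyTable table = new ToyTable(new Metadata("uuid-1", "v2.metadata.json"));

    // Reverting to an older metadata file of the same table keeps the UUID, so refresh succeeds.
    table.refresh(new Metadata("uuid-1", "v1.metadata.json"));
    System.out.println("reverted to " + table.current().location);

    // Swapping in another table's metadata changes the UUID, so refresh is rejected.
    try {
      table.refresh(new Metadata("uuid-2", "other/v1.metadata.json"));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Under this model, a revert to an earlier state of the same table is invisible to readers holding a handle, while an overwrite with a different table's metadata breaks them, which is the argument for drop + register in that case.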
        .isInstanceOf(AlreadyExistsException.class)
        .hasMessage("Table already exists: hivedb.tbl");

    catalog.registerTable(TABLE_IDENTIFIER, metadataFilePath, true);
Should be a second test I think?
@@ -195,10 +196,18 @@ public void testRegisterExistingTable() {
    ecsCatalog.createTable(identifier, SCHEMA);
    Table registeringTable = ecsCatalog.loadTable(identifier);
    TableOperations ops = ((HasTableOperations) registeringTable).operations();
    String metadataLocation = ((EcsTableOperations) ops).currentMetadataLocation();
    assertThatThrownBy(() -> ecsCatalog.registerTable(identifier, metadataLocation))
    String unpartitionedMetadataLocation = ops.current().metadataFileLocation();
Second test?
    ops.commit(null, metadata);

    TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : null;
    ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), metadataFile));
I'm a little worried about passing through current metadata here. Is this just a workaround for the normal commit logic?
If the metadata changes from "current" by the time an overwrite request goes through, we don't want a retry or a failure; it should still pass?
I was hoping to reuse the existing commit logic for atomicity support, and also for better lineage, to track previous/old metadata for Hive and JDBC tables.
As for the potential conflict when the base is out of date, I think that's a valid concern, and we probably do not want this operation to fail, since the user's intent is to replace the table with the provided metadata. I am thinking about adding a retry block to help; please let me know if you feel otherwise.
AtomicBoolean isRetry = new AtomicBoolean(false);
// commit with retry
Tasks.foreach(ops)
    .retry(COMMIT_NUM_RETRIES_DEFAULT)
    .exponentialBackoff(
        COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
        COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
        COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
        2.0 /* exponential */)
    .onlyRetryOn(CommitFailedException.class)
    .run(
        taskOps -> {
          TableMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current();
          isRetry.set(true);
          taskOps.commit(base, newMetadata);
        });
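For readers who want to see the retry idea in isolation, the following is a simplified, self-contained model of the same pattern: re-read the current state as the base on every attempt, then try a compare-and-swap style commit. It is a sketch under stated assumptions, not the PR's implementation. `ToyOps`, `StaleBaseException`, `simulateConcurrentWrites`, and `overwriteWithRetry` are hypothetical names, metadata is reduced to a string, and backoff between attempts is omitted.

```java
import java.util.Objects;

final class OverwriteRetrySketch {
  /** Hypothetical exception: the base passed to commit is no longer the current state. */
  static final class StaleBaseException extends RuntimeException {
    StaleBaseException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for table operations; metadata is modeled as a plain string. */
  static final class ToyOps {
    private String current;
    private int pendingConcurrentWrites;

    ToyOps(String initial) {
      this.current = initial;
    }

    synchronized String current() {
      return current;
    }

    /** Test hook: the next {@code count} commits are preceded by a simulated concurrent write. */
    synchronized void simulateConcurrentWrites(int count) {
      this.pendingConcurrentWrites = count;
    }

    /** Compare-and-swap commit: succeeds only if {@code base} is still the current state. */
    synchronized void commit(String base, String updated) {
      if (pendingConcurrentWrites > 0) {
        pendingConcurrentWrites--;
        current = current + "+concurrent";
      }
      if (!Objects.equals(current, base)) {
        throw new StaleBaseException("Base is stale: " + base);
      }
      current = updated;
    }
  }

  /** Retries the overwrite with a freshly read base; returns the number of attempts used. */
  static int overwriteWithRetry(ToyOps ops, String newMetadata, int maxAttempts) {
    for (int attempt = 1; ; attempt++) {
      String base = ops.current(); // re-read on every attempt so a stale base is replaced
      try {
        ops.commit(base, newMetadata);
        return attempt;
      } catch (StaleBaseException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up once the attempt budget is exhausted
        }
      }
    }
  }

  public static void main(String[] args) {
    ToyOps ops = new ToyOps("v1");
    ops.simulateConcurrentWrites(2);
    int attempts = overwriteWithRetry(ops, "restored", 4);
    System.out.println("attempts=" + attempts + ", current=" + ops.current());
  }
}
```

The model shows the trade-off raised in the review: the overwrite eventually wins because each retry adopts whatever is current as its base, but it can still fail if concurrent writers outlast the attempt budget.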
This PR adds an overwrite option to register-table on the Catalog interface, allowing the metadata of an existing Iceberg table to be overwritten. For catalogs that extend BaseMetastoreCatalog, the overwrite is achieved via TableOperations.commit(base, new).
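The behavior described in this summary can be sketched with a toy in-memory catalog. This is an illustrative model only: `ToyCatalog`, its map of identifier to metadata location, and the local `AlreadyExistsException` are hypothetical stand-ins, not the Iceberg classes of the same or similar names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class RegisterOverwriteSketch {
  /** Hypothetical local exception, named after the one asserted in the tests above. */
  static final class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String message) {
      super(message);
    }
  }

  /** Hypothetical catalog that maps a table identifier to its current metadata location. */
  static final class ToyCatalog {
    private final Map<String, String> metadataLocations = new HashMap<>();

    /** Registers a table; an existing entry is replaced only when overwrite is true. */
    void registerTable(String identifier, String metadataLocation, boolean overwrite) {
      String current = metadataLocations.get(identifier); // null when not yet registered
      if (current != null && !overwrite) {
        throw new AlreadyExistsException("Table already exists: " + identifier);
      }
      // Reuse the commit path: base is null for a new table, the current location otherwise.
      commit(identifier, current, metadataLocation);
    }

    /** Swaps the pointer only if base still matches what the catalog holds. */
    private void commit(String identifier, String base, String updated) {
      if (!Objects.equals(metadataLocations.get(identifier), base)) {
        throw new IllegalStateException("Base metadata is stale for " + identifier);
      }
      metadataLocations.put(identifier, updated);
    }

    String currentMetadataLocation(String identifier) {
      return metadataLocations.get(identifier);
    }
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();
    catalog.registerTable("hivedb.tbl", "v2.metadata.json", false);
    try {
      catalog.registerTable("hivedb.tbl", "v1.metadata.json", false);
    } catch (AlreadyExistsException e) {
      System.out.println(e.getMessage());
    }
    catalog.registerTable("hivedb.tbl", "v1.metadata.json", true);
    System.out.println(catalog.currentMetadataLocation("hivedb.tbl"));
  }
}
```

Routing both the new-table and overwrite cases through one commit method mirrors the motivation given in the thread: a single atomic swap, with the previous metadata available as the base.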