Core,Api: Add overwrite option when register external table to catalog #12228

Open
wants to merge 2 commits into
base: main

dramaticlly
Contributor

@dramaticlly dramaticlly commented Feb 11, 2025

This PR adds a new register-table variant with an overwrite option to the Catalog interface, allowing the metadata of an existing Iceberg table to be overwritten. For catalogs that extend BaseMetastoreCatalog, the overwrite is performed via TableOperations.commit(base, new).

@dramaticlly
Contributor Author

The Java CI failure is a timeout on concurrent fast append and seems unrelated to this change.

@rdblue @RussellSpitzer @danielcweeks do you want to take a look?

Collaborator

@gaborkaszab gaborkaszab left a comment

Thanks for working on this, @dramaticlly!

… catalog

Update REST RegisterTableRequest model and parser to support overwrite
Signed-off-by: Hongyue Zhang <[email protected]>
@@ -2871,6 +2872,33 @@ public void testRegisterExistingTable() {
assertThat(catalog.dropTable(identifier)).isTrue();
}

@Test
public void testRegisterAndOverwriteExistingTable() {
C catalog = catalog();
Member

Are we just adding a bucket to test the change? Why not just use the table UUID? I feel like we should be able to just:

1. Make Table 1
2. Make Table 2
3. Register overwrite Table 1 with Table 2
4. Check that the metadata of Table 1 matches Table 2?
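
The flow suggested above could look roughly like the following. This is a self-contained sketch against a hypothetical in-memory catalog (`ToyCatalog`, whose `registerTable(name, metadataLocation, overwrite)` is modeled on the three-argument call that appears in this PR's tests), not the real Iceberg test. Metadata is reduced to a placeholder location string, and all names and paths are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class RegisterOverwriteFlowSketch {
  // Hypothetical in-memory catalog: maps a table name to its metadata file location.
  static class ToyCatalog {
    private final Map<String, String> tables = new HashMap<>();

    void createTable(String name, String metadataLocation) {
      if (tables.putIfAbsent(name, metadataLocation) != null) {
        throw new IllegalStateException("Table already exists: " + name);
      }
    }

    // Without overwrite, registering an existing name fails; with overwrite,
    // the stored metadata location is replaced.
    void registerTable(String name, String metadataLocation, boolean overwrite) {
      if (!overwrite && tables.containsKey(name)) {
        throw new IllegalStateException("Table already exists: " + name);
      }
      tables.put(name, metadataLocation);
    }

    String metadataLocation(String name) {
      return tables.get(name);
    }
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();
    // 1. Make Table 1 and 2. Make Table 2 (placeholder metadata locations).
    catalog.createTable("table1", "table1/metadata/v1.json");
    catalog.createTable("table2", "table2/metadata/v1.json");

    // 3. Register overwrite Table 1 with Table 2's metadata.
    catalog.registerTable("table1", catalog.metadataLocation("table2"), true);

    // 4. Check that the metadata of Table 1 matches Table 2.
    boolean matches =
        catalog.metadataLocation("table1").equals(catalog.metadataLocation("table2"));
    System.out.println("metadata matches: " + matches);
  }
}
```

In a real test the final comparison would presumably be made on the loaded table metadata (for example its UUID) rather than on a location string.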

Contributor Author

Initially I thought register with overwrite would help revert an existing table to a previous healthy state. If we want to support overwriting with another table's metadata, that seems better suited to drop + register, which reflects the table UUID change.

The table spec asks implementations to throw an exception if a table's UUID does not match the expected UUID when refreshing metadata. What do you think?
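
The UUID requirement mentioned above can be illustrated with a minimal, self-contained sketch. The class and method names (`UuidCheckSketch`, `checkUuid`) are hypothetical and are not Iceberg APIs; the sketch only shows the kind of check the spec describes, with UUIDs reduced to plain strings.

```java
import java.util.Objects;

public class UuidCheckSketch {
  // Hypothetical helper, not an Iceberg API: fails when refreshed metadata
  // carries a different UUID than the table that was originally loaded.
  public static void checkUuid(String expectedUuid, String refreshedUuid) {
    if (expectedUuid != null && !Objects.equals(expectedUuid, refreshedUuid)) {
      throw new IllegalStateException(
          "Table UUID does not match: expected " + expectedUuid + ", found " + refreshedUuid);
    }
  }

  public static void main(String[] args) {
    checkUuid("uuid-a", "uuid-a"); // same table: passes silently
    try {
      checkUuid("uuid-a", "uuid-b"); // metadata that belongs to a different table
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this assumption, overwriting a table with another table's metadata would trip the check for any reader still holding the old UUID, which is the argument for drop + register in that case.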

.isInstanceOf(AlreadyExistsException.class)
.hasMessage("Table already exists: hivedb.tbl");

catalog.registerTable(TABLE_IDENTIFIER, metadataFilePath, true);
Member

Should be a second test I think?

@@ -195,10 +196,18 @@ public void testRegisterExistingTable() {
ecsCatalog.createTable(identifier, SCHEMA);
Table registeringTable = ecsCatalog.loadTable(identifier);
TableOperations ops = ((HasTableOperations) registeringTable).operations();
String metadataLocation = ((EcsTableOperations) ops).currentMetadataLocation();
assertThatThrownBy(() -> ecsCatalog.registerTable(identifier, metadataLocation))
String unpartitionedMetadataLocation = ops.current().metadataFileLocation();
Member

Second test?

ops.commit(null, metadata);

TableMetadata currentMetadata = tableExists(identifier) ? ops.current() : null;
ops.commit(currentMetadata, TableMetadataParser.read(ops.io(), metadataFile));
Member

@RussellSpitzer RussellSpitzer Feb 20, 2025

I'm a little worried about passing the current metadata through here. Is this just a workaround for the normal commit logic?

If the metadata changes from "current" by the time an overwrite request goes through, then we don't want a retry or a failure; it should still pass?
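
To make the concern concrete, here is a minimal sketch of a compare-and-swap style commit, assuming a commit succeeds only if the table's current metadata is still the base it was given. `ToyTableOps` is a hypothetical in-memory stand-in, not Iceberg's `TableOperations`, and it throws `IllegalStateException` where a real catalog would presumably raise a commit failure. It shows how an overwrite that passes "current" as the base can fail when another writer commits in between.

```java
import java.util.concurrent.atomic.AtomicReference;

public class StaleBaseSketch {
  // Hypothetical in-memory stand-in for table operations; metadata is just a String.
  static class ToyTableOps {
    private final AtomicReference<String> current;

    ToyTableOps(String initial) {
      this.current = new AtomicReference<>(initial);
    }

    String current() {
      return current.get();
    }

    // Succeeds only when the table still points at `base`. compareAndSet compares
    // references, which is enough here because callers pass back the same
    // String instance they read from current().
    void commit(String base, String updated) {
      if (!current.compareAndSet(base, updated)) {
        throw new IllegalStateException("Commit failed: base is out of date");
      }
    }
  }

  public static void main(String[] args) {
    ToyTableOps ops = new ToyTableOps("v1");
    String base = ops.current();       // the overwrite request reads "current"
    ops.commit(ops.current(), "v2");   // a concurrent writer commits first
    try {
      ops.commit(base, "registered");  // the overwrite now carries a stale base
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    ops.commit(ops.current(), "registered"); // re-reading the base lets it through
    System.out.println(ops.current());
  }
}
```

Whether an overwrite should re-read the base and retry, or bypass the base check entirely, is the design question raised in this thread; the sketch takes no position on it.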

Contributor Author

@dramaticlly dramaticlly Feb 21, 2025

I was hoping to reuse the existing commit logic for its atomicity support, and also to get better lineage for tracking previous metadata for Hive and JDBC tables.

As for a potential conflict when the base is out of date, I think that's a valid concern, and we probably do not want this operation to fail, since the user's intent is to replace the table metadata with the provided one. I am thinking about adding a retry block to help. Please let me know if you feel otherwise.

AtomicBoolean isRetry = new AtomicBoolean(false);
// commit with retry
Tasks.foreach(ops)
  .retry(COMMIT_NUM_RETRIES_DEFAULT)
  .exponentialBackoff(
      COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
      COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
      COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
      2.0 /* exponential */)
  .onlyRetryOn(CommitFailedException.class)
  .run(
      taskOps -> {
        TableMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current();
        isRetry.set(true);
        taskOps.commit(base, newMetadata);
      });
