feat: optimize Spanner changestream metadata table #32213
LGTM. From the benchmark, it sounds like a great optimization!
Hi @thiagotnunes, would you know if this change to add the index is going to be part of the Beam 2.59.0 release?
Never mind, I can see it's there: https://github.com/apache/beam/blob/v2.59.0-RC1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataAdminDao.java
* feat: optimize Spanner changestream metadata table
* fix: linting
* tests: fixes admin dao tests
Adds indexes to the SpannerIO change streams metadata table. This serves to optimize two specific queries:
getUnfinishedMinWatermark
getAllPartitionsCreatedAfter
(micro-)Benchmarking results with a metadata table containing 11 million rows:
Customers can add these indexes manually, if they want to, by running the following DDL statements:
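The DDL statements themselves are not reproduced in this copy of the thread. As a rough sketch only, the following Java helper shows the general shape such Spanner index DDL could take. The class name, the index names, and the column names (`Watermark`, `State`, `CreatedAt`, `StartTimestamp`) are assumptions made for illustration and are not confirmed by this PR text; check `PartitionMetadataAdminDao.java` for the actual definitions before running anything against a real metadata table.

```java
import java.util.List;

// Hypothetical sketch: builds index DDL for a change streams metadata table.
// All index and column names below are assumptions, not taken from the PR.
final class MetadataIndexDdlSketch {

  // Returns two CREATE INDEX statements for the given metadata table name.
  static List<String> indexDdl(String metadataTable) {
    // Assumed to help getUnfinishedMinWatermark: look up by watermark,
    // with the state stored in the index to avoid a base-table join.
    String watermarkIndex =
        String.format(
            "CREATE INDEX %s_WatermarkIdx ON %s (Watermark) STORING (State)",
            metadataTable, metadataTable);

    // Assumed to help getAllPartitionsCreatedAfter: scan by creation time,
    // then by start timestamp.
    String createdAtIndex =
        String.format(
            "CREATE INDEX %s_CreatedAtIdx ON %s (CreatedAt, StartTimestamp)",
            metadataTable, metadataTable);

    return List.of(watermarkIndex, createdAtIndex);
  }

  public static void main(String[] args) {
    // "MyMetadataTable" is a placeholder table name.
    indexDdl("MyMetadataTable").forEach(System.out::println);
  }
}
```

Deriving the index names from the table name is a design choice of this sketch, so that several metadata tables in one database do not collide on index names.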