feat: optimize Spanner changestream metadata table #32213

Merged: 4 commits, Aug 19, 2024

@thiagotnunes thiagotnunes commented Aug 16, 2024

Adds indexes to the SpannerIO change streams metadata table. This serves to optimize two specific queries:

  1. getUnfinishedMinWatermark
  2. getAllPartitionsCreatedAfter

(micro-)Benchmarking results with a metadata table containing 11 million rows:

| Method | No indexes (current) | With indexes (this PR) |
| --- | --- | --- |
| getUnfinishedMinWatermark | 3.190s (+/- 0.91s) | 0.020s (+/- 0.002s) |
| getAllPartitionsCreatedAfter | 1.250s (+/- 0.230s) | 0.012s (+/- 0.001s) |

Customers can add these indexes manually by running the following DDL statements:

-- For GoogleSQL dialect
CREATE INDEX WatermarkIndex ON <Metadata Table Name> (Watermark) STORING (State);
CREATE INDEX CreatedAtStartTimestampIndex ON <Metadata Table Name> (CreatedAt, StartTimestamp);

-- For PostgreSQL dialect
CREATE INDEX "WatermarkIndex" ON <Metadata Table Name> ("Watermark") INCLUDE ("State");
CREATE INDEX "CreatedAtStartTimestampIndex" ON <Metadata Table Name> ("CreatedAt", "StartTimestamp");
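If you prefer to generate these statements from the metadata table name instead of editing them by hand, a small helper along the following lines could work. This is a minimal sketch: `MetadataIndexDdl` and its `Dialect` enum are hypothetical names and not part of Beam, the index names are the fixed ones from the statements above, and the table name is inserted exactly as given, so quote it yourself if your PostgreSQL-dialect table requires it.

```java
import java.util.List;

/** Hypothetical helper (not part of Beam) that renders the index DDL from this PR. */
final class MetadataIndexDdl {

  /** The two Spanner SQL dialects covered by the statements in this PR. */
  enum Dialect { GOOGLE_STANDARD_SQL, POSTGRESQL }

  private MetadataIndexDdl() {}

  /** Returns the two CREATE INDEX statements for the given metadata table. */
  static List<String> createIndexStatements(String metadataTable, Dialect dialect) {
    if (metadataTable == null || metadataTable.isBlank()) {
      throw new IllegalArgumentException("metadataTable must not be blank");
    }
    switch (dialect) {
      case GOOGLE_STANDARD_SQL:
        // GoogleSQL uses STORING for the extra column kept in the index.
        return List.of(
            "CREATE INDEX WatermarkIndex ON " + metadataTable
                + " (Watermark) STORING (State)",
            "CREATE INDEX CreatedAtStartTimestampIndex ON " + metadataTable
                + " (CreatedAt, StartTimestamp)");
      case POSTGRESQL:
        // The PostgreSQL dialect uses INCLUDE and double-quoted identifiers.
        return List.of(
            "CREATE INDEX \"WatermarkIndex\" ON " + metadataTable
                + " (\"Watermark\") INCLUDE (\"State\")",
            "CREATE INDEX \"CreatedAtStartTimestampIndex\" ON " + metadataTable
                + " (\"CreatedAt\", \"StartTimestamp\")");
      default:
        throw new IllegalArgumentException("Unsupported dialect: " + dialect);
    }
  }
}
```

The returned strings carry no trailing semicolon, so they can be passed to whatever tool or client you use to apply schema changes; how they are submitted is left out of the sketch on purpose.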

@thiagotnunes thiagotnunes force-pushed the spanner-metadata-indexes branch from f30e44b to d7b4ba4 Compare August 16, 2024 05:54

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@thiagotnunes

Run Java_GCP_IO_Direct PreCommit

@thiagotnunes

assign set of reviewers


Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.
R: @Abacn for label io.
R: @nielm for label spanner.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).


Reviewers are already assigned to this PR: @robertwb @Abacn @nielm


@Abacn Abacn left a comment

LGTM. From the benchmark, it sounds like a great optimization!

@Abacn Abacn merged commit 6b4a7a5 into apache:master Aug 19, 2024
18 checks passed
@thiagotnunes thiagotnunes deleted the spanner-metadata-indexes branch August 19, 2024 22:49
@velamkao

Hi @thiagotnunes, would you know if this change to add the indexes is going to be part of the Beam 2.59.0 release?

@velamkao

feat: index creation toga4/spream#106

Never mind, I can see it's there: https://github.com/apache/beam/blob/v2.59.0-RC1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataAdminDao.java

reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
* feat: optimize Spanner changestream metadata table

* fix: linting

* tests: fixes admin dao tests