[Bug]: SDK 2.59.0 Java SpannerIO ChangeStream Can't Use the same metadata database for multiple pipelines #32581

Open
1 of 17 tasks
bangau1 opened this issue Sep 27, 2024 · 1 comment

bangau1 commented Sep 27, 2024

What happened?

With Apache Beam SDK 2.59.0, the SpannerIO change stream connector now creates indexes on its metadata table: https://github.com/apache/beam/blame/160dffd88e5a60077a48e3e2f8fff331aecced08/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dao/PartitionMetadataAdminDao.java#L82-L86

The index names are hardcoded to WatermarkIndex and CreatedAtStartTimestampIndex. This is a problem when multiple pipelines use the same metadata database with different metadata table names, because an index name must be unique within a database.
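
To make the collision concrete, here is a minimal sketch in plain Java. It does not call Beam or Spanner. The class and method names and the two metadata table names are hypothetical; only the two index names come from this report. It models a database-wide index namespace and shows that the second pipeline's metadata table asks for names the first one already took.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the problem; not Beam's actual code.
class HardcodedIndexCollision {
  // The two hardcoded index names mentioned in this issue.
  static final String WATERMARK_INDEX = "WatermarkIndex";
  static final String CREATED_AT_INDEX = "CreatedAtStartTimestampIndex";

  // The metadata table name is ignored, which is the root of the problem.
  static List<String> indexNamesFor(String metadataTable) {
    return List.of(WATERMARK_INDEX, CREATED_AT_INDEX);
  }

  // Returns every index name requested more than once across the given
  // tables, treating index names as unique per database.
  static Set<String> collisions(List<String> metadataTables) {
    Set<String> seen = new HashSet<>();
    Set<String> duplicates = new TreeSet<>();
    for (String table : metadataTables) {
      for (String indexName : indexNamesFor(table)) {
        if (!seen.add(indexName)) {
          duplicates.add(indexName);
        }
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    // Two pipelines, same database, different (hypothetical) table names.
    System.out.println(
        collisions(List.of("pipeline_a_metadata", "pipeline_b_metadata")));
  }
}
```

With a single metadata table the set of collisions is empty. With two tables, both index names collide.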

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@bangau1 bangau1 changed the title [Bug]: Latest Java SpannerIO ChangeStream Can't Use the same metadata database for multiple pipelines [Bug]: SDK 2.59.0 Java SpannerIO ChangeStream Can't Use the same metadata database for multiple pipelines Sep 27, 2024
bangau1 commented Feb 21, 2025

After upgrading to Dataflow SDK 2.62.0, the behavior has changed: the index name now includes a UUID. However, it still uses the database name as the base name. See #32689.

The question is: why not use the metadata table name as the base name instead? That would let users choose their own metadata table name as the unit that identifies a job. @thiagotnunes
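
A rough sketch of what I mean, in plain Java. This is not how Beam currently names the indexes. The class name, the method names, and the underscore-joined naming scheme are all assumptions made for illustration. The idea is that each index name is derived from the metadata table name, so two tables in the same database can never ask for the same index name.

```java
import java.util.List;

// Hypothetical naming helper; not part of Beam's API.
class TableScopedIndexNames {
  // Suffixes reuse the index names from the original report.
  static String watermarkIndex(String metadataTable) {
    return metadataTable + "_WatermarkIndex";
  }

  static String createdAtIndex(String metadataTable) {
    return metadataTable + "_CreatedAtStartTimestampIndex";
  }

  // All index names that would be created for one metadata table.
  static List<String> forTable(String metadataTable) {
    return List.of(watermarkIndex(metadataTable), createdAtIndex(metadataTable));
  }

  public static void main(String[] args) {
    // Two pipelines sharing one database, with hypothetical table names.
    System.out.println(forTable("pipeline_a_metadata"));
    System.out.println(forTable("pipeline_b_metadata"));
  }
}
```

Because table names are already unique within a database, names built this way stay unique without a random component, and they are stable across job restarts. A real implementation would also need to respect Spanner's identifier length limit, which this sketch does not handle.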
