design: multi-replica scheduling for singleton sources (aka hot standby) #31205

Open: aljoscha wants to merge 3 commits into main from design-multi-replica-singleton-sources

aljoscha
Contributor

Rendered: https://github.com/aljoscha/materialize/blob/design-multi-replica-singleton-sources/doc/developer/design/20250127_multi_replica_scheduling_singleton_sources.md

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

aljoscha force-pushed the design-multi-replica-singleton-sources branch from 06a025f to 6a126ef on January 27, 2025 17:50
aljoscha changed the title from "design: multi-replica scheduling for stateless sources (aka hot standby)" to "design: multi-replica scheduling for singleton sources (aka hot standby)" on Jan 27, 2025
benesch (Contributor) left a comment:

All makes sense to me!


## Non-Goals

- Add a failure detection mechanism for replicas and update source scheduling
Contributor

👍🏽👍🏽👍🏽

I'm very in favor of explicitly excluding "fault tolerance" to keep the scope as limited as possible.

teskje (Contributor) left a comment:

This seems reasonable to me. Stepping away from the assumption that all replicas receive the same commands will add a bunch of complexity in the controller, but it doesn't look like we have a choice. At least I don't think the alternative will turn out any simpler.

Comment on lines 5 to 7
We want to support zero-downtime ALTER on clusters. The plan for this is to
turn on a new replica with the changed parameters and turn off the old replica
when the new one is "sufficiently ready". This in turn requires that we are
Contributor

Nit: This might be me being dumb, but at first I thought this was describing some form of ALTER SOURCE. It's describing what we usually call "graceful cluster reconfiguration", so maybe it would be helpful to mention that term here?

aljoscha (Contributor, Author)

adding a clarification!

expected to shut down.

A: This is caught by the mechanisms we already have today for making sure there
is only one active ingestion dataflow. We need this for correctness in the fact
Contributor

Suggested change:
- is only one active ingestion dataflow. We need this for correctness in the fact
+ is only one active ingestion dataflow. We need this for correctness in the face

aljoscha force-pushed the design-multi-replica-singleton-sources branch from 20f48f8 to 7a666be on January 28, 2025 13:49
3 participants