design: multi-replica scheduling for singleton sources (aka hot standby) #31205
base: main
06a025f to 6a126ef
All makes sense to me!
## Non-Goals

- Add a failure detection mechanism for replicas and update source scheduling
👍🏽👍🏽👍🏽
I'm very in favor of explicitly excluding "fault tolerance" to keep the scope as limited as possible.
This seems reasonable to me. Stepping away from the assumption that all replicas receive the same commands will add a bunch of complexity in the controller, but it doesn't look like we have a choice. At least I don't think the alternative will turn out any simpler.
We want to support zero-downtime ALTER on clusters. The plan for this is to
turn on a new replica with the changed parameters and turn off the old replica
when the new one is "sufficiently ready". This in turn requires that we are
Nit: This might be me being dumb but I at first thought this was describing some form of `ALTER SOURCE`. It's describing what we usually call "graceful cluster reconfiguration", so maybe it would be helpful to mention that term here?
adding a clarification!
expected to shut down.

A: This is caught by the mechanisms we already have today for making sure there
is only one active ingestion dataflow. We need this for correctness in the fact
Suggested change:
- is only one active ingestion dataflow. We need this for correctness in the fact
+ is only one active ingestion dataflow. We need this for correctness in the face
20f48f8 to 7a666be
Rendered: https://github.com/aljoscha/materialize/blob/design-multi-replica-singleton-sources/doc/developer/design/20250127_multi_replica_scheduling_singleton_sources.md
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.