Set leave_on_terminate=true for servers and hardcode maxUnavailable=1 #3000
Conversation
@lkysow since you're adding a new acceptance test package, in order to get it to run in the pipeline you need to add it to the yaml matrices: https://github.com/hashicorp/consul-k8s/tree/lkysow/server-restart/acceptance/ci-inputs |
Looks good to me, just a couple nits (you can ignore if they don't make sense), and you need to add the server acceptance package to the acceptance test matrices (see above).
Thanks for all the great comments too, it was very easy to see your reasoning behind things.
When leave_on_terminate=false (the default), rolling the statefulset is disruptive because the new servers come up with the same node IDs but different IP addresses. They can't join the server cluster until the old server's node ID is marked as failed by serf. During this time, they continually start leader elections because they don't know there's a leader. When they eventually join the cluster, their election term is higher, so they trigger a leadership swap. The leadership swap happens at the same time as the next node to be rolled is being stopped, and so the cluster can end up without a leader.

With leave_on_terminate=true, the stopping server cleanly leaves the cluster, so the new server can join smoothly even though it has the same node ID as the old server. This speeds up the rollout and, in my testing, eliminates the period without a leader.

The downside of this change is that when a server leaves gracefully, it also reduces the number of raft peers. The number of peers is used to calculate the quorum size, so this can unexpectedly change the fault tolerance of the cluster. When running with an odd number of servers, one server leaving the cluster does not affect quorum size, e.g. 5 servers => quorum 3, 4 servers => quorum still 3. During a rollout, Kubernetes only stops one server at a time, so the quorum won't change. During a voluntary disruption event, e.g. a node being drained, Kubernetes uses the pod disruption budget to determine how many pods in a statefulset can be made unavailable at a time; that's why this change hardcodes that number to 1.

Also set autopilot min_quorum to the minimum quorum size and disable autopilot upgrade migration, since that feature is intended for blue/green deploys.
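The quorum arithmetic behind the fault-tolerance argument above can be sketched as follows (a minimal illustration, not code from this PR):

```python
# Raft quorum is floor(n/2) + 1, so fault tolerance is n - quorum.
# This shows why one server gracefully leaving an odd-sized cluster
# does not shrink the quorum, while leaving an even-sized one does.

def quorum(servers: int) -> int:
    """Minimum number of voters needed to elect a leader and commit entries."""
    return servers // 2 + 1

for n in (5, 4, 3):
    print(f"{n} servers -> quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
# 5 servers -> quorum 3, tolerates 2 failure(s)
# 4 servers -> quorum 3, tolerates 1 failure(s)
# 3 servers -> quorum 2, tolerates 1 failure(s)
```

With 5 servers, one graceful leave (5 -> 4) keeps the quorum at 3, which is why the pod disruption budget only needs to cap voluntary disruptions at 1 pod at a time.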
With hashicorp/consul-k8s#3000 merged, users can upgrade their k8s installs using a regular helm upgrade since the upgrade is now stable.
* docs: update k8s upgrade instructions With hashicorp/consul-k8s#3000 merged, users can upgrade their k8s installs using a regular helm upgrade since the upgrade is now stable. Co-authored-by: trujillo-adam <[email protected]>
* Datadog Integration (#3407)
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation, deployment override failsafes
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation | final initial-push
* changelog entry update
* datadog-integration: updated consul-server agent server.config (enable_debug) and telemetry.config update | enable_debug to server.config
* curt pr review changes (minus extraConfig templating verification changes)
* global.metrics.AgentMetrics -> global.metrics.enableAgentMetrics
* dogstatsd and otlp mutually exclusive verification checks
* breaking changes now incorporated into consul.validateExtraConfig helper template function as precheck
* extraConfig hash updates post merge conflict update
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* update changelog .txt to match new PR number
* updated server-statefulset.yaml to correct ad.datadoghq.com/consul.logs annotation to valid single quote string
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* update UDP dogstatsdPort behavior to exclude including a port value if using a kube service address (as determined by user overrides)
* update _helpers.tpl consul.ValidateDatadogConfiguration func to account for using 'https' as protocol => should fail
* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openmetrics scrape on consul
* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openmetrics scrape on consul
* correct otlp protocol helpers.tpl check to lower-case the protocol to match the open-telemetry-deployment.yaml behavior
* fix server-acl-init command_test.go for datadog token policy - datacenter should have been dc1
* add in server-statefulset bats test for extraConfig validation testing
* manual cherry-pick failed checks fix
* revert leave_on_terminate and autopilot updates from commit #3000 and re-apply datadog-integration branch changes
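One of the commits above mentions a mutual-exclusion check between the DogStatsD and OTLP forwarders. The real check lives in the chart's Helm templates (consul.ValidateDatadogConfiguration); as a rough sketch of the rule, with the function and parameter names being assumptions for illustration:

```python
def validate_metrics_forwarders(dogstatsd_enabled: bool, otlp_enabled: bool) -> None:
    """Hypothetical re-statement of the chart's validation rule:
    DogStatsD and OTLP metrics forwarding must not both be enabled."""
    if dogstatsd_enabled and otlp_enabled:
        raise ValueError("dogstatsd and otlp forwarding are mutually exclusive")
```

In the chart itself this would fail the `helm install`/`helm upgrade` at template-render time rather than at runtime.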
How I've tested: