
Set leave_on_terminate=true for servers and hardcode maxUnavailable=1 #3000

Merged
merged 1 commit on Jan 16, 2024
2 changes: 1 addition & 1 deletion .changelog/2962.txt
@@ -1,3 +1,3 @@
```releast-note:feature
```release-note:feature
api-gateway: (Consul Enterprise) Add JWT authentication and authorization for API Gateway and HTTPRoutes.
```
36 changes: 36 additions & 0 deletions .changelog/3000.txt
@@ -0,0 +1,36 @@
```release-note:breaking-change
server: set `leave_on_terminate` to `true` and set the server pod disruption budget `maxUnavailable` to `1`.

This change makes server rollouts faster and more reliable. However, it introduces the potential for reduced reliability if users
accidentally scale the statefulset down: servers now leave the raft pool when they are stopped gracefully, which reduces fault
tolerance. For example, with 5 servers you can tolerate the loss of 2 servers' data, because raft guarantees data is replicated to
a majority of nodes (3). If you accidentally scale the statefulset down to 3, however, the raft quorum becomes 2, and if you then
lose 2 servers, you may lose data. Before this change, the quorum would have remained at 3.

During a regular rollout, the number of servers is reduced by 1 at a time, which doesn't affect quorum when running
an odd number of servers, e.g. quorum for 5 servers is 3, and quorum for 4 servers is also 3. That's why the pod disruption
budget's `maxUnavailable` is now set to 1.

If a server is stopped ungracefully, e.g. due to a node loss, it will not leave the raft pool, and so fault tolerance won't be affected.

For the vast majority of users this change will be beneficial; however, if you wish to keep the old settings, you can set:

server:
extraConfig: |
{"leave_on_terminate": false}
disruptionBudget:
maxUnavailable: <previous setting>

```
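As a quick illustration of the arithmetic above, here is a minimal Go sketch (not part of this PR) of the standard Raft sizing rules: quorum is a majority of the servers, and fault tolerance is whatever remains above that majority.

```go
package main

import "fmt"

// quorum returns the number of voting servers Raft needs for a majority.
func quorum(servers int) int { return servers/2 + 1 }

// faultTolerance returns how many servers can fail while quorum is preserved.
func faultTolerance(servers int) int { return servers - quorum(servers) }

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("servers=%d quorum=%d tolerates=%d failures\n", n, quorum(n), faultTolerance(n))
	}
	// servers=3 quorum=2 tolerates=1 failures
	// servers=4 quorum=3 tolerates=1 failures
	// servers=5 quorum=3 tolerates=2 failures
	//
	// Rolling one pod at a time takes 5 ready servers to 4, and quorum stays at 3,
	// which is why maxUnavailable is hardcoded to 1. Scaling the statefulset down to 3
	// (with leave_on_terminate=true) drops quorum to 2, so losing 2 more servers risks data loss.
}
```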

```release-note:breaking-change
server: set `autopilot.min_quorum` to the correct quorum value to ensure autopilot doesn't prune servers needed for quorum. Also set `autopilot.disable_upgrade_migration` to `true`, as that setting is meant for blue/green deploys, not rolling deploys.

These settings make sense for most use cases; however, if you had a specific reason to use the old settings, you can keep them with the following config:

server:
extraConfig: |
{"autopilot": {"min_quorum": 0, "disable_upgrade_migration": false}}

```
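For reference, a minimal sketch of the quorum value the chart now writes into `autopilot.min_quorum`; the function below is illustrative only, and the real computation is the `consul.server.autopilotMinQuorum` Helm helper added later in this diff.

```go
// autopilotMinQuorum mirrors the chart's new default of floor(replicas/2) + 1.
// For 1, 2, 3, 4, and 5 replicas this yields 1, 2, 2, 3, and 3, which are the
// values asserted by the new unit tests in server-config-configmap.bats.
func autopilotMinQuorum(replicas int) int {
	return replicas/2 + 1
}
```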
2 changes: 1 addition & 1 deletion acceptance/ci-inputs/aks_acceptance_test_packages.yaml
@@ -4,4 +4,4 @@
# Cloud package is not included in test suite as it is triggered from a non consul-k8s repo and requires HCP credentials
- {runner: 0, test-packages: "connect peering snapshot-agent wan-federation"}
- {runner: 1, test-packages: "consul-dns example partitions metrics sync"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault server"}
2 changes: 1 addition & 1 deletion acceptance/ci-inputs/eks_acceptance_test_packages.yaml
@@ -4,4 +4,4 @@
# Cloud package is not included in test suite as it is triggered from a non consul-k8s repo and requires HCP credentials
- {runner: 0, test-packages: "connect peering snapshot-agent wan-federation"}
- {runner: 1, test-packages: "consul-dns example partitions metrics sync"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault server"}
2 changes: 1 addition & 1 deletion acceptance/ci-inputs/gke_acceptance_test_packages.yaml
@@ -4,4 +4,4 @@
# Cloud package is not included in test suite as it is triggered from a non consul-k8s repo and requires HCP credentials
- {runner: 0, test-packages: "connect peering snapshot-agent wan-federation"}
- {runner: 1, test-packages: "consul-dns example partitions metrics sync"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault"}
- {runner: 2, test-packages: "basic cli config-entries api-gateway ingress-gateway terminating-gateway vault server"}
2 changes: 1 addition & 1 deletion acceptance/ci-inputs/kind_acceptance_test_packages.yaml
@@ -6,7 +6,7 @@
- {runner: 1, test-packages: "peering"}
- {runner: 2, test-packages: "sameness"}
- {runner: 3, test-packages: "connect snapshot-agent wan-federation"}
- {runner: 4, test-packages: "cli vault metrics"}
- {runner: 4, test-packages: "cli vault metrics server"}
- {runner: 5, test-packages: "api-gateway ingress-gateway sync example consul-dns"}
- {runner: 6, test-packages: "config-entries terminating-gateway basic"}
- {runner: 7, test-packages: "mesh_v2 tenancy_v2"}
1 change: 1 addition & 0 deletions acceptance/tests/example/main_test.go
@@ -2,6 +2,7 @@
// SPDX-License-Identifier: MPL-2.0

// Rename package to your test package.
// NOTE: Remember to add your test package to acceptance/ci-inputs so it gets run in CI.
package example

import (
18 changes: 18 additions & 0 deletions acceptance/tests/server/main_test.go
@@ -0,0 +1,18 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: MPL-2.0

package server

import (
"os"
"testing"

testsuite "github.com/hashicorp/consul-k8s/acceptance/framework/suite"
)

var suite testsuite.Suite

func TestMain(m *testing.M) {
suite = testsuite.NewSuite(m)
os.Exit(suite.Run())
}
91 changes: 91 additions & 0 deletions acceptance/tests/server/server_test.go
@@ -0,0 +1,91 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: MPL-2.0

package server

import (
"encoding/json"
"fmt"
"testing"
"time"

"github.com/hashicorp/consul-k8s/acceptance/framework/consul"
"github.com/hashicorp/consul-k8s/acceptance/framework/helpers"
"github.com/hashicorp/consul-k8s/acceptance/framework/k8s"
"github.com/hashicorp/consul-k8s/acceptance/framework/logger"
"github.com/hashicorp/go-multierror"
"github.com/stretchr/testify/require"
)

// Test that when servers are restarted, they don't lose leadership.
func TestServerRestart(t *testing.T) {
cfg := suite.Config()
if cfg.EnableCNI || cfg.EnableTransparentProxy {
t.Skipf("skipping because -enable-cni or -enable-transparent-proxy is set and server restart " +
"is already tested without those settings and those settings don't affect this test")
}

ctx := suite.Environment().DefaultContext(t)
replicas := 3
releaseName := helpers.RandomName()
helmValues := map[string]string{
"global.enabled": "false",
"connectInject.enabled": "false",
"server.enabled": "true",
"server.replicas": fmt.Sprintf("%d", replicas),
"server.affinity": "null", // Allow >1 pods per node so we can test in minikube with one node.
}
consulCluster := consul.NewHelmCluster(t, helmValues, suite.Environment().DefaultContext(t), suite.Config(), releaseName)
consulCluster.Create(t)

// Start a separate goroutine to check if at any point more than one server is without
// a leader. We expect the server that is restarting to be without a leader because it hasn't
// yet joined the cluster but the other servers should have a leader.
expReadyPods := replicas - 1
var unmarshallErrs error
timesWithoutLeader := 0
done := make(chan bool)
defer close(done)
go func() {
for {
select {
case <-done:
return
default:
out, err := k8s.RunKubectlAndGetOutputE(t, ctx.KubectlOptions(t), "get", fmt.Sprintf("statefulset/%s-consul-server", releaseName),
"-o", "jsonpath={.status}")
if err != nil {
// Not failing the test on this error to reduce flakiness.
logger.Logf(t, "kubectl err: %s: %s", err, out)
break
}
type statefulsetOut struct {
ReadyReplicas *int `json:"readyReplicas,omitempty"`
}
var jsonOut statefulsetOut
if err = json.Unmarshal([]byte(out), &jsonOut); err != nil {
unmarshallErrs = multierror.Append(unmarshallErrs, err)
} else if jsonOut.ReadyReplicas == nil || *jsonOut.ReadyReplicas < expReadyPods {
// note: for some k8s api reason when readyReplicas is 0 it's not included in the json output so
// that's why we're checking if it's nil.
timesWithoutLeader++
}
time.Sleep(1 * time.Second)
}
}
}()

// Restart servers
out, err := k8s.RunKubectlAndGetOutputE(t, ctx.KubectlOptions(t), "rollout", "restart", fmt.Sprintf("statefulset/%s-consul-server", releaseName))
require.NoError(t, err, out)

// Wait for restart to finish.
start := time.Now()
out, err = k8s.RunKubectlAndGetOutputE(t, ctx.KubectlOptions(t), "rollout", "status", "--timeout", "5m", "--watch", fmt.Sprintf("statefulset/%s-consul-server", releaseName))
require.NoError(t, err, out, "rollout status command errored, this likely means the rollout didn't complete in time")

// Check results
require.NoError(t, unmarshallErrs, "there were some json unmarshall errors, this is likely a bug")
logger.Logf(t, "restart took %s, there were %d instances where more than one server had no leader", time.Since(start), timesWithoutLeader)
require.Equal(t, 0, timesWithoutLeader, "there were %d instances where more than one server had no leader", timesWithoutLeader)
}
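One detail worth noting in the test above: `ReadyReplicas` is declared as a pointer and nil-checked because `encoding/json` leaves a pointer field nil when the key is absent from the payload, which is how the statefulset status appears when zero replicas are ready. A small self-contained sketch of that behavior (the status payloads are made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type statefulsetOut struct {
	ReadyReplicas *int `json:"readyReplicas,omitempty"`
}

func main() {
	payloads := []string{
		`{"replicas":3,"readyReplicas":2}`, // field present: pointer is non-nil
		`{"replicas":3}`,                   // field absent (e.g. 0 ready): pointer stays nil
	}
	for _, p := range payloads {
		var out statefulsetOut
		if err := json.Unmarshal([]byte(p), &out); err != nil {
			panic(err)
		}
		if out.ReadyReplicas == nil {
			fmt.Println("readyReplicas absent, treat as fewer than expected ready pods")
		} else {
			fmt.Println("readyReplicas =", *out.ReadyReplicas)
		}
	}
}
```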
23 changes: 13 additions & 10 deletions charts/consul/templates/_helpers.tpl
@@ -166,24 +166,27 @@ Expand the name of the chart.
{{- end -}}

{{/*
Compute the maximum number of unavailable replicas for the PodDisruptionBudget.
This defaults to (n/2)-1 where n is the number of members of the server cluster.
Special case of replica equaling 3 and allowing a minor disruption of 1 otherwise
use the integer value
Add a special case for replicas=1, where it should default to 0 as well.
Calculate the maximum number of server pods that are allowed to be voluntarily disrupted.
When there's 1 server, this is set to 0 because that pod should not be disrupted. This is an edge
case, and it likely makes little difference with a single server, but it preserves the previous behavior.
Otherwise this is now always 1, as part of the move to set leave_on_terminate to true. With
leave_on_terminate set to true, whenever a server pod is stopped the number of raft peers is reduced.
If the number of servers is odd and the count is reduced by 1, the quorum size doesn't change, but if
it's reduced by more than 1 the quorum size can change, which is why this is now always hardcoded to 1.
*/}}
{{- define "consul.pdb.maxUnavailable" -}}
{{- define "consul.server.pdb.maxUnavailable" -}}
{{- if eq (int .Values.server.replicas) 1 -}}
{{ 0 }}
{{- else if .Values.server.disruptionBudget.maxUnavailable -}}
{{ .Values.server.disruptionBudget.maxUnavailable -}}
{{- else -}}
{{- if eq (int .Values.server.replicas) 3 -}}
{{- 1 -}}
{{- else -}}
{{- sub (div (int .Values.server.replicas) 2) 1 -}}
{{ 1 }}
{{- end -}}
{{- end -}}

{{- define "consul.server.autopilotMinQuorum" -}}
{{- add (div (int .Values.server.replicas) 2) 1 -}}
{{- end -}}

{{- define "consul.pdb.connectInject.maxUnavailable" -}}
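Because the branch logic of the new `consul.server.pdb.maxUnavailable` helper is easy to misread in template syntax, here is a Go restatement of the same decision; the Helm helper remains the source of truth and this mirror is only illustrative.

```go
// serverPDBMaxUnavailable restates the consul.server.pdb.maxUnavailable branches:
// a single server is never voluntarily disrupted, an explicit
// server.disruptionBudget.maxUnavailable override wins, and otherwise the value
// is hardcoded to 1 so a rollout removes at most one raft peer at a time.
func serverPDBMaxUnavailable(replicas int, override *int) int {
	if replicas == 1 {
		return 0
	}
	if override != nil {
		return *override
	}
	return 1
}
```

The disruption-budget unit tests later in this diff exercise exactly these branches: replicas of 2 and above with no override expect 1, and replicas=5 with an explicit maxUnavailable=5 expects 5.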
7 changes: 6 additions & 1 deletion charts/consul/templates/server-config-configmap.yaml
@@ -56,7 +56,12 @@ data:
"enabled": true
},
{{- end }}
"server": true
"server": true,
"leave_on_terminate": true,
"autopilot": {
"min_quorum": {{ template "consul.server.autopilotMinQuorum" . }},
"disable_upgrade_migration": true
}
}
{{- $vaultConnectCAEnabled := and .Values.global.secretsBackend.vault.connectCA.address .Values.global.secretsBackend.vault.connectCA.rootPKIPath .Values.global.secretsBackend.vault.connectCA.intermediatePKIPath -}}
{{- if and .Values.global.secretsBackend.vault.enabled $vaultConnectCAEnabled }}
2 changes: 1 addition & 1 deletion charts/consul/templates/server-disruptionbudget.yaml
@@ -17,7 +17,7 @@ metadata:
release: {{ .Release.Name }}
component: server
spec:
maxUnavailable: {{ template "consul.pdb.maxUnavailable" . }}
maxUnavailable: {{ template "consul.server.pdb.maxUnavailable" . }}
selector:
matchLabels:
app: {{ template "consul.name" . }}
60 changes: 59 additions & 1 deletion charts/consul/test/unit/server-config-configmap.bats
@@ -1307,4 +1307,62 @@ load _helpers
yq -r '.data["server.json"]' | jq -r .log_level | tee /dev/stderr)

[ "${configmap}" = "DEBUG" ]
}
}

#--------------------------------------------------------------------
# server.autopilot.min_quorum

@test "server/ConfigMap: autopilot.min_quorum=1 when replicas=1" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-config-configmap.yaml \
--set 'server.replicas=1' \
. | tee /dev/stderr |
yq -r '.data["server.json"]' | jq -r .autopilot.min_quorum | tee /dev/stderr)

[ "${actual}" = "1" ]
}

@test "server/ConfigMap: autopilot.min_quorum=2 when replicas=2" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-config-configmap.yaml \
--set 'server.replicas=2' \
. | tee /dev/stderr |
yq -r '.data["server.json"]' | jq -r .autopilot.min_quorum | tee /dev/stderr)

[ "${actual}" = "2" ]
}

@test "server/ConfigMap: autopilot.min_quorum=2 when replicas=3" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-config-configmap.yaml \
--set 'server.replicas=3' \
. | tee /dev/stderr |
yq -r '.data["server.json"]' | jq -r .autopilot.min_quorum | tee /dev/stderr)

[ "${actual}" = "2" ]
}

@test "server/ConfigMap: autopilot.min_quorum=3 when replicas=4" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-config-configmap.yaml \
--set 'server.replicas=4' \
. | tee /dev/stderr |
yq -r '.data["server.json"]' | jq -r .autopilot.min_quorum | tee /dev/stderr)

[ "${actual}" = "3" ]
}

@test "server/ConfigMap: autopilot.min_quorum=3 when replicas=5" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-config-configmap.yaml \
--set 'server.replicas=5' \
. | tee /dev/stderr |
yq -r '.data["server.json"]' | jq -r .autopilot.min_quorum | tee /dev/stderr)

[ "${actual}" = "3" ]
}
28 changes: 25 additions & 3 deletions charts/consul/test/unit/server-disruptionbudget.bats
@@ -59,6 +59,16 @@ load _helpers
[ "${actual}" = "0" ]
}

@test "server/DisruptionBudget: correct maxUnavailable with replicas=2" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-disruptionbudget.yaml \
--set 'server.replicas=2' \
. | tee /dev/stderr |
yq '.spec.maxUnavailable' | tee /dev/stderr)
[ "${actual}" = "1" ]
}

@test "server/DisruptionBudget: correct maxUnavailable with replicas=3" {
cd `chart_dir`
local actual=$(helm template \
@@ -97,7 +107,7 @@ load _helpers
--set 'server.replicas=6' \
. | tee /dev/stderr |
yq '.spec.maxUnavailable' | tee /dev/stderr)
[ "${actual}" = "2" ]
[ "${actual}" = "1" ]
}

@test "server/DisruptionBudget: correct maxUnavailable with replicas=7" {
@@ -107,7 +117,7 @@ load _helpers
--set 'server.replicas=7' \
. | tee /dev/stderr |
yq '.spec.maxUnavailable' | tee /dev/stderr)
[ "${actual}" = "2" ]
[ "${actual}" = "1" ]
}

@test "server/DisruptionBudget: correct maxUnavailable with replicas=8" {
@@ -117,9 +127,21 @@ load _helpers
--set 'server.replicas=8' \
. | tee /dev/stderr |
yq '.spec.maxUnavailable' | tee /dev/stderr)
[ "${actual}" = "3" ]
[ "${actual}" = "1" ]
}

@test "server/DisruptionBudget: correct maxUnavailable when set with value" {
cd `chart_dir`
local actual=$(helm template \
-s templates/server-disruptionbudget.yaml \
--set 'server.replicas=5' \
--set 'server.disruptionBudget.maxUnavailable=5' \
. | tee /dev/stderr |
yq '.spec.maxUnavailable' | tee /dev/stderr)
[ "${actual}" = "5" ]
}


#--------------------------------------------------------------------
# apiVersion
