Exit raft removed checker if raft isn't initialized #29329

miagilepner · 2025-01-09T16:53:09Z

Description

Exits the raft removed checker if the raft backend isn't initialized. This prevents log spamming.

TODO only if you're a HashiCorp employee

- Backport Labels: If this fix needs to be backported, use the appropriate backport/ label that matches the desired release branch. Note that in the CE repo, the latest release branch will look like backport/x.x.x, but older release branches will be backport/ent/x.x.x+ent.
  - LTS: If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
- ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature of a public function, even if that change is in a CE file, double check that applying the patch for this PR to the ENT repo and running tests doesn't break any tests. Sometimes ENT only tests rely on public functions in CE files.
- Jira: If this change has an associated Jira, it's referenced either in the PR description, commit message, or branch name.
- RFC: If this change has an associated RFC, please link it in the description.
- ENT PR: If this change has an associated ENT PR, please link it in the description. Also, make sure the changelog is in this PR, not in your ENT PR.

github-actions · 2025-01-09T17:08:02Z

CI Results:
All Go tests succeeded! ✅

github-actions · 2025-01-09T17:11:06Z

Build Results:
All builds succeeded! ✅

bosouza · 2025-01-09T21:11:45Z

physical/raft/raft.go

@@ -1461,6 +1461,9 @@ func (b *RaftBackend) StartRemovedChecker(ctx context.Context) {
 		for {
 			select {
 			case <-ticker.C:
+				if !b.Initialized() {


Sorry, right after approving it occurred to me that I hadn't considered how this uninitialized condition should interact with this loop. Please check whether my understanding is correct: the new condition !b.Initialized() is never evaluated before the raft backend is initialized, so it only returns true after RaftBackend.TeardownCluster(), which gets called, for example, after force-restoring a snapshot. At that point the only thing that could "reinitialize" the raft backend is another call to RaftBackend.SetupCluster(), but that would also start a new removed checker through StartRemovedChecker, so we can confidently rely on !b.Initialized() to stop the old one. If that's right, my one suggestion would be to add a comment explaining that this check is not meant to prevent the removed checker from running before the raft backend is initialized, but to let it exit cleanly after the RaftBackend is torn down.

That also raises the question of what the point of case <-ctx.Done(): is, if not to exit on teardown. Tracing the context all the way back, it seems to just be the background context, so there doesn't seem to be a teardown mechanism relying on it.

But I do get the feeling that I'm missing something and maybe a single instance of RaftBackend is supposed to last through multiple seal/unseal cycles, in which case the removed checker would either need a way to be restarted after unseal or remain working throughout the sealed period. I probably have a few incorrect assumptions in my reasoning, if you think it's easier to chat about it lmk!

Good call out! I've added a comment that should hopefully provide some clarity. The raft backend will always be set up again in SetupCluster, which will start a new removed checker. The initialized check here is meant to handle the case where the cluster has been torn down but the context hasn't been cancelled (which, as you mention, is pretty much every case, since we're using context.Background()).

good to know, thanks for the additional details!
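For readers following the thread, here is a minimal sketch of the pattern being discussed. It is not the Vault code: the `backend` type, `startChecker`, and `stopsAfterTeardown` are hypothetical stand-ins, and only the shape of the loop (ticker case, `!b.Initialized()` check, `ctx.Done()` case) comes from the diff above. It shows a polling goroutine that exits once the backend is marked uninitialized, even though the context is never cancelled.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// backend is a hypothetical stand-in for RaftBackend; only the
// initialized flag matters for this sketch.
type backend struct {
	initialized atomic.Bool
}

func (b *backend) Initialized() bool { return b.initialized.Load() }

// startChecker mimics the shape of the removed-checker loop. It polls on a
// ticker and returns a channel that is closed when the goroutine exits.
func startChecker(ctx context.Context, b *backend, interval time.Duration) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Not a guard against running before setup: this is the exit
				// path after teardown, since ctx may never be cancelled.
				if !b.Initialized() {
					return
				}
				// The real checker would query removal status here.
			case <-ctx.Done():
				return
			}
		}
	}()
	return done
}

// stopsAfterTeardown reports whether the checker keeps running while the
// backend is initialized and exits once it is marked uninitialized.
func stopsAfterTeardown() bool {
	b := &backend{}
	b.initialized.Store(true) // simulate setup
	done := startChecker(context.Background(), b, 5*time.Millisecond)

	select {
	case <-done:
		return false // exited while still initialized
	case <-time.After(30 * time.Millisecond):
	}

	b.initialized.Store(false) // simulate teardown
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("checker stopped after teardown:", stopsAfterTeardown())
}
```

Because a fresh checker is started on every setup in this model, the old goroutine has to stop on its own after teardown; otherwise each setup/teardown cycle would leave another ticker loop running and logging.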

miagilepner · 2025-01-10T14:44:12Z

vault/external_tests/raftha/raft_ha_test.go

@@ -364,6 +364,9 @@ func TestRaftHACluster_Removed_ReAdd(t *testing.T) {
 			if !server.Healthy {
 				return fmt.Errorf("server %s is unhealthy", serverID)
 			}
+			if server.NodeType != "voter" {


This isn't related to the PR, but I wanted to fix the race test flake. With this change I ran the test locally 5 times and didn't see it fail, whereas previously it failed 50% of the time locally.

bosouza · 2025-01-10T17:16:35Z

physical/raft/raft.go

@@ -1461,6 +1461,9 @@ func (b *RaftBackend) StartRemovedChecker(ctx context.Context) {
 		for {
 			select {
 			case <-ticker.C:
+				if !b.Initialized() {


good to know, thanks for the additional details!

check if not initialized

fd9ed45

miagilepner added the pr/no-changelog label Jan 9, 2025

miagilepner added this to the 1.19.0-rc milestone Jan 9, 2025

miagilepner requested a review from bosouza January 9, 2025 16:53

miagilepner requested a review from a team as a code owner January 9, 2025 16:53

github-actions bot added the hashicorp-contributed-pr label Jan 9, 2025

bosouza previously approved these changes Jan 9, 2025

bosouza self-requested a review January 9, 2025 19:15

bosouza reviewed Jan 9, 2025

add comment and fix flake

1dc7ba9

miagilepner dismissed bosouza’s stale review via 1dc7ba9 January 10, 2025 14:35

miagilepner commented Jan 10, 2025

miagilepner requested a review from bosouza January 10, 2025 15:05

miagilepner enabled auto-merge (squash) January 10, 2025 15:34

bosouza approved these changes Jan 10, 2025

miagilepner merged commit dc0cd5a into main Jan 10, 2025
92 checks passed

miagilepner deleted the miagilepner/removed-checker-exit branch January 10, 2025 17:16
