Exit raft removed checker if raft isn't initialized #29329
Merged
@@ -364,6 +364,9 @@ func TestRaftHACluster_Removed_ReAdd(t *testing.T) {
 	if !server.Healthy {
 		return fmt.Errorf("server %s is unhealthy", serverID)
 	}
+	if server.NodeType != "voter" {
+		return fmt.Errorf("server %s has type %s", serverID, server.NodeType)
+	}
 }
 return nil
})

Review comment on the added voter check: this isn't related to the PR, but I wanted to fix the race test flake. I ran it locally 5 times and didn't see it fail, when previously it would fail 50% of the time locally.
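The diff above only touches the test; the behavior the PR title describes lives in the removed checker itself, which isn't shown here. Below is a minimal sketch of the loop under discussion, assuming a ticker-based poll and a stubbed-out backend type (the poll interval, the stub fields, and the body of the removal check are assumptions; only `Initialized()`, `StartRemovedChecker`, and the `ctx.Done()` case are named in the review comments):

```go
package raft

import (
	"context"
	"sync/atomic"
	"time"
)

// Stub standing in for Vault's physical/raft backend; the real type carries
// much more state. Initialized flips to false after TeardownCluster.
type RaftBackend struct {
	initialized atomic.Bool
}

func (b *RaftBackend) Initialized() bool { return b.initialized.Load() }

func (b *RaftBackend) StartRemovedChecker(ctx context.Context) {
	go func() {
		ticker := time.NewTicker(10 * time.Second) // poll interval is an assumption
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// The change under review: once TeardownCluster has marked the
				// backend uninitialized, exit cleanly instead of polling forever.
				if !b.Initialized() {
					return
				}
				// ... ask the cluster whether this node has been removed ...
			case <-ctx.Done():
				// In practice ctx traces back to context.Background(), so this
				// branch rarely fires; the Initialized() check is the real exit path.
				return
			}
		}
	}()
}
```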
sorry, right after approving it occurred to me that I hadn't considered how this uninitialized condition should interact with this loop, please check if my understanding is correct: this new condition `!b.Initialized()` won't ever be evaluated before the raft backend is initialized, so it only returns `true` after `RaftBackend.TeardownCluster()`, which gets called for example after force-restoring a snapshot. At that point the only thing that could "reinitialize" the raft backend is another call to `RaftBackend.SetupCluster()`, but that would also start a new `StartRemovedChecker`, so we can confidently rely on this `!b.Initialized()` to stop the removed checker. If that's right, then my one suggestion would be to add a comment explaining that this check is not supposed to prevent the removed checker from running before the raft backend is initialized, but instead to allow it to exit cleanly after teardown of `RaftBackend`.

That also raises the question of what the point of `case <-ctx.Done():` is if not to exit on teardown, but tracing the context all the way back it seems to just be the background context, so there doesn't seem to be a teardown mechanism relying on that indeed.

But I do get the feeling that I'm missing something, and maybe a single instance of `RaftBackend` is supposed to last through multiple seal/unseal cycles, in which case the removed checker would either need a way to be restarted after unseal or remain working throughout the sealed period. I probably have a few incorrect assumptions in my reasoning, so if you think it's easier to chat about it, lmk!
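To make the lifecycle described above concrete, here is a hedged sketch extending the stub from the earlier snippet; the signatures and method bodies are simplified assumptions, and only the ordering of teardown, setup, and starting a new checker comes from the discussion:

```go
// Teardown marks the backend uninitialized; the next tick of the removed
// checker sees this and exits.
func (b *RaftBackend) TeardownCluster() error {
	// ... shut down the raft instance ...
	b.initialized.Store(false)
	return nil
}

// Setup re-initializes the backend and starts a fresh removed checker, so
// stopping the old checker after teardown never leaves a running cluster
// without one.
func (b *RaftBackend) SetupCluster(ctx context.Context) error {
	// ... bring the raft instance back up ...
	b.initialized.Store(true)
	b.StartRemovedChecker(ctx)
	return nil
}
```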
Good callout! I've added a comment that should hopefully provide some clarity. The raft backend will always be set up again in SetupCluster, which will make a new removed checker. The initialized check here is supposed to handle the case where the cluster has been torn down but the context isn't closed (which, as you mention, is pretty much every case, since we're using context.Background()).
good to know, thanks for the additional details!