Exit raft removed checker if raft isn't initialized #29329

Merged on Jan 10, 2025 (2 commits). The diff below shows the changes from 1 commit.
physical/raft/raft.go (3 additions, 0 deletions):

```diff
@@ -1461,6 +1461,9 @@ func (b *RaftBackend) StartRemovedChecker(ctx context.Context) {
 	for {
 		select {
 		case <-ticker.C:
+			if !b.Initialized() {
```
bosouza (Contributor) commented on Jan 9, 2025:

Sorry, right after approving it occurred to me that I hadn't considered how this uninitialized condition should interact with this loop. Please check whether my understanding is correct: the new condition `!b.Initialized()` will never be evaluated before the raft backend is initialized, so it only returns true after `RaftBackend.TeardownCluster()`, which gets called, for example, after force-restoring a snapshot. At that point the only thing that could "reinitialize" the raft backend is another call to `RaftBackend.SetupCluster()`, but that would also start a new `StartRemovedChecker`, so we can confidently rely on `!b.Initialized()` to stop the removed checker. If that's right, my one suggestion would be to add a comment explaining that this check is not meant to prevent the removed checker from running before the raft backend is initialized, but rather to let it exit cleanly after the `RaftBackend` is torn down.

That also raises the question of what `case <-ctx.Done():` is for, if not to exit on teardown. Tracing the context all the way back, though, it seems to be just the background context, so there doesn't seem to be a teardown mechanism relying on it.

But I do get the feeling that I'm missing something. Maybe a single `RaftBackend` instance is supposed to last through multiple seal/unseal cycles, in which case the removed checker would need either a way to be restarted after unseal or to keep working throughout the sealed period. I probably have a few incorrect assumptions in my reasoning; if you think it's easier to chat about it, lmk!

Reply from the PR author (Contributor):

Good call-out! I've added a comment that should hopefully provide some clarity. The raft backend will always be set up again in `SetupCluster`, which creates a new removed checker. The initialized check here is meant to handle the case where the cluster has been torn down but the context hasn't been canceled (which, as you mention, is pretty much every case, since we're using `context.Background()`).

Contributor reply:

Good to know, thanks for the additional details!

The diff continues:

```diff
+				return
+			}
 			removed, err := b.IsNodeRemoved(ctx, b.localID)
 			if err != nil {
 				logger.Error("failed to check if node is removed", "node ID", b.localID, "error", err)
```