-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VDiff: fix race when a vdiff resumes on vttablet restart #17638
base: main
Are you sure you want to change the base?
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17638 +/- ##
==========================================
- Coverage 67.76% 67.73% -0.03%
==========================================
Files 1586 1586
Lines 255763 255758 -5
==========================================
- Hits 173315 173246 -69
- Misses 82448 82512 +64 ☔ View full report in Codecov by Sentry. |
// The watch failed. Stop it so we start a new one if needed. | ||
log.Errorf("Shard watch failed: %v", event.Err) | ||
shardWatch.stop() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this because we also saw a panic here when the vdiff panic occurred. Possibly a different root cause (race?) is causing that since we should never get a nil event, but adding a guard here so vttablet doesn't panic in such a case but logs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @rohit-nayak-ps !
@rohit-nayak-ps the stats were only added in v20: fa59f9d |
@@ -221,6 +221,11 @@ func (vde *Engine) addController(row sqltypes.RowNamedValues, options *tabletman | |||
globalStats.mu.Lock() | |||
defer globalStats.mu.Unlock() | |||
globalStats.controllers[ct.id] = ct | |||
|
|||
// run() can start a vdiff in pending/started state, so we should init the stats first before starting it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is no longer applicable. And we could revert the related changes too. Fine either way IMO.
@@ -153,12 +153,12 @@ func (vde *Engine) openLocked(ctx context.Context) error { | |||
if err != nil { | |||
return err | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vdiff engine's controller list is initialized as empty in Open and I think that we should do the same for the stats so that they stay in sync and the stats don't grow and grow with old cruft. We can do that by placing this block just after the if len(vde.controllers) > 0 {
block above:
func() {
globalStats.mu.Lock()
defer globalStats.mu.Unlock()
globalStats.controllers = make(map[int64]*controller)
}()
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
22cf1e0
to
853bb6f
Compare
Description
There is a race when vttablet starts and the vdiff engine opens. It starts vdiffs which need to be resumed where we restart vdiffs in error state concurrently with the vdiff engine init. It is possible that
globalStats
map is not initialized when a new controller gets added and the controller adds it self to the map.This PR starts the goroutine that runs the controller thread only after the inits have been done.
This was added way back, so we backport to all supported versions: https://github.com/vitessio/vitess/pull/10639/files#diff-931d120fc70d7007ffd77176f54c526f678c1a6c4e703dbf54b3086c967dbd3dR323
Related Issue(s)
Checklist
Deployment Notes