Fixed processor dispose race condition #135

echistyakov · 2024-12-17T00:08:21Z

Fix race condition in RequestResponse (with processor Dispose())

Motivation:

I caught this bug while stress-testing RSocket-based Client/Server implementation in Facebook Thrift: https://github.com/facebook/fbthrift/blob/main/thrift/lib/go/thrift/stress/server_test.go

It's a pretty simple stress test: 100K RequestResponse calls against a single server.
There are at most 100 concurrent RSocket connections at a time, and a fresh connection is created for each request.
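
For anyone who wants to reproduce a similar load shape, here is a minimal sketch of a bounded-concurrency driver. It is not the fbthrift stress test itself: runBounded and its do callback are hypothetical names, and the callback is only a placeholder for "dial a fresh connection, issue one RequestResponse, close the connection".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded calls do(i) for every i in [0, total) with at most limit calls
// in flight at once. It returns how many calls completed and the highest
// concurrency that was actually observed.
// Hypothetical helper, not part of rsocket-go or fbthrift.
func runBounded(total, limit int, do func(i int)) (completed, peak int64) {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var inFlight, done, maxSeen int64

	for i := 0; i < total; i++ {
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			cur := atomic.AddInt64(&inFlight, 1)
			// Record the peak concurrency with a CAS loop.
			for {
				old := atomic.LoadInt64(&maxSeen)
				if cur <= old || atomic.CompareAndSwapInt64(&maxSeen, old, cur) {
					break
				}
			}

			do(i)

			atomic.AddInt64(&inFlight, -1)
			atomic.AddInt64(&done, 1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&done), atomic.LoadInt64(&maxSeen)
}

func main() {
	completed, peak := runBounded(100_000, 100, func(int) {
		// Placeholder: a real test would create a fresh client connection,
		// send one RequestResponse, check the result, and close the client.
	})
	fmt.Println("completed:", completed, "peak within limit:", peak <= 100)
}
```

Creating and closing a connection per request is what makes the pooled processor change owners so often, which is presumably why this load shape exposes the race.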

The symptom of the stress-test failure was that a small portion of requests would fail with a "use of closed network connection" error:

2024/12/16 15:21:02 flush failed drain: flush failed: write tcp [::1]:43993->[::1]:40658: use of closed network connection
2024/12/16 15:21:02 flush failed drain: flush failed: write tcp [::1]:43993->[::1]:35820: use of closed network connection
2024/12/16 15:21:02 flush failed drain: flush failed: write tcp [::1]:43993->[::1]:36604: use of closed network connection
2024/12/16 15:21:02 flush failed drain: flush failed: write tcp [::1]:43993->[::1]:36780: use of closed network connection
2024/12/16 15:21:02 flush failed drain: flush failed: write tcp [::1]:43993->[::1]:35856: use of closed network connection

This is a race condition between two clients and it goes like this:

The client makes a RequestResponse request here:

rsocket-go/internal/socket/duplex.go

Line 247 in 473989b

func (dc *DuplexConnection) RequestResponse(req payload.Payload) (res mono.Mono) {

A mono processor is created by pulling it from a global processor pool:

rsocket-go/internal/socket/duplex.go

Line 266 in 473989b

m, s, _ := mono.NewProcessor(dc.reqSche, onFinally)

- (Corresponding global pool code in the reactor-go repo.)

A callback handler is registered, and the processor is assigned to its sink field:

rsocket-go/internal/socket/duplex.go

Lines 267 to 269 in 473989b

handler.sink = s

dc.register(sid, handler)

The client gets a response back from the server (no issues).

The processor invokes the onFinally callback (since the Stream Sequence is now complete):

rsocket-go/internal/socket/duplex.go

Lines 257 to 264 in 473989b

    
    onFinally := func(s reactor.SignalType, d reactor.Disposable) {
        common.TryRelease(handler.cache)
        d.Dispose()
        if s == reactor.SignalTypeCancel {
            dc.sendFrame(framing.NewWriteableCancelFrame(sid))
        }
        dc.unregister(sid)
    }

The processor gets disposed on the following line (it is placed back into the global processor pool and becomes available for any other RSocket client to use):

rsocket-go/internal/socket/duplex.go

Line 259 in 473989b

d.Dispose()

Immediately after the above line, our current goroutine gets preempted. It does not get a chance to unregister the handler callback (which still holds a pointer to the sink we just released into the global pool):

rsocket-go/internal/socket/duplex.go

Line 263 in 473989b

dc.unregister(sid)

Another goroutine starts running.
a. This goroutine creates a completely separate RSocket client to make a separate RequestResponse.
b. This RSocket client happens to get the same sink/processor (the one we just disposed) from the global pool.

We call Close() on our original client from the earlier steps (since its RequestResponse sequence has already completed).
a. A destroyHandler method gets invoked:

rsocket-go/internal/socket/duplex.go

Lines 133 to 138 in 473989b

    
    err := dc.GetError()
    if err == nil {
        dc.destroyHandler(errSocketClosed)
    } else {
        dc.destroyHandler(err)
    }

rsocket-go/internal/socket/duplex.go

Line 163 in 473989b

func (dc *DuplexConnection) destroyHandler(err error) {

b. It in turn invokes the stopWithError method of our handler (which we have not yet unregistered, because our goroutine was preempted right after d.Dispose()):

rsocket-go/internal/socket/callback.go

Lines 33 to 36 in 473989b

    
    func (s requestResponseCallback) stopWithError(err error) {
        s.sink.Error(err)
        common.TryRelease(s.cache)
    }

c. However, the sink is already being used by another client. We are sending Error to a completely unrelated client!!! Race condition!
d. The other (unrelated) client gets a false-positive error that the socket is closed!

At some point after Close() executes, the preempted goroutine is scheduled again and is finally able to unregister the handler, but it's too late: the race has already occurred.
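
The interleaving above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: sink, pool, and conn are simplified stand-ins (not the real reactor-go or rsocket-go types), the pool is a plain LIFO free list rather than the real global pool, and the preemption is simulated by running the steps in the problematic order on a single goroutine.

```go
package main

import (
	"errors"
	"fmt"
)

var errSocketClosed = errors.New("socket closed")

// sink is a stand-in for the pooled mono processor.
type sink struct{ err error }

func (s *sink) Error(err error) { s.err = err }

// pool is a toy LIFO free list standing in for the global processor pool.
type pool struct{ free []*sink }

func (p *pool) get() *sink {
	if n := len(p.free); n > 0 {
		s := p.free[n-1]
		p.free = p.free[:n-1]
		s.err = nil // reset state for the new owner
		return s
	}
	return &sink{}
}

func (p *pool) put(s *sink) { p.free = append(p.free, s) }

// conn is a stand-in for DuplexConnection: stream ID -> registered sink.
type conn struct{ handlers map[uint32]*sink }

func (c *conn) register(sid uint32, s *sink) { c.handlers[sid] = s }
func (c *conn) unregister(sid uint32)        { delete(c.handlers, sid) }

// close mimics destroyHandler: every still-registered sink gets an error.
func (c *conn) close() {
	for _, s := range c.handlers {
		s.Error(errSocketClosed)
	}
}

// replay runs the interleaving and returns the error observed by client B.
func replay(unregisterFirst bool) error {
	p := &pool{}
	a := &conn{handlers: map[uint32]*sink{}}
	const sid = 1

	sa := p.get() // client A pulls a processor from the pool
	a.register(sid, sa)

	// onFinally on client A.
	if unregisterFirst {
		a.unregister(sid) // handler no longer points at the sink
		p.put(sa)         // now it is safe to hand the sink back
	} else {
		p.put(sa) // Dispose(); the goroutine is "preempted" here
	}

	sb := p.get() // client B gets the very same sink from the pool
	a.close()     // Close() on client A

	if !unregisterFirst {
		a.unregister(sid) // finally runs, but too late
	}
	return sb.err
}

func main() {
	fmt.Println("dispose first:   ", replay(false))
	fmt.Println("unregister first:", replay(true))
}
```

In this model, replay(false) hands errSocketClosed to the unrelated client B, while replay(true) leaves B untouched, because Close() no longer finds a handler pointing at the recycled sink.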

Modifications/Fix:

Reordered the relevant operations to avoid the race condition:

Unregister the handler callback, which holds the sink (i.e. processor), first.
Dispose of the sink (processor), i.e. place it back into the global pool, last.
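
As a rough illustration of that ordering, here is a self-contained sketch of the callback shape. The types below (signalType, disposable, recorder) are local stand-ins invented for this example, not the real reactor-go or rsocket-go API, and where the cancel frame is sent relative to the other steps is an assumption of the sketch. The only point being made is that Dispose() runs after unregister.

```go
package main

import "fmt"

// Local stand-ins; hypothetical, not the real reactor-go types.
type signalType int

const (
	signalComplete signalType = iota
	signalCancel
)

type disposable interface{ Dispose() }

// recorder logs the order in which the cleanup steps run.
type recorder struct{ steps []string }

func (r *recorder) log(step string) { r.steps = append(r.steps, step) }
func (r *recorder) Dispose()        { r.log("dispose") }

// newOnFinally builds a cleanup callback with the corrected ordering.
func newOnFinally(r *recorder) func(signalType, disposable) {
	return func(s signalType, d disposable) {
		r.log("release-cache")
		if s == signalCancel {
			r.log("send-cancel-frame") // placement here is an assumption
		}
		r.log("unregister") // drop the handler (and its sink pointer) first
		d.Dispose()         // only then return the processor to the pool
	}
}

func main() {
	r := &recorder{}
	newOnFinally(r)(signalComplete, r)
	fmt.Println(r.steps) // prints "[release-cache unregister dispose]"
}
```

With this ordering, by the time the processor is visible to other clients through the pool, no handler on the original connection can reach it anymore.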

Result:

The stress test succeeds after this change.

jjeffcaii

LGTM

jjeffcaii · 2024-12-26T12:21:33Z

Awesome!!! Thanks for your contribution. 👍🏽

Summary: This will no longer be required once this lands upstream: rsocket/rsocket-go#135. This is pretty much just a revert of D60833191.
Reviewed By: leoleovich
Differential Revision: D67235492
fbshipit-source-id: 118144ea8c0e6169ef07f1a43d5cdac403f05a02

Fixed processor dispose race condition

7979dfe

jjeffcaii approved these changes Dec 26, 2024

jjeffcaii merged commit 8065699 into rsocket:master Dec 26, 2024
2 checks passed

echistyakov deleted the fix-processor-dispose-race-condition branch December 26, 2024 19:18
