
Deadlock in subscription client #160

Open
andig opened this issue Jan 22, 2025 · 2 comments

andig commented Jan 22, 2025

In evcc-io/evcc#17925 (comment) we have a problem where the subscription client goes into a deadlock. Here's what I think happens based on additional logging added in master...andig:go-graphql-client:log:

!! ReadJSON: failed to read JSON message: failed to read: use of closed network connection
!! ReadJSON: EOF
[pulse ] TRACE 2025/01/21 17:00:09 {"type":"connection_init","payload":{"token":"***"}} client
!! Run: <-sc.errorChan retry subscription client
!! Run: <-sc.errorChan: errRetry

After that last log line, the reader goroutine deadlocks writing to the error channel in master...andig:go-graphql-client:log#diff-828fe8a645cf1a278fb0dfdca1eac3d6b4c691415c66dbe854d6286e2578031bR783, as the goroutine profile in the linked issue shows:

!! ReadJSON: failed to read JSON message: failed to get reader: use of closed network connection
!! ReadJSON: EOF

maps to

1 @ 0x93c98 0x176f8 0x1737c 0xff46e0 0x9baf0
#	0xff46df	github.com/hasura/go-graphql-client.(*SubscriptionClient).Run.func1+0x4eb	github.com/hasura/[email protected]/subscription.go:783

in

```go
// manual EOF check
if err == io.EOF || strings.Contains(err.Error(), "EOF") || errors.Is(err, net.ErrClosed) || strings.Contains(err.Error(), "connection reset by peer") {
	fmt.Println("!! ReadJSON: EOF")
	sc.errorChan <- errRetry
	return
}
```

Unfortunately, I can't see how that is possible, since every errRetry should start a new receiver goroutine.
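
For illustration, here is a minimal standalone sketch (not the library's actual code) of the pattern I suspect: an unbuffered error channel whose receiver has gone away, so the next send blocks forever.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

var errRetry = errors.New("retry subscription client")

func main() {
	errorChan := make(chan error) // unbuffered, mirroring the assumption above

	// Two "reader" goroutines both hit EOF and try to report errRetry.
	for i := 0; i < 2; i++ {
		go func(id int) {
			errorChan <- errRetry // blocks until Run receives
			fmt.Printf("reader %d: errRetry delivered\n", id)
		}(i)
	}

	// The Run loop receives only once (e.g. because it is busy reconnecting
	// afterwards), so the second send has no receiver and blocks forever.
	fmt.Println("Run received:", <-errorChan)

	time.Sleep(100 * time.Millisecond)
	// The stuck sender is still alive here, comparable to the goroutine
	// pinned at subscription.go:783 in the profile above.
	fmt.Println("goroutines still alive:", runtime.NumGoroutine())
}
```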


hgiasac commented Jan 23, 2025

Hmm, it would be great if you had an example to reproduce the bug. Otherwise, I'm thinking of spawning a dedicated health-check thread to handle deadlocks and other race conditions.
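
Something along these lines, just a rough sketch: `healthChecker` and its fields are hypothetical and not part of this library. A goroutine tracks the last successful activity and forces a restart when the client looks stuck, instead of relying on blocking channel sends.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// healthChecker is a hypothetical helper, not an existing type in go-graphql-client.
type healthChecker struct {
	lastSeen atomic.Int64 // unix nanoseconds of the last observed activity
	reset    func()       // callback that tears down and restarts the client
}

// touch records activity; the reader would call this after every message.
func (h *healthChecker) touch() { h.lastSeen.Store(time.Now().UnixNano()) }

// run polls the activity timestamp and triggers reset after a silent period.
func (h *healthChecker) run(interval, timeout time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if time.Since(time.Unix(0, h.lastSeen.Load())) > timeout {
				h.reset() // no activity for too long: assume it is stuck and restart
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	hc := &healthChecker{reset: func() { fmt.Println("health check: restarting client") }}
	hc.touch()
	go hc.run(50*time.Millisecond, 200*time.Millisecond, stop)

	// Simulate a client that stops making progress: no further touch() calls.
	time.Sleep(400 * time.Millisecond)
	close(stop)
}
```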


andig commented Jan 24, 2025

Unfortunately, no example. We're still trying to add more logging to understand why the deadlock happens. I assume we're leaving the consuming goroutine in Run somewhere we shouldn't while the main goroutine continues to run, but I haven't found where or why yet.
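
One defensive option in the meantime, just a sketch and not a patch against the actual Run implementation (`reportError` is a made-up helper), would be to make the error send cancellable so a reader goroutine can never block forever once the receiver is gone:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errRetry = errors.New("retry subscription client")

// reportError tries to deliver err on errorChan but gives up when ctx is
// cancelled, so a vanished receiver no longer leaves the sender blocked forever.
func reportError(ctx context.Context, errorChan chan<- error, err error) bool {
	select {
	case errorChan <- err:
		return true
	case <-ctx.Done():
		return false
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	errorChan := make(chan error) // unbuffered; nobody ever receives in this demo

	go func() {
		// Simulate the Run loop going away without draining errorChan.
		time.Sleep(50 * time.Millisecond)
		cancel()
	}()

	if !reportError(ctx, errorChan, errRetry) {
		fmt.Println("reader: receiver gone, dropping errRetry instead of deadlocking")
	}
}
```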
