
Always receive "stream terminated by RST_STREAM with error code: STREAM_CLOSED" after grpc request #51

Closed
ray2011 opened this issue Feb 24, 2021 · 23 comments · Fixed by tsloughter/chatterbox#12

Comments


ray2011 commented Feb 24, 2021

When a Go gRPC client or a grpcbox client calls a grpcbox server, both show RST_STREAM in the tcpdump log:

go client call: [screenshot]

grpcbox client call: [screenshot]

RST_STREAM detail: [screenshot]

When a Go gRPC client or a grpcbox client calls a Go gRPC server, no RST_STREAM appears in the tcpdump log:

go grpc call: [screenshot]

grpcbox call: [screenshot]

Since the Go gRPC client does not tolerate the RST_STREAM, it returns an error, but the grpcbox client does not:

go error: [screenshot]

grpcbox result: [screenshot]

My question is: why is an RST_STREAM sent after the gRPC response has been sent?


ray2011 commented Feb 24, 2021

rebar.config: [screenshot]

sys.config: [screenshot]


fenollp commented May 3, 2021

Same issue, here's the (short) code and the call:

grpcurl  -proto ../proto/helloworld/helloworld.proto -d '{"name":"qweqweqwe"}' -plaintext localhost:50051 helloworld.Greeter/SayHello

Note: when running in rebar3 shell the first call works as intended but always fails from the 2nd call on.

ERROR:
  Code: Internal
  Message: stream terminated by RST_STREAM with error code: STREAM_CLOSED

Maybe it only has to do with the way Go terminates the connection, as I've tested (and received this message) only with grpcurl and ghz.


fenollp commented May 6, 2021

@tsloughter could this be due to the transport being insecure?

@tsloughter (Owner)

I still don't know when I'll have time to investigate and fix this, but could I get some more info? For example, does every request between Erlang and Go fail?

And I realize I'm not sure if the issue is the way it is being terminated or if the issue is how grpcbox handles the termination? Like "go grpc client don't handle RST_STREAM, it will return an error, but grpcbox won't.", is this saying Go doesn't handle the RST_STREAM because it shouldn't be sent?

@ray2011
Copy link
Author

ray2011 commented Jun 1, 2021

Is this saying Go doesn't handle the RST_STREAM because it shouldn't be sent?

Yes.

From the gRPC-over-HTTP/2 protocol spec, under "Errors":

> When an application or runtime error occurs during an RPC a Status and Status-Message are delivered in Trailers.
>
> In some cases it is possible that the framing of the message stream has become corrupt and the RPC runtime will choose to use an RST_STREAM frame to indicate this state to its peer. RPC runtime implementations should interpret RST_STREAM as immediate full-closure of the stream and should propagate an error up to the calling application layer.

RST_STREAM means an error, but no error occurred here.
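For context on what the Go client is reacting to: RST_STREAM is an ordinary HTTP/2 frame carrying a 32-bit error code, and STREAM_CLOSED is code 0x5 (RFC 7540, sections 6.4 and 7). A minimal sketch of the wire layout, in Python for illustration only (this is not chatterbox code):

```python
import struct

STREAM_CLOSED = 0x5    # HTTP/2 error code the Go client reports (RFC 7540, section 7)
RST_STREAM_TYPE = 0x3  # frame type for RST_STREAM (RFC 7540, section 6.4)

def rst_stream_frame(stream_id: int, error_code: int) -> bytes:
    """Build an RST_STREAM frame: 9-byte frame header + 4-byte error code."""
    payload = struct.pack(">I", error_code)
    # Header: 24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream id.
    header = (struct.pack(">I", len(payload))[1:]
              + bytes([RST_STREAM_TYPE, 0])
              + struct.pack(">I", stream_id & 0x7FFFFFFF))
    return header + payload

# The frame the tcpdump captures would show: length 4, type 0x3, stream 1, code 0x5.
frame = rst_stream_frame(1, STREAM_CLOSED)
```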

@lixen-wg2 (Contributor)

It seems to be introduced in:
tsloughter/chatterbox@e15e56d

@lixen-wg2 (Contributor)

I was able to reproduce the issue with

wget https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.2/grpc_health_probe-linux-amd64

and then running it twice:

❯ ./grpc_health_probe-linux-amd64 -addr=localhost:8080
status: SERVING
❯ ./grpc_health_probe-linux-amd64 -addr=localhost:8080
error: health rpc failed: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: STREAM_CLOSED

and verified that reverting tsloughter/chatterbox@e15e56d solves the issue.


shhnwz commented Jun 29, 2021

> I was able to reproduce the issue with
>
> wget https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.2/grpc_health_probe-linux-amd64
>
> and then running it twice [...] and verified that reverting tsloughter/chatterbox@e15e56d solves the issue.

This issue gave us nightmares in a pre-production environment. Thanks for identifying it; we had to revert this commit to move forward.

@tsloughter (Owner)

@shhnwz does using tsloughter/chatterbox#7 instead of reverting work for you?

@lixen-wg2 (Contributor)

@tsloughter we are running with that PR and have not seen any issues so far.

@lixen-wg2 (Contributor)

That turned out not to be true: when running with my PR we end up leaking h2_stream procs. I'll drop that for now and revert instead.

@tsloughter (Owner)

Maybe just need to stop the process in that event handler you added in the PR?

@lixen-wg2 (Contributor)

Yes that worked. I was thinking it would end up in the normal flow and stop.

@tsloughter (Owner)

I think that makes more sense. Stream is in closed state, so just stop it.

I wonder if the RST_STREAM was meant as a catch-all for messages sent to the stream process that weren't understood. The send_t is understood, so simply matching it and closing is the correct thing to do... Just trying to think this through. Could use an HTTP/2 flow chart :)
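The behavior under discussion can be sketched as a tiny decision table: a stream already in the closed state receives a message; the old code answered any message with RST_STREAM (which Go clients surface as an Internal error), while the fix matches the understood send_t message and simply stops the stream process. All names here (handle_closed, Action) are hypothetical illustrations, not the actual chatterbox API:

```python
from enum import Enum

class Action(Enum):
    SEND_RST_STREAM = 1  # old behavior: tell the peer STREAM_CLOSED although nothing went wrong
    STOP = 2             # fixed behavior: silently stop the stream process

def handle_closed(msg: str, fixed: bool) -> Action:
    """Decide what a stream in the 'closed' state does with an incoming message."""
    if fixed and msg == "send_t":
        # send_t is an understood, expected message: match it and just stop.
        return Action.STOP
    # Catch-all: any other (or, before the fix, any) message on a closed
    # stream triggers RST_STREAM with error code STREAM_CLOSED.
    return Action.SEND_RST_STREAM
```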

@tsloughter (Owner)

If tests pass I'll probably merge this and try to get a release made soon.

@tsloughter (Owner)

So, a test failure is why this hasn't been merged yet.

I haven't been able to look at it and haven't looked at anything really in months. But I start work again on the 13th and part of that job will involve grpcbox, so I will be digging in with fixes, performance improvements and making releases again finally in the near future.

@lixen-wg2 (Contributor)

The tests are not failing for me locally, but I don't know how to rerun them in GitHub Actions.


fenollp commented Jan 9, 2022

Both PRs fail CI for the same reason:

| PR | CI |
| --- | --- |
| tsloughter/chatterbox#7 | https://github.com/fenollp/chatterbox/runs/4754384849?check_suite_focus=true |
| tsloughter/chatterbox#12 | https://github.com/fenollp/chatterbox/runs/4754378209?check_suite_focus=true |
```
%%% http2_spec_3_5_SUITE:
=CRASH REPORT==== 9-Jan-2022::18:08:45.655765 ===
  crasher:
    initial call: chatterbox_ranch_protocol:init/4
    pid: <0.2102.0>
    registered_name: []
    exception exit: invalid_preface
      in function  h2_connection:become/3 (/home/runner/work/chatterbox/chatterbox/src/h2_connection.erl, line 205)
    ancestors: [<0.2082.0>,<0.2081.0>,ranch_sup,<0.1502.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2082.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 1414
  neighbours:

=ERROR REPORT==== 9-Jan-2022::18:08:45.655929 ===
Ranch listener chatterbox_ranch_protocol had connection process started with chatterbox_ranch_protocol:start_link/4 at <0.2102.0> exit with reason: invalid_preface

%%% http2_spec_3_5_SUITE ==> sends_invalid_connection_preface: FAILED
%%% http2_spec_3_5_SUITE ==> {{badmatch,ok},
 [{http2_spec_3_5_SUITE,send_invalid_connection_preface,2,
                        [{file,"/home/runner/work/chatterbox/chatterbox/test/http2_spec_3_5_SUITE.erl"},
                         {line,48}]},
  {http2_spec_3_5_SUITE,sends_invalid_connection_preface,1,
                        [{file,"/home/runner/work/chatterbox/chatterbox/test/http2_spec_3_5_SUITE.erl"},
                         {line,25}]},
  {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1754}]},
  {test_server,run_test_case_eval1,6,[{file,"test_server.erl"},{line,1263}]},
  {test_server,run_test_case_eval,9,[{file,"test_server.erl"},{line,1195}]}]}
```
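For reference, invalid_preface refers to the fixed 24-byte HTTP/2 client connection preface (RFC 7540, section 3.5) that a server must read before any frames; this test suite deliberately sends a bad preface and expects the connection to be rejected. A sketch of that check in Python (illustrative only; check_preface is a hypothetical name, not the h2_connection code):

```python
# The exact 24-byte client connection preface from RFC 7540, section 3.5.
PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

def check_preface(first_bytes: bytes) -> bool:
    """A server must receive exactly this sequence before any HTTP/2 frames."""
    return first_bytes.startswith(PREFACE)

# A valid connection starts with the preface followed by frames;
# anything else (e.g. an HTTP/1.1 request line) is an invalid preface.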


lixen commented Jan 10, 2022

Seems to be this issue: tsloughter/chatterbox#6

@tsloughter (Owner)

Looking into this again. I'm hitting the issue with grpcurl and kreya when trying to test a service.


fenollp commented Jun 1, 2022

It seems chatterbox needs a new release, and grpcbox needs to depend on that new version, before this is fixed. Correct?

@tsloughter (Owner)

@fenollp yea, mixed up my repos, hehe. I'll get those releases out soon.

@tsloughter (Owner)

Done.
