Data channel hangs over a lossy connection #138
Comments
I have noticed that the DATA_CHANNEL_ACK message was not sent reliably and was marked as "abandoned" immediately after the first transmission. draft-ietf-rtcweb-data-protocol-09.txt in Section 6, says:
Clearly, pion/sctp's implementation needs to be corrected. |
The above fix did not solve the problem. The data channel still hangs, and Chrome still returns a SACK with an advertised receiver window (a_rwnd) of 0. |
@tuexen I have been working on this data channel hang issue for a while and found multiple bugs in pion, which was good. But now I'm struggling to find the cause of this hang, and I am hoping to hear your thoughts/suggestions if possible. The attachment (zip) contains two files:
Summary of what I see:
What I don't understand is that the SACK chunks sent by Chrome indicate its cumulative TSN advances even after a_rwnd hits 0, but the app on Chrome stops receiving onmessage callbacks. From the pcap above, Pion does not receive any chunks reporting an error or abort. I am not 100% sure, but although this does not always occur, when it happens, it happens more often at the very beginning of the connection. |
I tried with Firefox and Safari but I have not been able to reproduce the hang with these browsers. (I tried at least 10 times for each browser with the same settings: unordered, maxRetransmits=0, 2500B/msg.) Also, when I set the unordered flag to false (ordered transmission), this does not happen with Chrome. |
I used browsers instead of pion (the answerer side). The hang occurs with Chrome (offerer). It does not happen with Firefox or Safari. The browsers I used for the offerer side were Firefox and Chrome. @tuexen I have to conclude that the hang I am seeing with the data channel (ordered=false & maxRetransmits=0) is caused by a bug in Chrome. Since Firefox also uses usrsctp, I guess the bug is in libwebrtc. (Does anyone know what user-space implementation of SCTP Safari uses?) Summary (conditions to cause the data channel hang)
Comcast command example to reproduce:
These are the browser versions I used:
|
There were some bug fixes in this area which might not be in the version used by the browsers yet. Let me test this locally... |
@tuexen In case it helps, here is what Chrome:
usersctp: info Safari:
The problem does not happen on my Safari though both use AppleWebKit. Do you think the bug was fixed between versions, |
Can't reproduce the issue with the current master branch. Using
as the receiver and
as the sender. Using a packet drop rate of 2%. The transfer has a constant bandwidth for an hour. |
@tuexen I was able to reproduce it using tsctp (current master)! The occurrence ratio in the first 20 seconds is about 50% for me. I was able to capture packets with Wireshark and also console logs of both ends - attached. Comcast command:
Receiver command:
Sender command:
In about 4 seconds, the sender receives a SACK with a_rwnd=0 and then never recovers. I am using macOS (version 10.15.6 (19G73)) |
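To illustrate why a_rwnd=0 stalls the sender, here is a minimal sketch of the sender-side window bookkeeping described in RFC 4960 (Sections 6.1 and 6.2.1). The struct and function names (`peer_window`, `on_sack`, `may_send_new_data`) are hypothetical and are not taken from pion/sctp or usrsctp; congestion control (cwnd) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender-side view of the peer's receive window. */
struct peer_window {
    uint32_t rwnd;        /* sender's estimate of the peer's free buffer space */
    uint32_t outstanding; /* bytes sent but not yet acknowledged */
};

/* On each SACK: rwnd = a_rwnd minus the bytes still outstanding, floored at 0. */
static void on_sack(struct peer_window *w, uint32_t a_rwnd,
                    uint32_t outstanding_after_sack)
{
    w->outstanding = outstanding_after_sack;
    w->rwnd = (a_rwnd > outstanding_after_sack)
                  ? a_rwnd - outstanding_after_sack
                  : 0;
}

/*
 * Simplified send rule (cwnd ignored): with rwnd == 0 no new data may be
 * sent, except that one chunk may be in flight as a window probe.
 */
static bool may_send_new_data(const struct peer_window *w)
{
    if (w->rwnd == 0)
        return w->outstanding == 0;
    return true;
}
```

Under this rule, if the receiver keeps advertising a_rwnd=0 because it never frees buffer space, the sender is limited to window probes, which matches a transfer that stops making progress.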
@enobufs Using the latest sources? |
Yes. Current master HEAD at rev: 31f4eb5.
|
OK. Will retest. I was testing on two FreeBSD based (slow) machines. That should not matter... |
@tuexen I have been running the tests repeatedly and noticed that when it happens, it happens within the first 10 seconds. If it does not, it never happens in the next 50 seconds. So, it looks like the trigger is at the very beginning... |
@enobufs Are you applying the packet loss rate uni-directional (which direction) or bi-directional? I was applying the packet loss uni-directional such that only packets with DATA chunks (or FORWARD-TSN chunks) are affected, but not packets with SACK chunks. |
Bi-directional and not targeted to a specific type of chunk. I am using this tool for the network impairment. |
I'm using |
@tuexen I added some debug logs to the code to track how cum_tsn, new_cum_tsn (by ForwardTSN), size_on_reasm_queue (and a few others) are moving. Please find the attachment (zip) to this message which has the receiver side log in it. I do not fully understand but
After this line, DATA chunk with TSN=3299948022 appears to be lost. Later, Forward TSN comes in as you can see with these (adjacent) lines:
I notice, in the above lines, the TSN jumps from 3299948023 to 3299948033. This line is printed in the function,
    asoc->cumulative_tsn = asoc->mapping_array_base_tsn + (at-1);
    SCTPDBG(SCTP_DEBUG_INDATA1, "@YT: cum_tsn=%u (3)\n", asoc->cumulative_tsn);
I have no clue what "slide mapping" means, but I believe it is advancing cum_tsn beyond new_cum_tsn because it has already received those DATA chunks. The problem I see is that, once this TSN advancement occurs, I see a bunch of these log messages:
which come from the function
    if (SCTP_TSN_GE(asoc->cumulative_tsn, new_cum_tsn)) {
        /* Already got there ... */
        SCTPDBG(SCTP_DEBUG_INDATA1, "@YT: FWDTSN: already got three...\n");
        return;
    }
As it hits the I have no idea if I am getting close to the bottom of it, but I am just hoping that this will help you find some clue and reproduce it in your environment. |
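The behavior described in this comment can be sketched in a few lines of C. This is a simplified model and not the usrsctp code: the names (`rx_state`, `mark`, `update_cum_tsn`, `on_forward_tsn`, `tsn_ge`), the 32-bit map, and the TSN values in the usage note are assumptions chosen for illustration. It shows how a cumulative TSN derived from a received-TSN bitmap can jump past the value carried in a FORWARD-TSN chunk, and why later FORWARD-TSN chunks with an older value are then ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_BITS 32u

/* Serial-number comparison: true when a is at or after b modulo 2^32
 * (assumes the usual two's-complement conversion to int32_t). */
static bool tsn_ge(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

struct rx_state {
    uint32_t base_tsn; /* TSN represented by bit 0 of map */
    uint32_t map;      /* bit i set: TSN base_tsn + i was received or skipped */
    uint32_t cum_tsn;  /* highest TSN with no gap before it */
};

/* Mark one TSN as received (or skipped). TSNs outside the map are ignored. */
static void mark(struct rx_state *s, uint32_t tsn)
{
    uint32_t off = tsn - s->base_tsn;
    if (off < MAP_BITS)
        s->map |= 1u << off;
}

/* Count the leading run of set bits; cum_tsn is the last TSN in that run. */
static void update_cum_tsn(struct rx_state *s)
{
    uint32_t at = 0;
    while (at < MAP_BITS && (s->map & (1u << at)))
        at++;
    s->cum_tsn = s->base_tsn + (at - 1);
}

/* Returns false when the FORWARD-TSN is stale ("already got there"). */
static bool on_forward_tsn(struct rx_state *s, uint32_t new_cum_tsn)
{
    if (tsn_ge(s->cum_tsn, new_cum_tsn))
        return false;
    /* Treat every TSN up to new_cum_tsn as skipped, then re-derive cum_tsn. */
    for (uint32_t t = s->cum_tsn + 1; tsn_ge(new_cum_tsn, t); t++)
        mark(s, t);
    update_cum_tsn(s);
    return true;
}
```

For example, with a hypothetical base TSN of 3299948020, receiving 3299948020, 3299948021 and 3299948023 through 3299948033 leaves cum_tsn at 3299948021. A FORWARD-TSN carrying 3299948022 fills the single gap, and cum_tsn moves straight to 3299948033 because the later chunks were already in the map. A subsequent FORWARD-TSN carrying 3299948030 is then rejected by the early return.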
Update: I can reproduce this locally. Will look into it. |
@tuexen That's great news! |
@tuexen I'm wondering if you'd like me to create an issue in sctplab/usrsctp, as this is no longer a pion issue? |
Will work on it soon... My day job kicked in... If you prefer, you can move it over to usrsctp. It is not a bug in pion... |
Here is a
Now figuring out what is going wrong... |
Improve the handling of receiving unordered and unreliable user messages using DATA chunks. Don't use fsn_included when not being sure that it is set to an appropriate value. If the default is used, which is -1, this can result in SCTP associations not making any user visible progress. Thanks to Yutaka Takeda for reporting this issue for the userland stack in pion/sctp#138. MFC after: 3 days git-svn-id: svn+ssh://svn.freebsd.org/base/head@366198 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
user data using DATA chunks in the receive path. This fixes pion/sctp#138 Thanks to Yutaka Takeda for reporting the issue.
OK, I think the issue is fixed. The problem was that in
is checking
avoids this problem. Thanks again for testing and reporting the issue! |
@tuexen That's great news! I am glad if this helped make the WebRTC community a better place! |
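The commit message referenced in this thread says that fsn_included defaults to -1 and should not be used unless it is known to hold an appropriate value. The sketch below is not the actual patch. It only illustrates, under assumptions, why an unset sentinel is risky in a serial-number comparison: `reasm_entry`, `fsn_valid`, and both helper functions are hypothetical names, and only `fsn_included` and the -1 default come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number comparison: true when a is strictly after b modulo 2^32
 * (assumes the usual two's-complement conversion to int32_t). */
static bool tsn_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical reassembly-queue entry. */
struct reasm_entry {
    bool fsn_valid;        /* hypothetical flag: fsn_included has been set */
    uint32_t fsn_included; /* 0xffffffff (-1) until a real value is stored */
};

/* Unguarded check: the result depends on an unset sentinel. */
static bool beyond_cum_tsn_unguarded(const struct reasm_entry *e,
                                     uint32_t cum_tsn)
{
    return tsn_gt(e->fsn_included, cum_tsn);
}

/* Guarded check: only trust fsn_included once it is known to be set. */
static bool beyond_cum_tsn_guarded(const struct reasm_entry *e,
                                   uint32_t cum_tsn)
{
    return e->fsn_valid && tsn_gt(e->fsn_included, cum_tsn);
}
```

With the sentinel in place and a cumulative TSN such as 3299948033, the unguarded comparison treats the entry as lying beyond the cumulative TSN, while for a small cumulative TSN such as 100 it does not. The outcome therefore depends on where the TSN space happens to start, which is one plausible reason a hang of this kind would appear only intermittently.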
@enobufs If you want, you can close this issue. Up to you. |
Closing, it was great reading this back @enobufs :D |
Your environment.
What did you do?
What did you expect?
The packets sent from pion should be continuously received by Chrome.
What happened?