-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIOAsyncWriter: Fix suspending yield when writer is finished #3044
NIOAsyncWriter: Fix suspending yield when writer is finished #3044
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great catch, thanks so much!
Regarding your test, it might be useful for the test to use a new delegate that makes it easier to arrange the ordering to work out. NIO's ConditionLock can be used to set up a nice ordering chain, so that instead of using the usleeps we can just block on that lock being unlocked with the right value.
bf18517
to
f15a87b
Compare
@Lukasa : I updated it with ConditionLock. Is this what you had in mind? This is still blocking a thread from the concurrency pool and then depending on another thread to unblock it. Will that not be a problem? The only way I see around that is to add an async hook to NIOAsyncWriter. |
For testing purposes, that blocking will be safe. We will eventually resolve it, and we don't rely on Dispatch to make that happen (it requires the NIO thread to do it), so it'll be ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really lovely patch, thanks so much @orobio!
I'm not sure I fully understand what you're saying here, but I assume it's fine then. I do still have that 'FIXME' line in there, which would probably be good to remove. Well..., this got merged while I was typing, so I'll open a new pull request for that. Edit: Opened #3059 |
This was left in when pull request apple#3044 was merged, but as discussed on that pull request, this doesn't need to be fixed.
This was left in when pull request #3044 was merged, but as discussed on that pull request, this doesn't need to be fixed.
It appears this pull gets the NIO tests to hang in the Android x86_64 emulator, but weirdly only when compiled and run on macOS, ie the linux CI runs shown there work fine. I don't know if that's a macOS cross-compilation issue or some incompatibility with the macOS emulator triggered by this pull, possibly both. Reverting this pull gets my Android CI runs on macOS to work again. Let me know what you guys think. |
That’s very interesting. Do you have a smaller repro? The OS-specificity makes me think this is probably a Swift bug, but if it’s easy enough to reproduce we may be able to figure out if the test that hangs is just flaky. |
A bit tough for me to reproduce since I don't use macOS. @marcprux, any interest in looking into this on your Mac? |
When I test against a local Android emulator (with
However, the test run does ultimately fail due to a leaked promise. Edit: see next comment.
Here's the leaked promise, although it is likely a red herring:
|
Actually, scratch all that. It seems to be hanging randomly, on both macOS and Ubuntu runners, irrespective of API level. See the runs at https://github.com/marcprux/swift-nio/actions/runs/12812276673 I'm still not able to get it to hang on my local machine, but I'll keep trying. |
Does it always hang on the same test? |
No, it seems to vary. Sometimes in
|
Yeah, that seems fairly likely. Can you run the underlying test binary directly? Potentially running it under |
On my Android CI, it is entirely consistent: linux CI runs pass every time while macOS runs always hang. See the run yesterday: across two attempts with this pull, I had four macOS hangs and six linux runs pass (ignore the devel macOS failures, that's a separate issue). So much so that today I only reverted this pull for macOS, finagolfin/swift-android-sdk@7476f2be5, and all the runs passed. |
@finagolfin What happens if you only revert the test? |
I don't think this test is the issue, as it appears to be failing elsewhere in the log, but I will check that and let you know. |
Nope, you were right to suspect the test, as simply reverting this test on macOS got the NIO tests to pass again. 😄 Marc, can you confirm? |
Confirmed: if I remove the test, then the hangs never occurs in the emulator (running on macOS-13 or Ubuntu). |
@marcprux, are you disabling all NIOAsyncWriterTests though, as shown in the linked commit above, marcprux/swift-nio@87d02310d? If so, it would be better to just disable this new test, as I did above, to see if this code change affects the older tests too, which you are no longer running. I'm going to try running the linux-compiled tests on the macOS emulator and vice versa next, to see if that helps narrow down where this strange hang is coming from, as it could be that we're hitting a bug in the Android emulator for macOS. |
This could be an emulator bug, but it could also be that we're seeing an ordering issue flushed out by the performance overhead of the emulator. It'd be useful to try to get a "sequence of events" log from your hanging test as well, as concurrent tests are horrible to debug. |
This fixes hitting the following preconditionFailure in NIOAsyncWriter: "This should have already been handled by
yield()
".It doesn't expect a yield to be suspended when the state is
.writerFinished
, but this can definitely happen. This seemed to be the correct solution to me, but please check it carefully, because I'm unfamiliar with this code.I'm not happy about the test for this. It's blocking the
didYield
call for a while, to make sure the correct state is reached. See also the 'FIXME' line. What would be a better way to do this?