-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nextWakupNanos lazySet. #149
Conversation
The memory ordering effects of the nextWakupNanos are irrelevant. We just need to make sure that the compiler doesn't optimize out the load or the store. A setOpaque should be sufficient. But since the code needs to be Java 8 compatible, setRelease is the cheapest alternative. This will restrict compiler optimizations but doesn't lead to a memory fence on the X86 since every store is a release store (the semantics provided by a setLazy). In a subsequent PR I'll deal with the getAndSet in the finally clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
@pveentjer did you sign our ICLA? |
I just did. |
@pveentjer Would you also be interested to do the same for our native epoll / kqueue transports in netty itself ? |
@normanmaurer sure. But let me fix the io_uring one first. I'm still trying to get a proper understanding of the wakeup mechanism in combination with the atomic operations and the memory ordering. I need a fresh brain for that :) |
|
||
// The memory ordering effects of the nextWakeupNanos are irrelevant. A setOpaque should be sufficient, | ||
// but on Java 8 the lazySet is the cheapest alternative. It is free on X86 since every store is a release store. | ||
nextWakeupNanos.lazySet(curDeadlineNanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pveentjer we must ensure that this write happens before the subsequent hasTasks()
read or else a (non-IO) task could be put in the queue by another thread without us being awoken.
Note that the cost here should not be significant anyhow. We are about to make a blocking syscall because there's no more work to do, so it is not on a critical path. A heavily loaded system should rarely reach here since we'll remain in the CQ processing loop below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you need to order an older store with a newer load to a different address. So you need a [StoreLoad] which is provided by a volatile write and not by something weaker. So my change would break the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pveentjer so I guess we should close this pr ?
Yes. Best to close this one. I got a lot better understanding of the signaling mechanism and the queue. By having 2 separate mechanisms, the expensive [StoreLoad] is needed. If the flag to wakup and the queue would be integrated into the same structure, I don't think the expensive [StoreLoad] is needed on the I/O thread. |
Thanks @pveentjer. You're right, and in fact a while back I prototyped such a custom queue implementation, I may try to resurrect that for interest. One issue though is that since then the queue impl was made pluggable and a fully integrated queue would probably need to break that extensibility. |
For Hazelcast I did something similar. Before beginning on such a journey, I would first make a JMH test and check if the current implementation is actually causing performance degradation or if there are other more urgent problems. As any self-respecting engineer, I have made optimizations with unproven improvements :) |
The memory ordering effects of the nextWakupNanos are irrelevant. We
just need to make sure that the compiler doesn't optimize out the load
or the store. A setOpaque should be sufficient. But since the code needs
to be Java 8 compatible, setRelease is the cheapest alternative.
This will restrict compiler optimizations but doesn't lead to a memory
fence on the X86 since every store is a release store (the semantics
provided by a setLazy).
In a subsequent PR I'll deal with the getAndSet in the finally clause.
Fixes one part of #148