#3392: Consolidate flushes, leading to reduced syscalls and higher performance #3393
base: master
Conversation
Force-pushed from 04ce471 to 541cb69
This is now out for testing in IvanCord, along with many more additions by Janmm14 and others (myself included), available at https://ci.mrivanplays.com/
Can the delay of packets be benchmarked?
The delay should be minimal, as it flushes on every channelReadComplete and on any state change.
Sounds cool, I'm testing it and I see around 30-50% less CPU usage. 🚀
Also, the client and the server tick at a fairly low frequency (once every 50 ms), so a delay in packet handling should not be noticeable.
Well, 50 ms is not such a high frequency, but I see it as a hard limiter ensuring no more than 50 ms of delay. :D
Do I understand this patch correctly? When a write happens on the Spigot side, the write happens on Bungee's side, and the same with the flush, more or less?
That's roughly the idea of this patch.
That's really cool. I think it would be a good idea to increase the 20-packet limit, road to 1% of CPU usage :D For example, if you have 20 players which each send 20 packets per second, you have 400 packets per second, but you still have 20 flushes instead of a more optimized amount. Of course, it's way better than having flushes on every packet. I will increase the amount and see what we get. :)
Also, this is really important: when Spigot flushes, Bungee flushes, is that correct?
These packet counts are per-player. And that constant …
Short answer: yes. Long answer: yes, kinda. When Spigot flushes, the data starts to be sent to Bungee, leading to a read start on Bungee. A read start prevents flushes for the given player. Bungee reads all the incoming data; the flushes which usually happen after each forwarded packet are ignored. When Bungee is finished reading the current batch of received bytes (no more data waiting in the OS TCP buffer to be read), flushes are allowed again and Bungee does flush. We don't need to flush every second for safety, as our default behaviour is still to flush. Our safety is the flush every 20 packets.
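To make that mechanism concrete, here is a minimal sketch in the spirit of Netty's FlushConsolidationHandler: suppress and count flushes while a read is in progress, release them in one flush on channelReadComplete, and keep a safety flush every 20 suppressed flushes. The class name, field names, and the threshold are illustrative; the actual patch applies this across the client and server pipelines rather than on a single channel.

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;

// Illustrative sketch only: names and the threshold of 20 are placeholders,
// and the real patch works across two pipelines, not a single channel.
public class ConsolidatingFlushHandler extends ChannelDuplexHandler {

    private static final int EXPLICIT_FLUSH_AFTER_FLUSHES = 20;

    private boolean readInProgress;
    private int pendingFlushes;

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        // A read loop has started: hold back flushes until it completes.
        readInProgress = true;
        ctx.fireChannelRead(msg);
    }

    @Override
    public void flush(ChannelHandlerContext ctx) throws Exception {
        if (readInProgress) {
            // Safety valve: never hold back more than N flushes in a row.
            if (++pendingFlushes == EXPLICIT_FLUSH_AFTER_FLUSHES) {
                pendingFlushes = 0;
                ctx.flush();
            }
            return; // otherwise swallow the flush for now
        }
        ctx.flush();
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) throws Exception {
        // No more data waiting in the OS buffer: release everything in one flush.
        readInProgress = false;
        if (pendingFlushes > 0) {
            pendingFlushes = 0;
            ctx.flush();
        }
        ctx.fireChannelReadComplete();
    }
}
```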
Yes, I know they are per-player, but I put the example as if you are surrounded by players, each one sending 20 movement packets per second plus some other packets like arm swings, for example. All right, so when I have a batch of 400 packets, I will have 20 flushes, and when I have a single packet, I will get a single flush. If yes, that's really cool and I will test tomorrow with the 200-packet limit. Another great performance boost from you, you are really doing a good job, another one from my issue, you are a superhero! 🦸
If the number is set really high, at some point the "limiting" factor will be that the current read operation has finished, etc. So the difference between 20 and 200 will not be that big, as you won't have 200 packets per tick. And thank you.
Shouldn't it still be better? Fewer flushes, and flushes when Spigot wants to flush, right?
It shouldn't hurt to set it higher. The only important thing is per-tick behaviour, as Spigot flushes per tick and we want to send the player data per tick. And if Spigot flushes once per tick, Bungee will usually do so too.
All right, a flush being called should trigger readComplete on Bungee, right? Are there edge cases where that wouldn't happen?
If the data transfer were very, very slow and the whole process of receiving the data took at least 50 ms.
Does Spigot really flush once per tick?
I meant that it does an extra flush every tick no matter what; normal Spigot flushes on every packet, that's right.
I was talking more about Paper here, @Outfluencer
Is it well tested now?
I think so, yea 😎
Why do you think so, did you test it on your network?
Yes |
What are the changes compared to Netty's version, and why?
Netty's usual scenario: A sends multiple requests to B; B handles the requests synchronously and sends the results back to A on the same connection/same Netty pipeline (A -> B -> A). Bungee, however, is not a server which processes data; it is a proxy which forwards data to another connection/Netty pipeline (C -> B -> S). So while we are reading "incoming from server" we are writing to "outgoing to client" (the original …
Detect an ongoing read loop at "incoming from client" -> block flushes while reading at "outgoing to server". In detail:
The … (Due to how we set up the server's connection (same event loop as the client connection), we don't need to handle race conditions.)
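A rough sketch of that cross-pipeline detection, assuming both channels share one event loop as stated above. The class and member names are hypothetical, not the ones used in the patch: a handler on the "incoming from client" channel marks a read loop and issues a single flush on the "outgoing to server" channel when the loop completes, while the forwarding code would consult isReadInProgress() to choose write() over writeAndFlush().

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Hypothetical names; installed on the "incoming from client" pipeline,
// with flushTarget being the "outgoing to server" channel.
public class ReadLoopDetector extends ChannelInboundHandlerAdapter {

    private final Channel flushTarget;
    private boolean readInProgress;

    public ReadLoopDetector(Channel flushTarget) {
        this.flushTarget = flushTarget;
    }

    // Forwarding code can check this to use write() instead of writeAndFlush().
    // Safe without synchronisation because both channels share one event loop.
    public boolean isReadInProgress() {
        return readInProgress;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        readInProgress = true;
        ctx.fireChannelRead(msg); // downstream handlers write() to flushTarget
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) throws Exception {
        readInProgress = false;
        flushTarget.flush(); // one flush (one syscall) for the whole batch
        ctx.fireChannelReadComplete();
    }
}
```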
I see that you change the player's flush target in ServerConnector init; is this too early? What happens if the connector fails? Maybe it would be better to change it when the DownstreamBridge is set.
@caoli5288 |
Moving it to https://github.com/SpigotMC/BungeeCord/blob/master/proxy/src/main/java/net/md_5/bungee/ServerConnector.java#L328 is fine. There you have both ChannelWrappers and can set each as the other's target; the code looks more beautiful.
Netty's io_uring implementation is still slower than epoll's. Further suggestion: add an uncompressed packet id to the header of each packet on the Spigot side, which can reduce unnecessary decompression.
Thanks for letting me know.
This would require changes in Spigot and all forks.
I don't think that's intentional. I use https://github.com/2lstudios-mc/FlameCord with some changes, including avoiding creating a buf copy. It gave a boost, as I remember.
@Leymooo |
There is nothing about a custom client, only a small protocol change between BungeeCord and the server which gives around a 500-700% performance boost. Currently we are switching from one Bungee instance to multiple, and a BungeeCord process with 1200 players on a Ryzen 3600 uses only 270% CPU in total (22.5% per CPU thread). You can write 0 as a 2-, 3-, 4- or 5-byte VarInt; the Minecraft protocol allows it. So if the server can include the packet id before the compressed data, followed by any VarInt (e.g. the compressed data size), you can always remove this extra packet id from the ByteBuf without any buffer copy, and the vanilla client will be able to decode it.
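For illustration, a small sketch of the non-minimal (padded) VarInt encoding mentioned above: the same value, e.g. 0, can be written using a fixed number of bytes by emitting continuation bytes (0x00, 0x80 0x00, 0x80 0x80 0x00, ...). The helper below is hypothetical and assumes the value fits into 7 * bytes bits, which is always true when padding 0.

```java
import io.netty.buffer.ByteBuf;

// Hypothetical helper: writes `value` as a VarInt using exactly `bytes` bytes
// by emitting continuation bytes. Assumes the value fits into 7 * bytes bits.
public final class PaddedVarInt {

    public static void writeVarInt(ByteBuf buf, int value, int bytes) {
        for (int i = 0; i < bytes - 1; i++) {
            buf.writeByte((value & 0x7F) | 0x80); // low 7 bits + continuation bit
            value >>>= 7;
        }
        buf.writeByte(value & 0x7F); // final byte, continuation bit cleared
    }
}
```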
I've already done partial decompression of the packets in my fork master...VimeWorld:BungeeCord:master ("Improved decompressor", followed by a few commits with fixes). It decompresses the first 8192 bytes of the packet and checks the packet id. No protocol changes are required and it saves nearly 50% of CPU. The only change needed is a …
Is that something different from this patch? #3240
Decompression itself is a really fast task, especially in RAM. With 8192 bytes you still decompress something like 85% of all packets from the server and 99% from the client, so this 50% CPU saving comes from eliminating recompression rather than from decompressing only 8192 bytes. BungeeCord's decompressor may need to do multiple native (JNI) calls to decompress huge packets, which in my opinion does more harm than decompressing a whole huge packet in one native call. In conclusion: eliminating recompression is a really good start for a performance boost, but with only 4 lines of code in the backend server it is possible to get much more performance.
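As a rough, hypothetical sketch of the partial-decompression idea from the last few comments (not the code from the linked fork): inflate only a small prefix of the compressed payload to peek at the leading VarInt packet id, instead of decompressing and later recompressing the whole packet.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, not the code from the linked fork: inflate only enough
// of the compressed payload to read the leading VarInt packet id.
public final class PacketIdPeeker {

    public static int peekPacketId(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] prefix = new byte[8]; // a VarInt packet id is at most 5 bytes
            int produced = inflater.inflate(prefix);
            return readVarInt(prefix, produced);
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    private static int readVarInt(byte[] data, int length) {
        int value = 0;
        for (int i = 0; i < length && i < 5; i++) {
            value |= (data[i] & 0x7F) << (i * 7);
            if ((data[i] & 0x80) == 0) {
                break;
            }
        }
        return value;
    }
}
```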
@md-5 |
Force-pushed from d6ca426 to f652014
Rebased to solve conflicts (Edit: missed an unused import)
Force-pushed from f652014 to 22ef574
DEFAULT_EXPLICIT_FLUSH_AFTER_FLUSHES is still not used. By the way, does that mean that since 1.20.2 the client sends packets with flush consolidation?
I just did not remove that part of the code; it's still there because it was inside Netty's code.
Clients likely do not use the "flush consolidation" mechanism Bungee uses (and needs); instead they stop calling writeAndFlush() for every packet and change it to write() for every packet plus one call to flush() at the end of each tick. The result is the same, though.
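A tiny sketch of that per-tick pattern on a single Netty channel; the class and method names are made up for illustration.

```java
import io.netty.channel.Channel;
import java.util.List;

// Hypothetical names, for illustration only: queue with write(), flush once per tick.
public final class TickFlusher {

    public static void sendTick(Channel channel, List<Object> packets) {
        for (Object packet : packets) {
            channel.write(packet); // buffered, no syscall yet
        }
        channel.flush(); // a single flush per tick
    }
}
```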
What are the changes described in "network optimizations" from the screenshot above?
Commit: Consolidate flushes, leading to reduced syscalls and higher performance. Based on Netty FlushConsolidationHandler.
Force-pushed from 22ef574 to 0376eac
Since vanilla Minecraft added flushing once per tick instead of on every packet a short while ago, it would be a perfect time to think about doing something similar in BungeeCord. @md-5
@Leymooo Edit: Never mind, I already have it
In a private fork I made, I just duplicated the packet id before the compressed packet length for the "server -> bungee" connection.
Based on Netty FlushConsolidationHandler.
I hope I did the license stuff correctly.
Fixes #3392