Speed up writing of Varint21 #3453
base: master
Conversation
Is this your original implementation/code (read: not copied)?
I did use ChatGPT a bunch of times at first until it correctly unrolled and combined the logic. (I ran its code through tests, told it the wrong and correct results for the random-number test, and at some point it worked.) The different implementations from the Steinborn blog post are MIT-licensed, so it wouldn't be a problem to source the code from there either.
Prefer to avoid introducing externally licensed code.
Well RIP?
If it's not copied it might be OK.
I looked back at the issue comments, and it seems I had very similar code (apparently just a switch-syntax difference; I'm on my phone right now, so I can't compare properly) before others posted the link to Steinborn's code.
Haven't compared the code, but unrolling a VarInt write is a common practice, one that has become somewhat less common given what modern hardware can do (and compilers unrolling automatically). The only reason it's viable here is that we're writing to a native buffer through a VM, and there's tons of overhead, so reducing the number of calls matters. Just to prove a point, this same generic code exists here too: Although they don't have a writeMedium function like we fortunately do. So to say it's "copying" is a bit absurd, unless Jammm14 literally copy/pasted every tiny bit of code lol. It's just generic VarInt byte shifts.
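For reference, the "generic VarInt byte shifts" mentioned above look roughly like the classic loop-based LEB128 write below. This is a standalone sketch for illustration; the class and method names are made up here and are not the project's actual API.

```java
// Standalone sketch of the classic loop-based VarInt (LEB128) write.
// Names are illustrative, not BungeeCord's actual API.
public final class VarIntLoop {
    // Writes 'value' as a VarInt into 'out' at 'offset';
    // returns the number of bytes written (1-5 for an int).
    public static int write(int value, byte[] out, int offset) {
        int i = offset;
        while ((value & ~0x7F) != 0) {                 // more than 7 bits left?
            out[i++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits + continuation bit
            value >>>= 7;                              // unsigned shift to consume them
        }
        out[i++] = (byte) value;                       // final byte, continuation bit clear
        return i - offset;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        int n = write(300, buf, 0); // 300 encodes as 0xAC 0x02
        System.out.println(n + " "
                + Integer.toHexString(buf[0] & 0xFF) + " "
                + Integer.toHexString(buf[1] & 0xFF));
    }
}
```

The per-iteration data dependency (each step needs the shifted `value` from the previous one) is exactly what the unrolled version discussed in this PR avoids.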
Benchmarks indicate this is 20-50% faster (higher when writing longer varints).
See #3451 (comment) and https://steinborn.me/posts/performance/how-fast-can-you-write-a-varint/
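The unrolled technique benchmarked here and in the blog post computes the encoded length once and then branches, instead of looping. Below is a minimal sketch of that technique; it writes to a plain byte array for self-containedness (the PR itself targets a Netty buffer), and it is not the PR's actual code.

```java
// Sketch of an unrolled VarInt write: compute the byte count once, then
// branch. Every shift reads the ORIGINAL value, so the byte computations
// are independent of each other (reduced data dependency).
public final class VarIntUnrolled {
    public static int write(int value, byte[] out, int offset) {
        // Number of 7-bit groups needed; value == 0 still yields 1
        // because -1 / 7 == 0 in Java integer division.
        int bytes = (31 - Integer.numberOfLeadingZeros(value)) / 7 + 1;
        switch (bytes) {
            case 1:
                out[offset] = (byte) value;
                break;
            case 2:
                out[offset]     = (byte) (value & 0x7F | 0x80);
                out[offset + 1] = (byte) (value >>> 7);
                break;
            case 3:
                out[offset]     = (byte) (value & 0x7F | 0x80);
                out[offset + 1] = (byte) ((value >>> 7) & 0x7F | 0x80);
                out[offset + 2] = (byte) (value >>> 14);
                break;
            case 4:
                out[offset]     = (byte) (value & 0x7F | 0x80);
                out[offset + 1] = (byte) ((value >>> 7) & 0x7F | 0x80);
                out[offset + 2] = (byte) ((value >>> 14) & 0x7F | 0x80);
                out[offset + 3] = (byte) (value >>> 21);
                break;
            default: // 5 bytes (negative ints have the top bit set)
                out[offset]     = (byte) (value & 0x7F | 0x80);
                out[offset + 1] = (byte) ((value >>> 7) & 0x7F | 0x80);
                out[offset + 2] = (byte) ((value >>> 14) & 0x7F | 0x80);
                out[offset + 3] = (byte) ((value >>> 21) & 0x7F | 0x80);
                out[offset + 4] = (byte) (value >>> 28);
                break;
        }
        return bytes;
    }
}
```

The length computation via `Integer.numberOfLeadingZeros` is the same idea the blog post uses; on x86 it compiles to a single `lzcnt`/`bsr`-based sequence.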
With each operation taking just a few nanoseconds, this is likely only a small improvement and not a noticeable difference overall.
As we rarely have to touch that part of the code, I think it is okay to have this optimized, unrolled loop and the condensed write calls here.
In further benchmarks I ran, there was no significant difference when leaving out the 0x7F mask on the uppermost byte, so I added it to stay close to the blog post's code.
The varint-writing speedup in the benchmark is lower on Java 19 (only ~10% faster) than on Java 16/17 (~30% faster); the original code's speed is similar across versions. The benchmark used 2048 random short numbers, and the speedup stays similar with other batch sizes (such as 4096 or 1024) and random seeds.
I don't want to spend time investigating whether this is tunable with JVM flags or what exactly is going on there, though, also because benchmarks are never identical to the real workload...
However, this change very likely doesn't just improve benchmark scores but also yields a real-world speedup: data dependencies are reduced, the loop is replaced with a switch (fewer jumps), and the buffer is no longer written byte-by-byte (CPUs are built to handle more than a single byte at a time). These improvements go beyond what the JVM is capable of doing on its own.
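To illustrate the "not byte-by-byte" point: a VarInt that needs exactly three bytes (i.e. 2^14 <= value < 2^21, which covers the longest Varint21 encodings) can be packed into one 24-bit value and flushed with a single writeMedium call instead of three writeByte calls. The sketch below shows one way to do that packing, matching Netty's big-endian writeMedium byte order; it is an assumption-laden illustration of the idea, not the PR's actual code, and it only handles the exact-three-byte case.

```java
// Packs a VarInt that needs exactly three bytes (2^14 <= value < 2^21)
// into one big-endian 24-bit "medium" whose byte order matches a single
// writeMedium call. Illustrative sketch only.
public final class Varint21Medium {
    public static int packThreeBytes(int value) {
        return ((value & 0x7F | 0x80) << 16)          // first byte, continuation set
             | (((value >>> 7) & 0x7F | 0x80) << 8)   // second byte, continuation set
             | (value >>> 14);                        // third byte, continuation clear
    }
}
```

For example, writing the medium `packThreeBytes(16384)` emits the bytes `80 80 01`, exactly what three separate `writeByte` calls would produce, but crossing the VM/native-buffer boundary once instead of three times.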