Use bit twiddling to speed up JSON generation. #738
base: master
Relative gains on M3 compared to master on the macro benchmarks:
This one is interesting, as it doesn't require any of the annoying feature detection that SIMD imposes. But if we end up going with SIMD anyway, we might as well not bother with this, right?
That is a judgement call. It's nice to have a pure C implementation that doesn't require any special instructions. If we do go the SIMD route, assuming ARM Neon (Mac M-series chips and, according to Wikipedia, AWS Graviton) and x86-64 are the vast majority of CPUs running …
Edit: This assumes this code is faster on other architectures as well. I have not tested it on anything other than my M1 and my Intel-based laptop.
True. I guess my only real reservation with this PR (and also with the SIMD ones) is the huge macro. I haven't looked too much into it, but I'd really like it if such a huge macro weren't necessary. So I need to take some time to experiment with some refactoring.
It's likely enough. x86 alone is likely 95% of Ruby usage, if not more; we're probably super close to 100% if you add ARM. For other platforms, correctness is sufficient.
It's not necessary. It's the existing conditional. I just didn't want to copy and paste it multiple times.
Yes, I mean not having that big macro, but without copy-pasting either. What I have in mind right now, though I don't know if it's really possible, would be to move the "search" part into another function and let it keep some state in a stack-allocated struct so it can resume. Something very much like https://lemire.me/blog/2024/07/20/scan-html-even-faster-with-simd-instructions-c-and-c/

So the pseudo-code would look like:

    scanner_state state = {0};
    while (ptr = scan(&state, ptr)) {
        // process one byte
        ptr++;
    }

This way all the alignment considerations and such are moved into that function.
NB: I'm not asking you to do this. If you wish to, feel free, but otherwise I want to find some time to try it before I merge this PR.
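A minimal C sketch of the resumable scan() idea, assuming "needs escaping" means '"', '\\' or any byte below 0x20 (the three checks listed in this PR's description). Everything here is hypothetical rather than the gem's code: the scanner_state fields, the helper names, and the extra end argument (a zero-initialized state cannot know where the buffer ends). The state caches the match bitmask of the last 8-byte chunk, so after the caller handles one byte and increments ptr, the next call resumes from that chunk instead of reloading it. __builtin_ctzll assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: zero-initialisable, like `scanner_state state = {0};`. */
typedef struct {
    const unsigned char *chunk; /* 8-byte chunk that `pending` describes, or NULL */
    uint64_t pending;           /* 0x80 marker bit per matching byte in `chunk` */
} scanner_state;

/* Assemble 8 bytes so that byte k lands in bits 8k..8k+7, on any endianness. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int k = 0; k < 8; k++) v |= (uint64_t)p[k] << (8 * k);
    return v;
}

/* 0x80 in each byte of v that equals zero, 0x00 elsewhere. Per-byte sums never
 * exceed 0xFE, so no carry crosses a byte boundary and the mask is exact. */
static uint64_t zero_bytes(uint64_t v) {
    const uint64_t lo7 = 0x7F7F7F7F7F7F7F7FULL;
    return ~(((v & lo7) + lo7) | v | lo7);
}

/* 0x80 in each byte of v that is '"' (0x22), '\\' (0x5C) or below 0x20. */
static uint64_t escape_mask(uint64_t v) {
    const uint64_t lo7 = 0x7F7F7F7F7F7F7F7FULL;
    /* Low 7 bits + 0x60 reach 0x80 iff the byte is >= 0x20 (max 0xDF, no carry). */
    uint64_t below_0x20 = ~(((v & lo7) + 0x6060606060606060ULL) | v) & 0x8080808080808080ULL;
    return zero_bytes(v ^ 0x2222222222222222ULL)
         | zero_bytes(v ^ 0x5C5C5C5C5C5C5C5CULL)
         | below_0x20;
}

static int needs_escape(unsigned char c) { return c == '"' || c == '\\' || c < 0x20; }

/* First byte in [ptr, end) that needs escaping, or NULL when there is none. */
static const unsigned char *scan(scanner_state *st, const unsigned char *ptr,
                                 const unsigned char *end) {
    assert(ptr <= end);
    const unsigned char *next = ptr;
    if (st->chunk && ptr >= st->chunk && ptr < st->chunk + 8) {
        /* Resume inside the cached chunk: forget matches before ptr. */
        st->pending &= ~0ULL << (8 * (ptr - st->chunk));
        if (st->pending)
            return st->chunk + __builtin_ctzll(st->pending) / 8;
        next = st->chunk + 8;
    }
    st->chunk = NULL;
    while ((size_t)(end - next) >= 8) {
        uint64_t mask = escape_mask(load_le64(next));
        if (mask) { /* cache the whole chunk's matches for the next call */
            st->chunk = next;
            st->pending = mask;
            return next + __builtin_ctzll(mask) / 8;
        }
        next += 8;
    }
    for (; next < end; next++) /* scalar tail: fewer than 8 bytes left */
        if (needs_escape(*next)) return next;
    return NULL;
}
```

As in the pseudo-code, a caller loops until scan() returns NULL, e.g. for (p = buf; (p = scan(&st, p, end)) != NULL; p++) { /* escape *p */ }. Bytes between two returned pointers need no escaping and could be copied in bulk.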
Created this separately from the SIMD branch.
Use bit twiddling to speed up JSON generation.
This effectively inlines memchr(ptr, '"', len) and memchr(ptr, '\\', len), as well as an <each byte in chunk> < 0x20 comparison.

Benchmarks
MacBook Air M1
This Branch
Master
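The single pass described in the PR body can be sketched with the classic SWAR tricks from Bit Twiddling Hacks ("determine if a word has a zero byte" and "determine if a word has a byte less than n"), processing 8 bytes per iteration without any special instructions. This is an illustration, not necessarily what the PR implements: first_escape, has_byte and has_less are hypothetical names, and __builtin_ctzll assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* "Has a byte less than n" (valid for n <= 128): flags bytes below n. Borrows
 * only propagate toward more significant bytes, so bytes after the first real
 * match may be flagged spuriously, but the lowest flagged byte is always exact. */
static uint64_t has_less(uint64_t v, uint8_t n) { return (v - ONES * n) & ~v & HIGHS; }

/* "Has a byte equal to c": XOR turns matching bytes into zeros, then test < 1. */
static uint64_t has_byte(uint64_t v, uint8_t c) { return has_less(v ^ (ONES * c), 1); }

/* Index of the first byte in s[0, len) that is '"', '\\' or below 0x20; len if none.
 * One pass replaces memchr(s, '"', len), memchr(s, '\\', len) and the < 0x20 loop. */
static size_t first_escape(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t v = 0;
        for (int k = 0; k < 8; k++) /* byte k -> bits 8k..8k+7 on any endianness */
            v |= (uint64_t)s[i + k] << (8 * k);
        uint64_t m = has_byte(v, '"') | has_byte(v, '\\') | has_less(v, 0x20);
        if (m) /* lowest flagged byte = first byte needing escaping */
            return i + (size_t)__builtin_ctzll(m) / 8;
    }
    for (; i < len; i++) /* scalar tail: fewer than 8 bytes left */
        if (s[i] == '"' || s[i] == '\\' || s[i] < 0x20) return i;
    return len;
}
```

Because only the first match per chunk is needed, these cheaper inexact masks suffice; a scanner that reports every match inside a chunk would need exact per-byte masks instead.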