Use bit twiddling to speed up JSON generation. #738

Draft
wants to merge 2 commits into
base: master

@samyron samyron commented Jan 29, 2025

Created as a separate PR from the SIMD branch.

Use bit twiddling to speed up JSON generation.

This effectively inlines memchr(ptr, '"', len) and memchr(ptr, '\\', len), plus a check for whether any byte in the chunk is below 0x20.
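For readers unfamiliar with the technique, here is a minimal sketch of how such a word-at-a-time check can look. It is not the code from this PR: the helper names (has_zero_byte, has_byte, has_byte_less_than, first_escape_index) are hypothetical, and it assumes 8-byte chunks and the three escape conditions described above. Each helper only answers "does any byte in this word match?", so the result does not depend on endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL  /* 0x01 repeated in every byte */
#define HIGHS 0x8080808080808080ULL  /* 0x80 repeated in every byte */

/* Nonzero if any byte of x is zero. */
static inline uint64_t has_zero_byte(uint64_t x) {
    return (x - ONES) & ~x & HIGHS;
}

/* Nonzero if any byte of x equals c: XOR turns matching bytes into zero. */
static inline uint64_t has_byte(uint64_t x, uint8_t c) {
    return has_zero_byte(x ^ (ONES * c));
}

/* Nonzero if any byte of x is below n (valid for n <= 128). */
static inline uint64_t has_byte_less_than(uint64_t x, uint8_t n) {
    return (x - ONES * n) & ~x & HIGHS;
}

/* Hypothetical helper: index of the first byte that needs a JSON escape
 * ('"', '\\', or a control character below 0x20), or len if there is none. */
static size_t first_escape_index(const char *s, size_t len) {
    size_t i = 0;

    /* Skip 8 bytes at a time while the whole chunk is clean. */
    while (i + 8 <= len) {
        uint64_t chunk;
        memcpy(&chunk, s + i, sizeof chunk);  /* safe unaligned load */
        if (has_byte(chunk, '"') | has_byte(chunk, '\\') |
            has_byte_less_than(chunk, 0x20)) {
            break;  /* something in this chunk needs escaping */
        }
        i += 8;
    }

    /* Byte-by-byte for the flagged chunk and the tail. */
    for (; i < len; i++) {
        unsigned char ch = (unsigned char)s[i];
        if (ch == '"' || ch == '\\' || ch < 0x20) return i;
    }
    return len;
}
```

Bytes with the high bit set (UTF-8 continuation and lead bytes) never trigger the checks, which would be consistent with the largest gains above showing up in the utf8 benchmarks.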

Benchmarks

Macbook Air M1

This Branch

== Encoding small mixed (34 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   442.754k i/100ms
          json_coder   474.965k i/100ms
                  oj   419.655k i/100ms
Calculating -------------------------------------
                json      4.380M (± 5.3%) i/s  (228.29 ns/i) -     22.138M in   5.071138s
          json_coder      4.721M (± 3.2%) i/s  (211.84 ns/i) -     23.748M in   5.036884s
                  oj      4.223M (± 1.6%) i/s  (236.80 ns/i) -     21.402M in   5.069275s

Comparison:
                json:  4380401.0 i/s
          json_coder:  4720500.3 i/s - same-ish: difference falls within error
                  oj:  4223023.2 i/s - same-ish: difference falls within error


== Encoding small nested array (121 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   214.704k i/100ms
          json_coder   217.984k i/100ms
                  oj   172.417k i/100ms
Calculating -------------------------------------
                json      2.134M (± 2.8%) i/s  (468.71 ns/i) -     10.735M in   5.036328s
          json_coder      2.119M (±12.8%) i/s  (471.89 ns/i) -     10.463M in   5.074812s
                  oj      1.728M (± 0.4%) i/s  (578.56 ns/i) -      8.793M in   5.087548s

Comparison:
                json:  2133536.6 i/s
          json_coder:  2119147.6 i/s - same-ish: difference falls within error
                  oj:  1728423.1 i/s - 1.23x  slower


== Encoding small hash (65 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   448.998k i/100ms
          json_coder   480.577k i/100ms
                  oj   485.067k i/100ms
Calculating -------------------------------------
                json      4.485M (± 0.4%) i/s  (222.96 ns/i) -     22.450M in   5.005564s
          json_coder      4.729M (± 2.0%) i/s  (211.45 ns/i) -     24.029M in   5.082987s
                  oj      4.847M (± 0.5%) i/s  (206.30 ns/i) -     24.253M in   5.003676s

Comparison:
                json:  4485075.4 i/s
                  oj:  4847239.0 i/s - 1.08x  faster
          json_coder:  4729330.3 i/s - 1.05x  faster


== Encoding mixed utf8 (5003001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    60.000 i/100ms
          json_coder    59.000 i/100ms
                  oj    34.000 i/100ms
Calculating -------------------------------------
                json    538.363 (±14.9%) i/s    (1.86 ms/i) -      2.640k in   5.008175s
          json_coder    544.629 (±12.3%) i/s    (1.84 ms/i) -      2.714k in   5.059221s
                  oj    357.057 (± 3.6%) i/s    (2.80 ms/i) -      1.802k in   5.053765s

Comparison:
                json:      538.4 i/s
          json_coder:      544.6 i/s - same-ish: difference falls within error
                  oj:      357.1 i/s - 1.51x  slower


== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    53.000 i/100ms
          json_coder    46.000 i/100ms
                  oj    34.000 i/100ms
Calculating -------------------------------------
                json    524.858 (± 7.4%) i/s    (1.91 ms/i) -      2.650k in   5.077503s
          json_coder    543.170 (± 7.0%) i/s    (1.84 ms/i) -      2.714k in   5.020620s
                  oj    351.649 (± 3.7%) i/s    (2.84 ms/i) -      1.768k in   5.034501s

Comparison:
                json:      524.9 i/s
          json_coder:      543.2 i/s - same-ish: difference falls within error
                  oj:      351.6 i/s - 1.49x  slower


== Encoding integers (8009 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     8.123k i/100ms
          json_coder     7.976k i/100ms
                  oj     7.332k i/100ms
Calculating -------------------------------------
                json     80.441k (± 1.1%) i/s   (12.43 μs/i) -    406.150k in   5.049727s
          json_coder     80.854k (± 1.3%) i/s   (12.37 μs/i) -    406.776k in   5.031830s
                  oj     73.209k (± 0.8%) i/s   (13.66 μs/i) -    366.600k in   5.007896s

Comparison:
                json:    80440.5 i/s
          json_coder:    80853.6 i/s - same-ish: difference falls within error
                  oj:    73208.8 i/s - 1.10x  slower


== Encoding activitypub.json (52595 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     2.045k i/100ms
          json_coder     1.999k i/100ms
                  oj     1.582k i/100ms
Calculating -------------------------------------
                json     20.584k (± 5.4%) i/s   (48.58 μs/i) -    104.295k in   5.082094s
          json_coder     21.065k (± 3.3%) i/s   (47.47 μs/i) -    105.947k in   5.035064s
                  oj     15.678k (± 2.5%) i/s   (63.78 μs/i) -     79.100k in   5.048520s

Comparison:
                json:    20584.0 i/s
          json_coder:    21065.5 i/s - same-ish: difference falls within error
                  oj:    15678.2 i/s - 1.31x  slower


== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   109.000 i/100ms
          json_coder   109.000 i/100ms
                  oj    92.000 i/100ms
Calculating -------------------------------------
                json      1.108k (± 2.2%) i/s  (902.53 μs/i) -      5.559k in   5.019493s
          json_coder      1.105k (± 2.7%) i/s  (904.61 μs/i) -      5.559k in   5.032499s
                  oj    914.626 (± 1.9%) i/s    (1.09 ms/i) -      4.600k in   5.031155s

Comparison:
                json:     1108.0 i/s
          json_coder:     1105.5 i/s - same-ish: difference falls within error
                  oj:      914.6 i/s - 1.21x  slower


== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   201.000 i/100ms
          json_coder   210.000 i/100ms
                  oj   188.000 i/100ms
Calculating -------------------------------------
                json      2.117k (± 2.6%) i/s  (472.28 μs/i) -     10.653k in   5.034844s
          json_coder      2.169k (± 3.0%) i/s  (460.95 μs/i) -     10.920k in   5.038295s
                  oj      1.915k (± 2.9%) i/s  (522.32 μs/i) -      9.588k in   5.012425s

Comparison:
                json:     2117.4 i/s
          json_coder:     2169.4 i/s - same-ish: difference falls within error
                  oj:     1914.5 i/s - 1.11x  slower


== Encoding canada.json (2090234 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     1.000 i/100ms
          json_coder     1.000 i/100ms
                  oj     1.000 i/100ms
Calculating -------------------------------------
                json     10.820 (± 9.2%) i/s   (92.42 ms/i) -     54.000 in   5.017790s
          json_coder     10.958 (± 0.0%) i/s   (91.26 ms/i) -     55.000 in   5.019486s
                  oj     10.684 (± 0.0%) i/s   (93.60 ms/i) -     54.000 in   5.054718s

Comparison:
                json:       10.8 i/s
          json_coder:       11.0 i/s - same-ish: difference falls within error
                  oj:       10.7 i/s - same-ish: difference falls within error


== Encoding many #to_json calls (2701 bytes)
json_coder unsupported (Object not allowed in JSON)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     2.369k i/100ms
                  oj     1.970k i/100ms
Calculating -------------------------------------
                json     22.647k (±11.0%) i/s   (44.16 μs/i) -    111.343k in   5.007278s
                  oj     19.705k (± 0.8%) i/s   (50.75 μs/i) -    100.470k in   5.099096s

Comparison:
                json:    22646.9 i/s
                  oj:    19704.8 i/s - 1.15x  slower

Master

== Encoding small mixed (34 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   408.721k i/100ms
          json_coder   463.041k i/100ms
                  oj   427.971k i/100ms
Calculating -------------------------------------
                json      4.374M (± 1.1%) i/s  (228.63 ns/i) -     22.071M in   5.046598s
          json_coder      4.594M (± 3.9%) i/s  (217.68 ns/i) -     23.152M in   5.048782s
                  oj      4.207M (± 1.8%) i/s  (237.71 ns/i) -     21.399M in   5.088352s

Comparison:
                json:  4373951.5 i/s
          json_coder:  4593891.7 i/s - same-ish: difference falls within error
                  oj:  4206829.7 i/s - 1.04x  slower


== Encoding small nested array (121 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   204.636k i/100ms
          json_coder   208.879k i/100ms
                  oj   164.519k i/100ms
Calculating -------------------------------------
                json      2.025M (± 1.5%) i/s  (493.93 ns/i) -     10.232M in   5.054997s
          json_coder      2.079M (± 1.7%) i/s  (480.97 ns/i) -     10.444M in   5.024722s
                  oj      1.728M (± 1.0%) i/s  (578.84 ns/i) -      8.720M in   5.047656s

Comparison:
                json:  2024578.5 i/s
          json_coder:  2079136.1 i/s - same-ish: difference falls within error
                  oj:  1727606.7 i/s - 1.17x  slower


== Encoding small hash (65 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   449.534k i/100ms
          json_coder   480.253k i/100ms
                  oj   480.245k i/100ms
Calculating -------------------------------------
                json      4.464M (± 0.7%) i/s  (224.02 ns/i) -     22.477M in   5.035527s
          json_coder      4.748M (± 1.2%) i/s  (210.61 ns/i) -     24.013M in   5.058006s
                  oj      4.633M (± 3.5%) i/s  (215.83 ns/i) -     23.532M in   5.085593s

Comparison:
                json:  4463831.3 i/s
          json_coder:  4748140.8 i/s - 1.06x  faster
                  oj:  4633342.9 i/s - same-ish: difference falls within error


== Encoding mixed utf8 (5003001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    34.000 i/100ms
          json_coder    35.000 i/100ms
                  oj    35.000 i/100ms
Calculating -------------------------------------
                json    348.297 (± 8.0%) i/s    (2.87 ms/i) -      1.734k in   5.013098s
          json_coder    362.582 (± 7.4%) i/s    (2.76 ms/i) -      1.820k in   5.049010s
                  oj    352.399 (± 3.7%) i/s    (2.84 ms/i) -      1.785k in   5.072121s

Comparison:
                json:      348.3 i/s
          json_coder:      362.6 i/s - same-ish: difference falls within error
                  oj:      352.4 i/s - same-ish: difference falls within error


== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    37.000 i/100ms
          json_coder    34.000 i/100ms
                  oj    35.000 i/100ms
Calculating -------------------------------------
                json    356.095 (± 5.3%) i/s    (2.81 ms/i) -      1.776k in   5.002047s
          json_coder    352.925 (± 6.2%) i/s    (2.83 ms/i) -      1.768k in   5.029325s
                  oj    354.508 (± 3.4%) i/s    (2.82 ms/i) -      1.785k in   5.040838s

Comparison:
                json:      356.1 i/s
                  oj:      354.5 i/s - same-ish: difference falls within error
          json_coder:      352.9 i/s - same-ish: difference falls within error


== Encoding integers (8009 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     7.849k i/100ms
          json_coder     7.886k i/100ms
                  oj     7.325k i/100ms
Calculating -------------------------------------
                json     78.319k (± 1.4%) i/s   (12.77 μs/i) -    392.450k in   5.011962s
          json_coder     78.569k (± 1.1%) i/s   (12.73 μs/i) -    394.300k in   5.019102s
                  oj     72.923k (± 0.8%) i/s   (13.71 μs/i) -    366.250k in   5.022750s

Comparison:
                json:    78319.0 i/s
          json_coder:    78569.2 i/s - same-ish: difference falls within error
                  oj:    72922.6 i/s - 1.07x  slower


== Encoding activitypub.json (52595 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     1.718k i/100ms
          json_coder     1.748k i/100ms
                  oj     1.545k i/100ms
Calculating -------------------------------------
                json     17.558k (± 3.1%) i/s   (56.95 μs/i) -     89.336k in   5.093146s
          json_coder     17.814k (± 3.3%) i/s   (56.13 μs/i) -     89.148k in   5.009813s
                  oj     15.292k (± 3.6%) i/s   (65.40 μs/i) -     77.250k in   5.058386s

Comparison:
                json:    17558.0 i/s
          json_coder:    17814.3 i/s - same-ish: difference falls within error
                  oj:    15291.5 i/s - 1.15x  slower


== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   103.000 i/100ms
          json_coder   107.000 i/100ms
                  oj    88.000 i/100ms
Calculating -------------------------------------
                json      1.023k (±10.7%) i/s  (977.19 μs/i) -      5.047k in   5.046689s
          json_coder      1.068k (± 3.4%) i/s  (936.16 μs/i) -      5.350k in   5.014254s
                  oj    895.747 (± 3.0%) i/s    (1.12 ms/i) -      4.488k in   5.014907s

Comparison:
                json:     1023.3 i/s
          json_coder:     1068.2 i/s - same-ish: difference falls within error
                  oj:      895.7 i/s - same-ish: difference falls within error


== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   189.000 i/100ms
          json_coder   185.000 i/100ms
                  oj   178.000 i/100ms
Calculating -------------------------------------
                json      1.952k (± 2.9%) i/s  (512.35 μs/i) -      9.828k in   5.039805s
          json_coder      1.975k (± 2.3%) i/s  (506.32 μs/i) -      9.990k in   5.060905s
                  oj      1.929k (± 2.4%) i/s  (518.51 μs/i) -      9.790k in   5.079165s

Comparison:
                json:     1951.8 i/s
          json_coder:     1975.0 i/s - same-ish: difference falls within error
                  oj:     1928.6 i/s - same-ish: difference falls within error


== Encoding canada.json (2090234 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     1.000 i/100ms
          json_coder     1.000 i/100ms
                  oj     1.000 i/100ms
Calculating -------------------------------------
                json     10.785 (± 0.0%) i/s   (92.72 ms/i) -     55.000 in   5.111775s
          json_coder     10.845 (± 0.0%) i/s   (92.21 ms/i) -     55.000 in   5.072654s
                  oj     10.705 (± 0.0%) i/s   (93.41 ms/i) -     54.000 in   5.044590s

Comparison:
                json:       10.8 i/s
          json_coder:       10.8 i/s - 1.01x  faster
                  oj:       10.7 i/s - 1.01x  slower


== Encoding many #to_json calls (2701 bytes)
json_coder unsupported (Object not allowed in JSON)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json     2.356k i/100ms
                  oj     1.967k i/100ms
Calculating -------------------------------------
                json     22.902k (± 7.6%) i/s   (43.66 μs/i) -    115.444k in   5.081692s
                  oj     19.756k (± 1.1%) i/s   (50.62 μs/i) -    100.317k in   5.078382s

Comparison:
                json:    22902.2 i/s
                  oj:    19756.4 i/s - 1.16x  slower

@byroot
Member

byroot commented Jan 29, 2025

Relative gains on M3 compared to master on the macro benchmarks:

== Encoding activitypub.json (52595 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after     2.529k i/100ms
Calculating -------------------------------------
               after     25.594k (± 0.9%) i/s   (39.07 μs/i) -    128.979k in   5.039874s

Comparison:
              before:    22137.7 i/s
               after:    25594.0 i/s - 1.16x  faster


== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   135.000 i/100ms
Calculating -------------------------------------
               after      1.365k (± 0.4%) i/s  (732.82 μs/i) -      6.885k in   5.045549s

Comparison:
              before:     1371.5 i/s
               after:     1364.6 i/s - same-ish: difference falls within error


== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   266.000 i/100ms
Calculating -------------------------------------
               after      2.679k (± 0.6%) i/s  (373.22 μs/i) -     13.566k in   5.063340s

Comparison:
              before:     2379.6 i/s
               after:     2679.4 i/s - 1.13x  faster

@byroot
Member

byroot commented Jan 29, 2025

This one is interesting, as it doesn't require any of the annoying feature detection that SIMD imposes.

But if we end up going with SIMD anyway, might as well not bother with this, right?

@samyron
Author

samyron commented Jan 29, 2025

> This one is interesting, as it doesn't require any of the annoying feature detection that SIMD imposes.
>
> But if we end up going with SIMD anyway, might as well not bother with this, right?

That is a judgement call. It's nice to have a pure C implementation that doesn't require any special instructions.

If we do go the SIMD route, and assuming ARM NEON (Apple M-series chips and, according to Wikipedia, AWS Graviton) and x86-64 account for the vast majority of CPUs running ruby/json, this is probably unnecessary. However, it's nice to have alternatives.

Edit: This assumes this code is faster on other architectures as well. I have not tested on anything other than my M1 and an Intel-based laptop.

@byroot
Member

byroot commented Jan 29, 2025

> It's nice to have a pure C implementation that doesn't require any special instructions.

True. I guess my only real reservation with this PR (and also with the SIMD ones) is the huge PROCESS_BYTE macro.

I haven't looked too much into it, but I'd really like it if such a huge macro weren't necessary. So I need to take some time to experiment with some refactoring.

@byroot
Member

byroot commented Jan 29, 2025

> I have not tested on anything other than my M1 and an Intel-based laptop.

That's likely enough. x86 alone is likely 95% of Ruby usage, if not more, and we're probably super close to 100% if you add ARM. For other platforms, correctness is sufficient.

@samyron
Author

samyron commented Jan 29, 2025

> > It's nice to have a pure C implementation that doesn't require any special instructions.
>
> True. I guess my only real reservation with this PR (and also with the SIMD ones) is the huge PROCESS_BYTE macro.
>
> I haven't looked too much into it, but I'd really like it if such a huge macro weren't necessary. So I need to take some time to experiment with some refactoring.

It's not necessary. It's the existing conditional. I just didn't want to copy and paste it multiple times.

if (RB_UNLIKELY(ch_len)) {
  switch (ch_len) {
  ...
  }
} else {
  pos++;
}

@byroot
Member

byroot commented Jan 29, 2025

> It's not necessary. It's the existing conditional. I just didn't want to copy and paste it multiple times.

Yes, I mean getting rid of that big macro without resorting to copy-pasting either.

What I have in mind right now, though I don't know if it's really possible, would be to move the "search" part into another function and let it keep some state in a stack-allocated struct so it can resume. Something very much like https://lemire.me/blog/2024/07/20/scan-html-even-faster-with-simd-instructions-c-and-c/

So the pseudo-code would look like:

scanner_state state = {0};
while ((ptr = scan(&state, ptr))) {
  // process one byte
  ptr++;
}

This way all the alignment considerations and such are moved into that scan function, and it would become the natural place to use SIMD etc.
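To make the idea concrete, here is a rough, scalar-only sketch of that shape. Everything in it is an assumption for illustration: the fields of scanner_state, the scan signature, and the count_escapes driver are hypothetical and not part of ruby/json. The state here only carries the end of the input; a SIMD or bit-twiddling version could presumably also cache a pending match mask in it so that it can resume mid-chunk.

```c
#include <stddef.h>

/* Hypothetical scanner state: only the end of the input for now. */
typedef struct {
    const char *end;  /* one past the last byte of the input */
} scanner_state;

/* Return a pointer to the next byte at or after ptr that needs escaping
 * ('"', '\\', or a control character below 0x20), or NULL when the input
 * is exhausted. This is where chunked or SIMD searching would live. */
static const char *scan(scanner_state *state, const char *ptr) {
    for (; ptr < state->end; ptr++) {
        unsigned char ch = (unsigned char)*ptr;
        if (ch == '"' || ch == '\\' || ch < 0x20) return ptr;
    }
    return NULL;
}

/* Example driver: the caller's loop only ever sees bytes that need work.
 * Here it just counts them instead of emitting escape sequences. */
static size_t count_escapes(const char *s, size_t len) {
    scanner_state state = { .end = s + len };
    const char *ptr = s;
    size_t count = 0;

    while ((ptr = scan(&state, ptr)) != NULL) {
        count++;  /* "process one byte" */
        ptr++;
    }
    return count;
}
```

With this split, the per-byte escaping logic appears once in the caller's loop, so no macro or copy-pasting should be needed to share it between the scalar and vectorized search paths.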

@byroot
Member

byroot commented Jan 29, 2025

NB: I'm not asking you to do this. If you wish to, feel free, but otherwise I want to find some time to try it before I merge this PR.
