Streaming decode #74

Open
wants to merge 8 commits into
base: master

@zuiderkwast zuiderkwast commented Apr 11, 2022

A new option stream:

Decode the input in multiple chunks. Instead of a result or error,
{incomplete, fun()} is returned. The returned fun takes a single argument
and it should be called to continue the decoding. When all the input has been
provided, the fun should be called with end_stream or end_json to signal
the end of input and then the fun returns a result or an error.
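
As a rough illustration of the calling convention described above, here is a minimal sketch of a driver loop. The module name `stream_feed`, the helper `feed/2`, and the toy decoder `toy_decode/1` are all made up for this example and are not part of jsone; the toy decoder only concatenates chunks and parses them as an integer at `end_stream`, so that the sketch runs without this branch.

```erlang
-module(stream_feed).
-export([feed/2, toy_decode/1]).

%% Feed a non-empty list of chunks through a decoder that follows the
%% {incomplete, Fun} protocol, then signal the end of input.
feed(Decode, [First | Rest]) ->
    feed_rest(Decode(First), Rest).

feed_rest({incomplete, Cont}, [Chunk | Rest]) ->
    feed_rest(Cont(Chunk), Rest);
feed_rest({incomplete, Cont}, []) ->
    Cont(end_stream);
feed_rest(Result, _Rest) ->
    %% A result or an error arrived before the chunks ran out.
    Result.

%% Hypothetical stand-in for a streaming decoder (NOT jsone):
%% accumulates chunks and parses them as one integer at end_stream.
toy_decode(Chunk) ->
    toy_decode(Chunk, <<>>).

toy_decode(end_stream, Acc) ->
    try
        {ok, binary_to_integer(Acc)}
    catch
        error:badarg -> {error, badarg}
    end;
toy_decode(Chunk, Acc) when is_binary(Chunk) ->
    Acc1 = <<Acc/binary, Chunk/binary>>,
    {incomplete, fun(Next) -> toy_decode(Next, Acc1) end}.
```

With this branch, the same loop could presumably be driven by something like `stream_feed:feed(fun(C) -> jsone:decode(C, [stream]) end, Chunks)`; whether the option is passed exactly that way is an assumption based on the description above.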

This is a first working implementation. We have yet to run the benchmark.

I made only minimal changes to make it work. Perhaps some refactoring can make it less ugly.

If this makes the core decode implementation slower, we could consider putting the stream decode code in a separate module.

Fixes #73.

@sile
Owner

sile commented May 6, 2022

Sorry for the delayed response but what is the status of this PR?
It's still marked as a draft, so are there any TODOs to make it review-ready? (maybe benchmarking?)

@zuiderkwast
Author

I think it is ready for review. Only benchmarking is missing. I will mark it ready for review.

Is there a way to run the benchmark of a branch and compare it with master?

@zuiderkwast zuiderkwast marked this pull request as ready for review May 6, 2022 11:30
@sile
Owner

sile commented May 7, 2022

I see. Thanks. I'll start reviewing this PR.

> Is there a way to run the benchmark of a branch and compare it with master?

There is a benchmark script I used at https://github.com/sile/jsone/tree/master/benchmark/run.sh. But it's not well maintained, so feel free to use another benchmark or your own if you prefer.

@sile
Owner

sile commented May 7, 2022

The CI failures could be fixed by running $ rebar3 efmt -w to format the source code.

Owner

@sile sile left a comment

Thank you for your PR.
I took a look at the diff and left some minor comments.
Although there may be room for code refactoring, your approach seems nice.
Before proceeding with the detailed review, I would like to see the benchmark result.
(If there is no performance penalty we could merge this PR almost as-is. However if a huge performance impact exists, we would need to radically rethink the approach)

@sile
Owner

sile commented May 8, 2022

I came up with an idea that it might be possible to implement this feature without modifying the jsone_decode module at all.
The following code shows the idea.

%% in jsone.erl file
try_decode_stream(Json, Options) ->
  case jsone_decode:decode(Json, Options) of
    {ok, Value, Remainings} ->
      {ok, Value, Remainings};
    %% Add handling of incomplete cases here
    {error, {badarg, [{jsone_decode, array_next, Args = [<<>>, Values, Nexts, Buf, Opt]}]}} ->
      incomplete(fun jsone_decode:array_next/5, Args);
    %% ... other clauses ...
    {error, Reason} ->
      {error, Reason}
  end.

I'm not 100% sure this approach is actually possible, but I think it has the obvious merit that it doesn't introduce any performance overhead when this feature isn't used.
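
To make the idea above a bit more concrete, here is a minimal self-contained sketch of the error-to-continuation pattern. It does not touch jsone at all: the module `resume_demo`, its toy grammar (a bracketed, comma-separated list of single digits), and the `unexpected_end` reason are invented for illustration. The non-streaming core reports the failing function and its arguments, and a thin wrapper resumes that function when the failure was caused by running out of input.

```erlang
-module(resume_demo).
-export([decode/1, try_decode_stream/1]).

%% Non-streaming core for a toy grammar such as <<"[1,2,3]">>.
%% On failure it reports the function and arguments where it stopped.
decode(<<$[, Rest/binary>>) ->
    items(Rest, []);
decode(Bin) ->
    {error, {badarg, [{?MODULE, decode, [Bin]}]}}.

items(<<$], Rest/binary>>, Acc) ->
    {ok, lists:reverse(Acc), Rest};
items(<<$,, Rest/binary>>, Acc) ->
    items(Rest, Acc);
items(<<D, Rest/binary>>, Acc) when D >= $0, D =< $9 ->
    items(Rest, [D - $0 | Acc]);
items(Bin, Acc) ->
    {error, {badarg, [{?MODULE, items, [Bin, Acc]}]}}.

%% Streaming wrapper: the core above is left untouched.
try_decode_stream(Bin) ->
    wrap(decode(Bin)).

%% An error raised on empty input means "ran out of data", so turn it
%% into a continuation that resumes items/2 with the saved accumulator.
wrap({error, {badarg, [{?MODULE, items, [<<>>, Acc]}]}}) ->
    {incomplete,
     fun(end_stream) -> {error, unexpected_end};
        (More) when is_binary(More) -> wrap(items(More, Acc))
     end};
wrap(Other) ->
    Other.
```

In this toy a chunk boundary can only fall between single-byte tokens, which is exactly what keeps it simple; multi-byte tokens are where the approach would need more care.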

@zuiderkwast
Author

That's a very interesting idea. It keeps jsone_decode simple. The case where I'm not sure is when the input is a number split between the digits. In this case, an incomplete input is not an error. E.g. <<"1">>, <<".23">>, <<"e45">>.
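
A small sketch of why the split-number case is awkward, using only Erlang BIFs rather than jsone (the module `split_number` and its functions are hypothetical names for this example): every prefix of the chunk sequence already parses as a complete number, so there is no error for a wrapper to intercept.

```erlang
-module(split_number).
-export([prefix_values/1]).

%% For each chunk boundary, parse everything received so far.
%% Returns the value a decoder would see if input stopped at that point.
prefix_values(Chunks) ->
    {_, Values} =
        lists:foldl(
          fun(Chunk, {Acc, Vs}) ->
                  Acc1 = <<Acc/binary, Chunk/binary>>,
                  {Acc1, [parse(Acc1) | Vs]}
          end,
          {<<>>, []},
          Chunks),
    lists:reverse(Values).

%% Try a float first; fall back to an integer when there is no fraction.
parse(Bin) ->
    try
        binary_to_float(Bin)
    catch
        error:badarg -> binary_to_integer(Bin)
    end.
```

Calling `split_number:prefix_values([<<"1">>, <<".23">>, <<"e45">>])` yields three valid but different numbers, one per prefix.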

I will benchmark with the current implementation first. Then, I might try the badarg-to-incomplete version.

@zuiderkwast
Author

Benchmarks of current version and with this PR, decode only.

In the first set of results, jsone = current version; in the second set, jsone = this PR.
##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
jiffy         3.61 K      277.26 μs    ±25.91%      248.63 μs      506.51 μs
Jason         2.55 K      392.71 μs    ±10.81%      388.40 μs      524.35 μs
jsone         1.87 K      534.87 μs    ±15.29%      523.80 μs      852.57 μs
Tiny          1.46 K      683.48 μs    ±10.97%      668.14 μs      932.35 μs
Poison        1.37 K      729.87 μs    ±17.99%      706.65 μs     1167.41 μs
JSX           1.24 K      809.64 μs    ±11.73%      796.71 μs     1091.97 μs
JSON          0.46 K     2161.80 μs     ±9.99%     2130.79 μs     2827.24 μs

Comparison:
jiffy 3.61 K
Jason 2.55 K - 1.42x slower +115.45 μs
jsone 1.87 K - 1.93x slower +257.61 μs
Tiny 1.46 K - 2.47x slower +406.22 μs
Poison 1.37 K - 2.63x slower +452.61 μs
JSX 1.24 K - 2.92x slower +532.38 μs
JSON 0.46 K - 7.80x slower +1884.54 μs

With input Giphy

Name ips average deviation median 99th %
jiffy 364.12 2.75 ms ±19.57% 2.66 ms 4.32 ms
Jason 245.39 4.08 ms ±9.34% 4.00 ms 5.71 ms
Tiny 137.24 7.29 ms ±3.41% 7.23 ms 8.23 ms
Poison 123.35 8.11 ms ±14.41% 7.93 ms 15.52 ms
jsone 122.55 8.16 ms ±7.08% 8.04 ms 9.91 ms
JSX 101.85 9.82 ms ±5.67% 9.73 ms 11.53 ms
JSON 52.04 19.22 ms ±4.86% 19.06 ms 22.83 ms

Comparison:
jiffy 364.12
Jason 245.39 - 1.48x slower +1.33 ms
Tiny 137.24 - 2.65x slower +4.54 ms
Poison 123.35 - 2.95x slower +5.36 ms
jsone 122.55 - 2.97x slower +5.41 ms
JSX 101.85 - 3.57x slower +7.07 ms
JSON 52.04 - 7.00x slower +16.47 ms

With input GitHub

Name ips average deviation median 99th %
jiffy 1427.95 0.70 ms ±15.47% 0.68 ms 1.01 ms
Jason 816.06 1.23 ms ±9.34% 1.20 ms 1.76 ms
jsone 575.18 1.74 ms ±14.45% 1.70 ms 2.43 ms
Tiny 535.76 1.87 ms ±6.55% 1.84 ms 2.37 ms
Poison 495.11 2.02 ms ±10.91% 1.99 ms 2.76 ms
JSX 324.99 3.08 ms ±7.06% 3.03 ms 3.76 ms
JSON 183.14 5.46 ms ±5.84% 5.39 ms 6.47 ms

Comparison:
jiffy 1427.95
Jason 816.06 - 1.75x slower +0.53 ms
jsone 575.18 - 2.48x slower +1.04 ms
Tiny 535.76 - 2.67x slower +1.17 ms
Poison 495.11 - 2.88x slower +1.32 ms
JSX 324.99 - 4.39x slower +2.38 ms
JSON 183.14 - 7.80x slower +4.76 ms

With input GovTrack

Name ips average deviation median 99th %
jiffy 9.80 102.08 ms ±9.94% 101.31 ms 136.90 ms
Jason 8.57 116.63 ms ±6.05% 115.18 ms 144.48 ms
jsone 5.09 196.53 ms ±5.87% 195.59 ms 224.68 ms
Tiny 4.31 232.03 ms ±4.59% 232.55 ms 267.62 ms
Poison 3.60 277.70 ms ±9.39% 271.36 ms 401.83 ms
JSX 2.98 336.05 ms ±5.65% 335.72 ms 369.61 ms
JSON 1.13 882.46 ms ±3.27% 871.20 ms 955.92 ms

Comparison:
jiffy 9.80
Jason 8.57 - 1.14x slower +14.55 ms
jsone 5.09 - 1.93x slower +94.45 ms
Tiny 4.31 - 2.27x slower +129.95 ms
Poison 3.60 - 2.72x slower +175.62 ms
JSX 2.98 - 3.29x slower +233.97 ms
JSON 1.13 - 8.64x slower +780.38 ms

With input Issue 90

Name ips average deviation median 99th %
jiffy 35.68 28.03 ms ±6.40% 27.76 ms 32.63 ms
Jason 8.48 117.98 ms ±3.88% 115.47 ms 134.42 ms
Poison 7.98 125.29 ms ±2.47% 124.59 ms 145.71 ms
Tiny 7.29 137.25 ms ±3.03% 135.69 ms 159.29 ms
JSX 6.82 146.73 ms ±3.05% 146.59 ms 166.66 ms
jsone 6.66 150.14 ms ±1.79% 149.39 ms 165.53 ms
JSON 1.34 743.58 ms ±3.69% 736.49 ms 793.92 ms

Comparison:
jiffy 35.68
Jason 8.48 - 4.21x slower +89.96 ms
Poison 7.98 - 4.47x slower +97.26 ms
Tiny 7.29 - 4.90x slower +109.23 ms
JSX 6.82 - 5.24x slower +118.70 ms
jsone 6.66 - 5.36x slower +122.12 ms
JSON 1.34 - 26.53x slower +715.55 ms

With input JSON Generator

Name ips average deviation median 99th %
jiffy 334.43 2.99 ms ±27.92% 2.89 ms 4.82 ms
Jason 328.87 3.04 ms ±5.19% 3.00 ms 3.52 ms
jsone 192.21 5.20 ms ±15.92% 5.03 ms 9.20 ms
Tiny 170.86 5.85 ms ±3.11% 5.83 ms 6.53 ms
Poison 155.75 6.42 ms ±5.09% 6.39 ms 7.35 ms
JSX 128.88 7.76 ms ±4.45% 7.71 ms 8.71 ms
JSON 40.39 24.76 ms ±4.76% 24.70 ms 29.14 ms

Comparison:
jiffy 334.43
Jason 328.87 - 1.02x slower +0.0505 ms
jsone 192.21 - 1.74x slower +2.21 ms
Tiny 170.86 - 1.96x slower +2.86 ms
Poison 155.75 - 2.15x slower +3.43 ms
JSX 128.88 - 2.59x slower +4.77 ms
JSON 40.39 - 8.28x slower +21.77 ms

With input JSON Generator (Pretty)

Name ips average deviation median 99th %
Jason 274.83 3.64 ms ±6.18% 3.58 ms 4.25 ms
jiffy 258.07 3.87 ms ±24.39% 3.64 ms 6.30 ms
jsone 181.26 5.52 ms ±9.94% 5.37 ms 7.81 ms
Tiny 156.34 6.40 ms ±6.45% 6.35 ms 7.09 ms
Poison 145.53 6.87 ms ±8.41% 6.79 ms 7.96 ms
JSX 117.76 8.49 ms ±5.45% 8.41 ms 9.85 ms
JSON 37.86 26.41 ms ±11.13% 25.97 ms 42.08 ms

Comparison:
Jason 274.83
jiffy 258.07 - 1.06x slower +0.24 ms
jsone 181.26 - 1.52x slower +1.88 ms
Tiny 156.34 - 1.76x slower +2.76 ms
Poison 145.53 - 1.89x slower +3.23 ms
JSX 117.76 - 2.33x slower +4.85 ms
JSON 37.86 - 7.26x slower +22.78 ms

With input Pokedex

Name ips average deviation median 99th %
Jason 530.81 1.88 ms ±10.16% 1.85 ms 2.42 ms
jiffy 397.71 2.51 ms ±24.47% 2.28 ms 4.01 ms
jsone 285.58 3.50 ms ±10.17% 3.40 ms 4.95 ms
Poison 217.10 4.61 ms ±3.69% 4.58 ms 5.31 ms
Tiny 206.99 4.83 ms ±7.63% 4.78 ms 5.68 ms
JSX 164.06 6.10 ms ±6.38% 6.00 ms 8.29 ms
JSON 53.07 18.84 ms ±4.63% 18.73 ms 22.39 ms

Comparison:
Jason 530.81
jiffy 397.71 - 1.33x slower +0.63 ms
jsone 285.58 - 1.86x slower +1.62 ms
Poison 217.10 - 2.45x slower +2.72 ms
Tiny 206.99 - 2.56x slower +2.95 ms
JSX 164.06 - 3.24x slower +4.21 ms
JSON 53.07 - 10.00x slower +16.96 ms

With input UTF-8 escaped

Name ips average deviation median 99th %
jiffy 8841.86 0.113 ms ±29.39% 0.109 ms 0.186 ms
Poison 1212.80 0.82 ms ±21.63% 0.76 ms 1.38 ms
Jason 1090.21 0.92 ms ±16.01% 0.92 ms 1.34 ms
Tiny 802.81 1.25 ms ±12.17% 1.25 ms 1.74 ms
jsone 703.02 1.42 ms ±18.66% 1.40 ms 2.24 ms
JSX 683.53 1.46 ms ±16.22% 1.42 ms 2.41 ms
JSON 550.49 1.82 ms ±13.80% 1.89 ms 2.59 ms

Comparison:
jiffy 8841.86
Poison 1212.80 - 7.29x slower +0.71 ms
Jason 1090.21 - 8.11x slower +0.80 ms
Tiny 802.81 - 11.01x slower +1.13 ms
jsone 703.02 - 12.58x slower +1.31 ms
JSX 683.53 - 12.94x slower +1.35 ms
JSON 550.49 - 16.06x slower +1.70 ms

With input UTF-8 unescaped

Name ips average deviation median 99th %
jiffy 13.66 K 73.22 μs ±36.58% 70.58 μs 125.90 μs
Jason 5.36 K 186.41 μs ±15.58% 177.30 μs 310.95 μs
Poison 4.58 K 218.35 μs ±14.81% 204.15 μs 327.11 μs
JSX 3.71 K 269.18 μs ±12.76% 257.38 μs 367.96 μs
jsone 3.24 K 308.71 μs ±21.72% 294.44 μs 686.19 μs
JSON 3.05 K 327.74 μs ±24.76% 308.11 μs 535.51 μs
Tiny 1.95 K 511.66 μs ±14.44% 510.70 μs 698.80 μs

Comparison:
jiffy 13.66 K
Jason 5.36 K - 2.55x slower +113.19 μs
Poison 4.58 K - 2.98x slower +145.12 μs
JSX 3.71 K - 3.68x slower +195.96 μs
jsone 3.24 K - 4.22x slower +235.48 μs
JSON 3.05 K - 4.48x slower +254.52 μs
Tiny 1.95 K - 6.99x slower +438.43 μs

##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
jiffy         4.17 K      240.01 μs    ±23.01%      220.79 μs      380.98 μs
Jason         2.65 K      376.88 μs    ±10.31%      374.17 μs      488.45 μs
jsone         1.74 K      573.40 μs    ±10.16%      566.48 μs      768.11 μs
Poison        1.53 K      655.55 μs    ±12.55%      647.27 μs      920.13 μs
Tiny          1.48 K      677.49 μs    ±10.19%      666.94 μs      890.66 μs
JSX           1.28 K      784.19 μs    ±19.22%      756.03 μs     1599.32 μs
JSON          0.62 K     1618.36 μs     ±6.32%     1605.90 μs     2011.57 μs

Comparison:
jiffy 4.17 K
Jason 2.65 K - 1.57x slower +136.87 μs
jsone 1.74 K - 2.39x slower +333.39 μs
Poison 1.53 K - 2.73x slower +415.54 μs
Tiny 1.48 K - 2.82x slower +437.48 μs
JSX 1.28 K - 3.27x slower +544.18 μs
JSON 0.62 K - 6.74x slower +1378.35 μs

With input Giphy

Name ips average deviation median 99th %
jiffy 411.06 2.43 ms ±23.55% 2.23 ms 4.30 ms
Jason 273.37 3.66 ms ±5.97% 3.65 ms 4.29 ms
Tiny 140.17 7.13 ms ±4.34% 7.07 ms 8.09 ms
Poison 130.80 7.65 ms ±10.39% 7.54 ms 9.31 ms
jsone 119.24 8.39 ms ±13.71% 8.23 ms 13.73 ms
JSX 105.93 9.44 ms ±8.54% 9.32 ms 12.97 ms
JSON 62.33 16.04 ms ±7.67% 15.86 ms 18.90 ms

Comparison:
jiffy 411.06
Jason 273.37 - 1.50x slower +1.23 ms
Tiny 140.17 - 2.93x slower +4.70 ms
Poison 130.80 - 3.14x slower +5.21 ms
jsone 119.24 - 3.45x slower +5.95 ms
JSX 105.93 - 3.88x slower +7.01 ms
JSON 62.33 - 6.59x slower +13.61 ms

With input GitHub

Name ips average deviation median 99th %
jiffy 1444.91 0.69 ms ±17.07% 0.68 ms 1.13 ms
Jason 945.08 1.06 ms ±5.21% 1.05 ms 1.23 ms
Tiny 549.44 1.82 ms ±6.03% 1.80 ms 2.20 ms
jsone 530.12 1.89 ms ±8.04% 1.87 ms 2.48 ms
Poison 508.65 1.97 ms ±8.08% 1.96 ms 2.50 ms
JSX 354.20 2.82 ms ±8.15% 2.79 ms 3.62 ms
JSON 228.48 4.38 ms ±6.75% 4.34 ms 5.15 ms

Comparison:
jiffy 1444.91
Jason 945.08 - 1.53x slower +0.37 ms
Tiny 549.44 - 2.63x slower +1.13 ms
jsone 530.12 - 2.73x slower +1.19 ms
Poison 508.65 - 2.84x slower +1.27 ms
JSX 354.20 - 4.08x slower +2.13 ms
JSON 228.48 - 6.32x slower +3.68 ms

With input GovTrack

Name ips average deviation median 99th %
jiffy 10.22 97.83 ms ±9.66% 97.34 ms 128.47 ms
Jason 9.09 110.01 ms ±5.28% 108.90 ms 132.37 ms
jsone 4.77 209.76 ms ±7.49% 208.33 ms 258.01 ms
Tiny 4.26 234.93 ms ±5.11% 234.71 ms 270.62 ms
Poison 3.87 258.50 ms ±4.69% 257.95 ms 288.59 ms
JSX 3.05 327.43 ms ±4.79% 327.82 ms 366.02 ms
JSON 1.30 771.14 ms ±7.65% 753.15 ms 923.03 ms

Comparison:
jiffy 10.22
Jason 9.09 - 1.12x slower +12.18 ms
jsone 4.77 - 2.14x slower +111.94 ms
Tiny 4.26 - 2.40x slower +137.10 ms
Poison 3.87 - 2.64x slower +160.67 ms
JSX 3.05 - 3.35x slower +229.61 ms
JSON 1.30 - 7.88x slower +673.32 ms

With input Issue 90

Name ips average deviation median 99th %
jiffy 37.05 26.99 ms ±1.95% 26.94 ms 28.45 ms
Jason 9.19 108.86 ms ±0.69% 108.78 ms 111.33 ms
Poison 8.66 115.50 ms ±1.99% 116.18 ms 121.62 ms
Tiny 7.49 133.58 ms ±4.40% 132.69 ms 175.59 ms
JSX 7.10 140.85 ms ±2.34% 140.14 ms 161.88 ms
jsone 5.82 171.69 ms ±4.38% 168.20 ms 191.10 ms
JSON 1.39 721.93 ms ±7.52% 740.32 ms 811.20 ms

Comparison:
jiffy 37.05
Jason 9.19 - 4.03x slower +81.87 ms
Poison 8.66 - 4.28x slower +88.51 ms
Tiny 7.49 - 4.95x slower +106.59 ms
JSX 7.10 - 5.22x slower +113.86 ms
jsone 5.82 - 6.36x slower +144.70 ms
JSON 1.39 - 26.75x slower +694.93 ms

With input JSON Generator

Name ips average deviation median 99th %
jiffy 367.58 2.72 ms ±27.95% 2.61 ms 4.46 ms
Jason 353.64 2.83 ms ±4.67% 2.80 ms 3.28 ms
Tiny 176.42 5.67 ms ±5.46% 5.60 ms 6.55 ms
jsone 161.45 6.19 ms ±21.25% 5.87 ms 12.04 ms
Poison 155.22 6.44 ms ±17.62% 6.21 ms 14.41 ms
JSX 129.81 7.70 ms ±7.34% 7.61 ms 10.12 ms
JSON 45.34 22.06 ms ±4.21% 21.91 ms 25.52 ms

Comparison:
jiffy 367.58
Jason 353.64 - 1.04x slower +0.107 ms
Tiny 176.42 - 2.08x slower +2.95 ms
jsone 161.45 - 2.28x slower +3.47 ms
Poison 155.22 - 2.37x slower +3.72 ms
JSX 129.81 - 2.83x slower +4.98 ms
JSON 45.34 - 8.11x slower +19.34 ms

With input JSON Generator (Pretty)

Name ips average deviation median 99th %
Jason 289.85 3.45 ms ±5.49% 3.41 ms 3.97 ms
jiffy 274.64 3.64 ms ±26.63% 3.40 ms 6.52 ms
Tiny 169.27 5.91 ms ±2.85% 5.89 ms 6.56 ms
Poison 148.18 6.75 ms ±19.28% 6.49 ms 15.94 ms
jsone 145.53 6.87 ms ±24.00% 6.56 ms 15.73 ms
JSX 119.58 8.36 ms ±8.23% 8.23 ms 10.93 ms
JSON 43.42 23.03 ms ±11.39% 22.63 ms 37.01 ms

Comparison:
Jason 289.85
jiffy 274.64 - 1.06x slower +0.191 ms
Tiny 169.27 - 1.71x slower +2.46 ms
Poison 148.18 - 1.96x slower +3.30 ms
jsone 145.53 - 1.99x slower +3.42 ms
JSX 119.58 - 2.42x slower +4.91 ms
JSON 43.42 - 6.68x slower +19.58 ms

With input Pokedex

Name ips average deviation median 99th %
Jason 529.39 1.89 ms ±9.23% 1.87 ms 2.38 ms
jiffy 374.99 2.67 ms ±29.17% 2.45 ms 4.87 ms
jsone 230.71 4.33 ms ±12.27% 4.28 ms 6.57 ms
Tiny 224.30 4.46 ms ±3.69% 4.43 ms 5.08 ms
Poison 220.93 4.53 ms ±10.88% 4.41 ms 5.51 ms
JSX 173.83 5.75 ms ±4.76% 5.70 ms 6.58 ms
JSON 57.09 17.52 ms ±10.26% 17.18 ms 25.97 ms

Comparison:
Jason 529.39
jiffy 374.99 - 1.41x slower +0.78 ms
jsone 230.71 - 2.29x slower +2.45 ms
Tiny 224.30 - 2.36x slower +2.57 ms
Poison 220.93 - 2.40x slower +2.64 ms
JSX 173.83 - 3.05x slower +3.86 ms
JSON 57.09 - 9.27x slower +15.63 ms

With input UTF-8 escaped

Name ips average deviation median 99th %
jiffy 8892.62 0.112 ms ±16.90% 0.110 ms 0.171 ms
Poison 1255.05 0.80 ms ±21.48% 0.73 ms 1.38 ms
Jason 1162.73 0.86 ms ±14.01% 0.88 ms 1.16 ms
Tiny 848.27 1.18 ms ±13.39% 1.19 ms 1.69 ms
jsone 727.05 1.38 ms ±17.55% 1.35 ms 2.17 ms
JSX 616.71 1.62 ms ±18.07% 1.69 ms 2.37 ms
JSON 568.82 1.76 ms ±23.80% 1.71 ms 3.54 ms

Comparison:
jiffy 8892.62
Poison 1255.05 - 7.09x slower +0.68 ms
Jason 1162.73 - 7.65x slower +0.75 ms
Tiny 848.27 - 10.48x slower +1.07 ms
jsone 727.05 - 12.23x slower +1.26 ms
JSX 616.71 - 14.42x slower +1.51 ms
JSON 568.82 - 15.63x slower +1.65 ms

With input UTF-8 unescaped

Name ips average deviation median 99th %
jiffy 13.86 K 72.17 μs ±45.29% 68.61 μs 121.52 μs
Jason 6.09 K 164.28 μs ±16.88% 156.68 μs 288.11 μs
Poison 4.56 K 219.20 μs ±16.17% 204.66 μs 346.56 μs
JSX 3.95 K 253.37 μs ±14.08% 241.00 μs 376.08 μs
JSON 3.15 K 317.15 μs ±27.12% 297.77 μs 569.24 μs
jsone 3.01 K 332.44 μs ±17.28% 321.25 μs 701.15 μs
Tiny 2.10 K 475.99 μs ±14.16% 473.84 μs 622.28 μs

Comparison:
jiffy 13.86 K
Jason 6.09 K - 2.28x slower +92.11 μs
Poison 4.56 K - 3.04x slower +147.03 μs
JSX 3.95 K - 3.51x slower +181.20 μs
JSON 3.15 K - 4.39x slower +244.98 μs
jsone 3.01 K - 4.61x slower +260.27 μs
Tiny 2.10 K - 6.60x slower +403.82 μs

This was run on a laptop, which is why there are big differences between the runs. Still, it is visible that the PR has a slightly negative impact on performance.
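
To put rough numbers on that, the relative change in the jsone averages between the two result sets can be computed as below. This is only a sketch: the module name `bench_delta` is made up, the figures are copied from the tables above, and since the other libraries also move between runs (they were not changed), a good part of any single difference is likely noise. By this measure the change ranges from about -3% (UTF-8 escaped) to about +24% (JSON Generator (Pretty)).

```erlang
-module(bench_delta).
-export([slowdown_percent/2, report/0]).

%% Relative change of the average decode time, in percent.
%% Positive means the run with this PR was slower than the baseline.
slowdown_percent(Before, After) ->
    (After - Before) / Before * 100.

%% jsone averages copied from the two result sets above.
%% Units differ per input (μs or ms) but cancel out in the ratio.
report() ->
    Runs = [{"Blockchain", 534.87, 573.40},
            {"Giphy", 8.16, 8.39},
            {"GitHub", 1.74, 1.89},
            {"GovTrack", 196.53, 209.76},
            {"Issue 90", 150.14, 171.69},
            {"JSON Generator", 5.20, 6.19},
            {"JSON Generator (Pretty)", 5.52, 6.87},
            {"Pokedex", 3.50, 4.33},
            {"UTF-8 escaped", 1.42, 1.38},
            {"UTF-8 unescaped", 308.71, 332.44}],
    [{Name, slowdown_percent(Before, After)}
     || {Name, Before, After} <- Runs].
```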

@zuiderkwast
Author

zuiderkwast commented May 10, 2022

Note: I did not run rebar3 efmt -w because it causes a great many changes, including to code that I didn't touch, which just makes the PR harder to review. I can do it in a separate commit later.

@sile
Owner

sile commented May 10, 2022

Thank you for sharing the benchmark result! It's interesting.

> The case where I'm not sure is when the input is a number split between the digits. In this case, an incomplete input is not an error. E.g. <<"1">>, <<".23">>, <<"e45">>.

You're right. It could be a difficult point.

I think the benchmark result is not too bad, but this change does seem to have a negative impact on decoding performance. So I'd like to explore the approach above further.
(Nothing is decided yet, but I would like to optimize jsone to make it faster someday, so I want to avoid performance degradation where possible.)

@sile
Owner

sile commented May 10, 2022

This is also just an idea, but it might be possible to retry the number decoding as follows:

%% in jsone.erl file (the logic could be complicated, so it feels better to create a new module such as jsone_stream.erl, btw)
try_decode_stream(Json, Options) ->
  case jsone_decode:decode(Json, Options) of
    {ok, Value, Remainings} ->
      {ok, Value, Remainings};
    {error, {badarg, [{jsone_decode, array_next, Args = [<<>>, Values, Nexts, Buf, Opt]}]}} ->
      case Nexts of
          %% If the head element of `Nexts` is a number, retry the number decoding when the next stream input is given.
          [N | Nexts1] when is_number(N) ->
              incomplete(fun jsone_decode:number_integer_part/5, [jsone:encode(N), Values, Nexts1, Buf, Opt]);
          _ ->
              incomplete(fun jsone_decode:array_next/5, Args)
      end;
    %% ... other clauses ...
    {error, Reason} ->
      {error, Reason}
  end.
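
The retry idea can be tried in isolation with a toy that uses only Erlang BIFs and no jsone code. The module `number_retry` and its functions are hypothetical, and the sketch is limited to the case where the value decoded so far is an integer: it serializes that integer again, prepends it to the next chunk, and parses the combined text.

```erlang
-module(number_retry).
-export([resume_number/2]).

%% N is the integer decoded from the input seen so far.
%% When more input arrives, re-serialize N and parse again from its start.
resume_number(N, end_stream) when is_integer(N) ->
    N;
resume_number(N, Next) when is_integer(N), is_binary(Next) ->
    parse(<<(integer_to_binary(N))/binary, Next/binary>>).

%% Try a float first; fall back to an integer when there is no fraction.
parse(Bin) ->
    try
        binary_to_float(Bin)
    catch
        error:badarg -> binary_to_integer(Bin)
    end.
```

One thing such a round trip would need to watch is that re-serializing may not reproduce the original text: in this toy, an input such as `-0` followed by `.5` would likely lose its sign, because the integer 0 is written back as `0`.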
