-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming decode #74
base: master
Are you sure you want to change the base?
Streaming decode #74
Conversation
Sorry for the delayed response but what is the status of this PR? |
I think it is ready for review. Only benchmarking is missing. I will mark it ready for review. Is there a way to run the benchmark of a branch and compare it with master? |
I see. Thanks. I starts review of this PR.
There is a benchmark script I used at https://github.com/sile/jsone/tree/master/benchmark/run.sh. But it's not well maintained, so feel free to use another or your own benchmark if you favor that. |
The CI failures could fix if you run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your PR.
I took a look at the diff and leave minor comments.
Although there may be room for code refactoring, your approach seems nice.
Before proceeding with the detailed review, I would like to see the benchmark result.
(If there is no performance penalty we could merge this PR almost as-is. However if a huge performance impact exists, we would need to radically rethink the approach)
I came up with an idea that it might be possible to implement this feature without modifying the %% in jsone.erl file
try_decode_stream(Json, Options) ->
case jsone_decode:decode(Json, Options) of
{ok, Value, Remainings} ->
{ok, Value, Remainings};
%% Add handligs of incomplete cases here
{error, {badarg, [{jsone_decode, array_next, Args = [<<>>, Values, Nexts, Buf, Opt]}]}} ->
incomplete(fun jsone_decode:array_next/5, Args);
%% ... other clauses ...
{error, Reason} ->
{error, Reason}
end. I'm not 100% sure this approach is actually possible but I think that this has obvious merit that it doesn't introduce any performance overhead when this feature isn't used. |
That's a very interesting idea. It keeps jsone_decode simple. The case where I'm not sure is when the input is a number split between the digits. In this case, an incomplete input is not an error. E.g. I will benchmark with the current implementation first. Then, I might try the badarg-to-incomplete version. |
Benchmarks of current version and with this PR, decode only.
This is done an a laptop. That's why there are big differences between the runs. It is visible that the PR has a slightly negative impact on performance though. |
Note: I did not run |
Thank you for sharing the benchmark result! It's interesting.
You're right. It could be a difficult point. I think that the benchmark result is not too bad, but this change certainly seems to have a negative impact on the decoding performance. So, I'd like to consider the possibility of the above approach further. |
This is also just an idea, but it might be possible to retry the number decoding as the following: %% in jsone.erl file (the logic could be complicated, so it feels better to create a new module such as jsone_stream.erl, btw)
try_decode_stream(Json, Options) ->
case jsone_decode:decode(Json, Options) of
{ok, Value, Remainings} ->
{ok, Value, Remainings};
{error, {badarg, [{jsone_decode, array_next, Args = [<<>>, Values, Nexts, Buf, Opt]}]}} ->
case Nexts of
%% If the head element of `Nexts` is a number, retry the number decoding when the next stream input is given.
[N | Nexts1] when is_number(N) ->
incomplete(fun jsone_decode:number_integer_part, [jsone:encode(N), Values, Nexts1, Buf Opt]);
_ ->
incomplete(fun jsone_decode:array_next/5, Args)
end;
%% ... other clauses ...
{error, Reason} ->
{error, Reason}
end. |
A new option
stream
:This is a first working implementation. We have yet to run the benchmark.
I did minimal changes to make it work. Perhaps some refactoring can make it less ugly.
If this makes the core decode implementation slower, we could consider putting the stream decode code in a separate module.
Fixes #73.