Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] Add support for "pre-encoded keys" ala Jacksons SerializedString #340

Open
rbygrave opened this issue Dec 9, 2021 · 7 comments
Open

Comments

@rbygrave
Copy link

rbygrave commented Dec 9, 2021

Related PR: #344
Related PR in Parsson - eclipse-ee4j/parsson#47

Summary

Create a JsonGenerator.Key

JsonProvider provider = ...;

JsonGenerator.Key firstNameKey = provider.createGeneratorKey("firstName");

Use a JsonGenerator.Key

JsonGenerator generator = ...;

// we can use the key rather than just a string key for performance reasons
// ... it's already escaped 
generator.writeKey(firstNameKey).write("foo");

// rather than string key 
generator.writeKey("lastName").write("foo");

I'd like to discuss adding the ability for JsonGenerator to support "pre-encoded" keys as a performance optimisation along the lines of Jacksons SerializedString .

That is, Jackson JsonGenerator has writeFieldName(SerializableString key) ... and I'd love to see that added to the jakarta json api.

SerializedString is the implementation of SerializableString which has pre-encoded the String value into bytes.

This allows the underlying generators to effectively do array copy of the already encoded bytes.
https://github.com/FasterXML/jackson-core/blob/2.14/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java#L282

Background:

I have been doing JMH performance benchmarks around this area and with jackson core, the use of SerializableString for the "field names" aka "keys" is significant - how significant depends on the relative width of the keys vs values. For relatively wide keys and narrow values it is really significant.

More background, I'm working on avaje-jsonb which uses APT source code generation to do json binding. avaje-jsonb currently abstracts over jackson code and now jakarta json (so I'm doing direct comparison between parsson and jackson-core in this manor). Currently this is an important performance feature that I see with the jackson core adapter.

@rbygrave rbygrave changed the title [api] Add support for "Pre-escaped pre-encoded keys" [api] Add support for "Pre-escaped pre-encoded keys" ala Jacksons SerializedString Dec 9, 2021
@rbygrave rbygrave changed the title [api] Add support for "Pre-escaped pre-encoded keys" ala Jacksons SerializedString [api] Add support for "pre-encoded keys" ala Jacksons SerializedString Dec 9, 2021
@lukasj lukasj transferred this issue from eclipse-ee4j/parsson Dec 9, 2021
@rbygrave
Copy link
Author

The following is a JMH benchmark writing to json comparing:

  1. avaje-jsonb using jackson core 2.12.5
  2. avaje-jsonb using jakarta.json / parrson (org.eclipse.parsson:jakarta.json:1.0.0)
  3. Jackson ObjectMapper 2.12.5
PropertyStrTenTest.toJson_jsonb_jackson                                             thrpt       1714.593          ops/ms
PropertyStrTenTest.toJson_jsonb_jakarta                                             thrpt        992.326          ops/ms
PropertyStrTenTest.toJson_objectMapper                                              thrpt       1602.596          ops/ms

I believe this benchmark is showing me that there are 2 performance techniques that jackson core makes available that are not available via json-p api. These techniques are used by ObjectMapper and also by avaje-jsonb (my library) when it uses jackson core.

The first technique is this issue, that the binding library (jackson objectmapper and avaje-jsonb) can provide a Jackson SerialisedString as the key instead of a plain String.

Adding this could help move the through put of PropertyStrTenTest.toJson_jsonb_jakarta (parrson) towards PropertyStrTenTest.toJson_jsonb_jackson(jackson-core). The comparison with PropertyStrTenTest.toJson_objectMapper helps confirm where this sits relative to Jackson ObjectMapper.

The results with gc profiling:

PropertyStrTenTest.toJson_jsonb_jackson                                             thrpt       1714.593          ops/ms
PropertyStrTenTest.toJson_jsonb_jackson:·gc.alloc.rate                              thrpt       1071.445          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.alloc.rate.norm                         thrpt        688.054            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Eden_Space                     thrpt       1071.163          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Eden_Space.norm                thrpt        687.873            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Survivor_Space                 thrpt          0.002          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Survivor_Space.norm            thrpt          0.001            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.count                                   thrpt         38.000          counts
PropertyStrTenTest.toJson_jsonb_jackson:·gc.time                                    thrpt         19.000              ms
PropertyStrTenTest.toJson_jsonb_jakarta                                             thrpt        992.326          ops/ms
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.alloc.rate                              thrpt       1391.629          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.alloc.rate.norm                         thrpt       1544.121            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Eden_Space                     thrpt       1409.426          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Eden_Space.norm                thrpt       1563.869            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Survivor_Space                 thrpt          0.004          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Survivor_Space.norm            thrpt          0.004            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.count                                   thrpt         50.000          counts
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.time                                    thrpt         34.000              ms
PropertyStrTenTest.toJson_objectMapper                                              thrpt       1602.596          ops/ms
PropertyStrTenTest.toJson_objectMapper:·gc.alloc.rate                               thrpt       1013.111          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.alloc.rate.norm                          thrpt        696.055            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Eden_Space                      thrpt       1014.789          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Eden_Space.norm                 thrpt        697.208            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Survivor_Space                  thrpt          0.002          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Survivor_Space.norm             thrpt          0.001            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.count                                    thrpt         36.000          counts
PropertyStrTenTest.toJson_objectMapper:·gc.time                                     thrpt         18.000              ms

@rbygrave
Copy link
Author

Wider keys -> more benefit

Given the 2 models:

package org.example.jmh.model;

import io.avaje.jsonb.Json;

@Json
public record NarrowNamesRecord(
  String a,
  String b,
  String c,
  String d,
  String e
) {
}
package org.example.jmh.model;

import io.avaje.jsonb.Json;

@Json
public record WideNamesRecord(
  String firstNameProperty1,
  String lastNameProperty2,
  String anotherSimilarProperty3,
  String moreOrLessProperty4,
  String lastButNotLeastProperty5
) {
}

The following benchmark compares the relative throughput when comparing models with relatively narrow keys vs wider keys. We expect the benefit from the use of SerializedString to be more pronounced when the keys are relatively wider as the generator is able to skip more encoding of keys.

Benchmark                                          Mode  Cnt     Score   Error   Units
RecordBasicTest.toJson_narrowNames_jsonb_jackson  thrpt       2327.835          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta  thrpt       1425.564          ops/ms
RecordBasicTest.toJson_narrowNames_objectMapper   thrpt       2268.823          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jackson    thrpt       3585.764          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta    thrpt       1749.537          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper     thrpt       3069.284          ops/ms

Note the test data for this is:

    testDataWideNames = new WideNamesRecord("1", "2", "3", "4", "5");
    testDataNarrowNames = new NarrowNamesRecord("property1property1property1property1", "property2property2property2property2", "property3property3property3property3", "property4property4property4property4", "property5property5property5property5");

@rbygrave
Copy link
Author

With a local prototype I can see a 17% through put improvement with the "prepared keys" in the wide key case. We see that in the 1800 to 2112 values below.

RecordBasicTest.toJson_wideNames_jsonb_jackson                thrpt       3694.105          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey      thrpt       1800.453          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey    thrpt       2112.591          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper                 thrpt       3352.952          ops/ms

With the narrow key case we expect very little difference, we want to make sure we don't get degraded performance and we see that:

RecordBasicTest.toJson_narrowNames_jsonb_jackson              thrpt       2420.908          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta_normalKey    thrpt       1443.192          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta_preparedKey  thrpt       1447.498          ops/ms
RecordBasicTest.toJson_narrowNames_objectMapper               thrpt       2263.234          ops/ms

This is still short of what avaje-jsonb is getting via using jackson-core, and short of jackson objectmapper.

@rbygrave
Copy link
Author

rbygrave commented Dec 13, 2021

With a local prototype, getting closer to the performance of jackson core. Note this has some additional performance adjustments to parsson internals as well (like removing Scope.IN_FIELD to reduce GC pressure).

Benchmark                                                     Mode  Cnt     Score   Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson               thrpt       3568.656          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey     thrpt       2334.342          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey   thrpt       2884.539          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey2  thrpt       3274.152          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper                thrpt       3551.022          ops/ms

@lukasj
Copy link
Contributor

lukasj commented Dec 13, 2021

hm, I'm a bit confused as where does this belong to. Based on the prefix [api] I would think this belongs here but as you are referring to parsson, that would belong to the implementation. Does this need some change in jsonp-api? If not, I would move this back to parsson project...

@rbygrave
Copy link
Author

Does this need some change in jsonp-api?

Yes it does and yes I think it is fine that the issue is here. Ultimately as I see it we will need to change parsson / the reference implementation as well but it's all good having this here.

I have made this a bit confusing. This is pretty much a performance orientated feature. As such I am looking to prove the concept somewhat before I push a PR for the api change (hence the bench marking etc). So yes, I'm making implementation changes at the moment to parsson so that I can:

  • Push the case for this PR (prove that it has enough performance value to be included)
  • Actually see if I can tweak some internals to match jackson core

What we probably could do at some point is create a related issue in parsson but I am happy to progress my work a little more first.

@rbygrave
Copy link
Author

Ok, I think I've done enough initial bench marking to push for this API change.

With respect to this API change we are looking at comparing the jakarta_normalKey and jakarta_preparedKey throughput. We see better throughput can be achieved by using keys that are "prepared" (already escaped and encoded).

These benchmarks include a baseline "jsonb_jackson" which is avaje-jsonb using Jackson core 12.13.0, and a baseline "objectMapper" which is Jackson ObjectMapper.

Note: Using a modified version of parsson for the jakarta implementation. Changes include using ThreadLocal based buffer recycling


4 threads, 

Benchmark                                                    Mode  Cnt      Score   Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    2  12847.776          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    2  12063.079          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    2  18269.082          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    2  11494.361          ops/ms


2 threads

Benchmark                                                    Mode  Cnt      Score      Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    4   6683.975 ± 3575.546  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    4   6882.481 ±  827.704  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    4  10105.276 ± 1312.702  ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    4   6577.431 ±  784.614  ops/ms

1 thread

Benchmark                                                    Mode  Cnt     Score     Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    4  3608.207 ± 525.817  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    4  3471.075 ± 396.187  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    4  5248.490 ± 682.141  ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    4  3261.331 ± 272.124  ops/ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants