
Implements patch list as a single contiguous array of reusable PatchPoint instances. #521

Merged · 4 commits · Nov 7, 2023

Conversation

linlin-s (Contributor) commented Jul 5, 2023

Issue #, if available:
N/A

Description of changes:
This PR introduces optimizations to improve memory usage and avoid unnecessary allocations. The following changes are included in this PR:

  1. Instead of maintaining a patch list at both the container level and the IonRawBinaryWriter level, we keep only the IonRawBinaryWriter-level patch list (which aggregates the patch lists of the containers in the current scope) during the writing process. Previously, we needed to append each child container's patch list after its parent's to maintain the correct sequence of patch points. With this change, we reserve a placeholder patch point for the parent container's patch information; if a patch point is actually created for the current container, we overwrite the placeholder with the real data (see the usage sketch after this list).
  2. Instead of using a linked list, we implemented a recycling queue to manage the patch points. With the recycling queue, we can reuse already-initialized instances and avoid memory allocations when more patch points are required during the writing process (a sketch of the idea follows this list).
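To make the recycling queue concrete, here is a minimal sketch of the idea in Java. It is illustrative only: the class name `RecyclingQueue`, the `pushAndGet` method, and the `Supplier`-based factory are assumptions for this sketch, not necessarily the API of the actual `_Private_RecyclingQueue`.

```java
import java.util.ArrayList;
import java.util.function.Supplier;

// Minimal sketch: elements live in one contiguous backing list and are
// reused across clear() calls, so steady-state writing allocates no new
// instances once the pool has grown to its working size.
final class RecyclingQueue<T> {
    private final ArrayList<T> elements = new ArrayList<>();
    private final Supplier<T> factory;
    private int size = 0; // live element count; slots beyond this are recyclable

    RecyclingQueue(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the next element, reusing a previously allocated instance
    // when one is available and allocating only when the pool is exhausted.
    T pushAndGet() {
        T element;
        if (size < elements.size()) {
            element = elements.get(size);
        } else {
            element = factory.get();
            elements.add(element);
        }
        size++;
        return element;
    }

    T get(int index) {
        return elements.get(index);
    }

    int size() {
        return size;
    }

    // Logically empties the queue; the instances stay allocated for reuse.
    void clear() {
        size = 0;
    }
}
```

Because the backing store is a single array-backed list, the patch points also stay contiguous and in writing order, which is what removes the need for linked-list splicing.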
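The placeholder technique from change 1 could then sit on top of such a queue. The `PatchPoint` fields and the step-in/step-out flow below are hedged assumptions based on the description above, not the actual `IonRawBinaryWriter` code.

```java
// Sketch of a reusable patch point; field names are assumptions based on
// the PR description rather than the library's actual class.
final class PatchPoint {
    long oldPosition;      // stream position of the preallocated length field
    int oldLength;         // number of bytes preallocated for the length
    long length;           // actual encoded length, known at container step-out
    boolean unused = true; // placeholders stay 'unused' until overwritten

    void initialize(long oldPosition, int oldLength, long length) {
        this.oldPosition = oldPosition;
        this.oldLength = oldLength;
        this.length = length;
        this.unused = false;
    }
}

// Hypothetical usage:
//
// On container step-in, reserve a placeholder in the single writer-level
// queue and remember its index; child patch points appended later then
// naturally follow the parent's slot in sequence order:
//     int placeholderIndex = patchPoints.size();
//     patchPoints.pushAndGet();
//
// On container step-out, overwrite the placeholder only if the preallocated
// length bytes turned out to be too small; otherwise it remains 'unused'
// and is skipped when the patches are applied:
//     patchPoints.get(placeholderIndex)
//                .initialize(lengthFieldPosition, preallocatedBytes, actualLength);
```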

Here are the performance comparison results before and after the change. The benchmark writes data equivalent to a stream of 194617 nested binary Ion values using IonWriter (binary); the output is written to an in-memory buffer.
Note: this implementation is built on top of the patch-buffer removal, so we compare three configurations: the library without any changes, with the patch buffer removed, and with the recycling-queue implementation.

Preallocation 0:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4455.681 | ± 46.946 | ms/op |
| Bench.run:Heap usage | 3057.83 | ± 141.196 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 192.774 | ± 2.753 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 935822357 | ± 9234722.854 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 81.197 | ± 2.841 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 394264576 | ± 13591199.851 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 44.673 | ± 6.120 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 216132485 | ± 28801047.915 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.41 | ± 1.228 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 6912296.75 | ± 6001319.388 | B/op |
| Bench.run:·gc.count | 132 | | counts |
| Bench.run:·gc.time | 47874 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4353.683 | ± 54.590 | ms/op |
| Bench.run:Heap usage | 3079.419 | ± 183.491 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 192.337 | ± 2.324 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 912969966 | ± 7548262.441 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 75.176 | ± 4.809 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 356515840 | ± 21789150.616 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 45.859 | ± 9.148 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 215908789 | ± 41615974.370 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.59 | ± 1.329 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7668585.81 | ± 6380234.445 | B/op |
| Bench.run:·gc.count | 117 | | counts |
| Bench.run:·gc.time | 44103 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4395.697 | ± 24.232 | ms/op |
| Bench.run:Heap usage | 3300.885 | ± 140.748 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 177.847 | ± 1.499 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 851758953 | ± 7563247.530 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 48.045 | ± 4.109 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 230071555 | ± 19649371.957 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 29.546 | ± 5.450 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 141305409 | ± 25770652.675 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 2.169 | ± 0.972 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 10379929.3 | ± 4658612.210 | B/op |
| Bench.run:·gc.count | 154 | | counts |
| Bench.run:·gc.time | 13536 | | ms |

Preallocation 1:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3991.541 | ± 32.892 | ms/op |
| Bench.run:Heap usage | 3002.056 | ± 213.998 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 163.059 | ± 2.099 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 713031570 | ± 9233012.903 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 34.671 | ± 3.112 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 151596128 | ± 13636978.802 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 54.275 | ± 13.137 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 236207049 | ± 55791489.905 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 2 | ± 1.557 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 8884933.97 | ± 6974512.006 | B/op |
| Bench.run:·gc.count | 122 | | counts |
| Bench.run:·gc.time | 5626 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3986.978 | ± 49.577 | ms/op |
| Bench.run:Heap usage | 3133.488 | ± 205.444 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 161.223 | ± 1.708 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 702950841 | ± 9234603.357 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 29.084 | ± 3.111 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 126723905 | ± 13264318.356 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 49.225 | ± 12.427 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 213226636 | ± 52205386.688 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.789 | ± 1.172 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7797404.43 | ± 5079736.242 | B/op |
| Bench.run:·gc.count | 105 | | counts |
| Bench.run:·gc.time | 4323 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4246.027 | ± 32.952 | ms/op |
| Bench.run:Heap usage | 2625.181 | ± 74.201 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 152.277 | ± 1.611 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 706452187 | ± 9242052.284 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 35.609 | ± 1.482 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 165325483 | ± 7658126.924 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 76.571 | ± 4.320 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 355100037 | ± 19492617.525 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.766 | ± 2.022 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 8171577.81 | ± 9352188.561 | B/op |
| Bench.run:·gc.count | 135 | | counts |
| Bench.run:·gc.time | 1978 | | ms |

Preallocation 2:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3991.828 | ± 25.325 | ms/op |
| Bench.run:Heap usage | 2587.51 | ± 105.583 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 161.335 | ± 2.048 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705458559 | ± 7538111.558 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 38.321 | ± 2.435 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 167660312 | ± 10798782.611 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 82.543 | ± 6.432 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 361075125 | ± 27841404.073 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.282 | ± 2.294 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 5631533.01 | ± 10088275.751 | B/op |
| Bench.run:·gc.count | 144 | | counts |
| Bench.run:·gc.time | 1562 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3932.218 | ± 35.715 | ms/op |
| Bench.run:Heap usage | 2641.364 | ± 99.636 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 163.452 | ± 2.999 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705397751 | ± 7537218.072 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 38.229 | ± 2.882 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 164947995 | ± 12079327.544 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 79.123 | ± 7.516 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 341289885 | ± 31607423.749 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.836 | ± 1.872 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7931019.31 | ± 8002315.605 | B/op |
| Bench.run:·gc.count | 146 | | counts |
| Bench.run:·gc.time | 1502 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4198.981 | ± 22.085 | ms/op |
| Bench.run:Heap usage | 2656.371 | ± 131.358 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 153.557 | ± 1.924 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705398114 | ± 7536738.205 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 36.111 | ± 3.047 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 165940647 | ± 14012794.045 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 74.461 | ± 8.138 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 342194582 | ± 37304626.469 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.298 | ± 1.344 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 5963441.28 | ± 6187532.961 | B/op |
| Bench.run:·gc.count | 151 | | counts |
| Bench.run:·gc.time | 1618 | | ms |

For more benchmark results generated from other datasets, please visit here.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linlin-s marked this pull request as ready for review July 7, 2023 22:47

linlin-s (Contributor, Author) commented Jul 21, 2023

Here are more benchmark results from a different usage pattern:

Results from benchmarking a write of data equivalent to a stream of 59155 nested binary Ion values (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Patch Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 509.652 | 503.944 | 496.619 | ms/op |
| Bench.run:Heap usage | 326.793 | 355.376 | 384.838 | MB |
| Bench.run:Serialized size | 21.271 | 21.271 | 21.271 | MB |
| Bench.run:·gc.alloc.rate | 125.438 | 126.443 | 128.845 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 70369781.67 | 70038249.64 | 70368735.18 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 7.477 | 5.494 | 7.435 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 4194304 | 3043367.01 | 4061151.492 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 133.457 | 132.904 | 132.499 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 74868326.4 | 73639994.51 | 72368388.06 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 0.03 | 0.029 | 0.054 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 16604.8 | 16085.956 | 29435.111 | B/op |
| Bench.run:·gc.count | 78 | 73 | 77 | counts |
| Bench.run:·gc.time | 585 | 518 | 567 | ms |

Results from benchmarking a write of data equivalent to a single nested binary Ion value (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 0.089 | 0.082 | 0.089 | ms/op |
| Bench.run:Heap usage | 6.778 | 8.811 | 7.126 | MB |
| Bench.run:Serialized size | 0.005 | 0.005 | 0.005 | MB |
| Bench.run:·gc.alloc.rate | 93.704 | 96.721 | 93.916 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 9172.312 | 8701.888 | 9172.299 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 94.339 | 97.196 | 94.465 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 9234.427 | 8745.584 | 9225.896 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 0.004 | 0.004 | 0.004 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 0.387 | 0.317 | 0.421 | B/op |
| Bench.run:·gc.count | 574 | 592 | 575 | counts |
| Bench.run:·gc.time | 317 | 309 | 303 | ms |

Results from benchmarking a write of data equivalent to a stream of 194617 nested binary Ion values (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Patch Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 3988.343 | 3974.707 | 4031.55 | ms/op |
| Bench.run:Heap usage | 2815.305 | 2654.546 | 2686.608 | MB |
| Bench.run:Serialized size | 201.663 | 201.663 | 201.663 | MB |
| Bench.run:·gc.alloc.rate | 160.884 | 160.583 | 159.961 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 703010831.1 | 713050404 | 715554289.3 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 32.785 | 40.088 | 37.906 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 143072369.8 | 178374428.4 | 169869312 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 58.898 | 73.473 | 74.752 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 255793294.2 | 326050275.6 | 334445226.7 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 6.423 | 4.629 | 1.77 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 28208808 | 20096732 | 7976842.667 | B/op |
| Bench.run:·gc.count | 16 | 20 | 15 | counts |
| Bench.run:·gc.time | 571 | 788 | 663 | ms |

linlin-s closed this Jul 28, 2023
linlin-s reopened this Jul 28, 2023
Base automatically changed from remove-buffer to master November 6, 2023 22:48
codecov bot commented Nov 7, 2023

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

| Files | Coverage | Δ |
|---|---|---|
| ...c/com/amazon/ion/impl/_Private_RecyclingQueue.java | 89.28% <89.28%> | (ø) |
| ...rc/com/amazon/ion/impl/bin/IonRawBinaryWriter.java | 92.06% <84.09%> | (+1.82%) ⬆️ |
| ...c/com/amazon/ion/impl/_Private_RecyclingStack.java | 75.60% <55.00%> | (-19.85%) ⬇️ |

... and 1 file with indirect coverage changes


tgregg (Contributor) left a comment

I think it would be interesting to do a performance comparison with the recycling queue both with and without auto-flush. I wonder if auto-flush amplifies the benefits of this change by limiting the size of the recycling queue.

linlin-s (Contributor, Author) commented Nov 7, 2023

> I think it would be interesting to do a performance comparison with the recycling queue both with and without auto-flush. I wonder if auto-flush amplifies the benefits of this change by limiting the size of the recycling queue.

The flush operation did help reuse the objects in the recycling queue. From the benchmark profiling results, we observed a 6% decrease in the size of patchPoints, which is implemented with the recycling queue.
No flush: [profiling screenshot]

Flush: [profiling screenshot]

linlin-s merged commit 67e38c4 into master Nov 7, 2023
17 of 34 checks passed
tgregg deleted the update-patches branch November 7, 2023 23:31
linlin-s added a commit that referenced this pull request May 9, 2024