
Implements patch list as a single contiguous array of reusable PatchPoint instances. #521

Merged · 4 commits · Nov 7, 2023

Conversation

linlin-s (Contributor) commented Jul 5, 2023

Issue #, if available:
N/A

Description of changes:
This PR introduces optimizations to improve memory usage and avoid unnecessary allocations. The following changes are included in this PR:

  1. Instead of maintaining a patch list at both the container level and the IonRawBinaryWriter level, we keep only the IonRawBinaryWriter-level patch list (which aggregates the patch lists of the containers in the current scope) during the writing process. Previously, we needed to append each child container's patch list after its parent's to maintain the correct sequence of patch points. With this change, we reserve a placeholder patch point for the parent container's patch information; if a patch point is actually created for the current container, we overwrite the placeholder with the real data (see the usage sketch after this list).
  2. Instead of using a linked list, we implemented a recycling queue to manage the patch points. With the recycling queue, we can reuse already-initialized instances and avoid memory allocations when more patch points are required during the writing process (a sketch of the idea follows this list).
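To make the recycling queue concrete, here is a minimal sketch of the idea in Java. It is illustrative only: the class name `RecyclingQueue`, the `pushAndGet` method, and the `Supplier`-based factory are assumptions for this sketch, not necessarily the API of the actual `_Private_RecyclingQueue`.

```java
import java.util.ArrayList;
import java.util.function.Supplier;

// Minimal sketch: elements live in one contiguous backing list and are
// reused across clear() calls, so steady-state writing allocates no new
// instances once the pool has grown to its working size.
final class RecyclingQueue<T> {
    private final ArrayList<T> elements = new ArrayList<>();
    private final Supplier<T> factory;
    private int size = 0; // live element count; slots beyond this are recyclable

    RecyclingQueue(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the next element, reusing a previously allocated instance
    // when one is available and allocating only when the pool is exhausted.
    T pushAndGet() {
        T element;
        if (size < elements.size()) {
            element = elements.get(size);
        } else {
            element = factory.get();
            elements.add(element);
        }
        size++;
        return element;
    }

    T get(int index) {
        return elements.get(index);
    }

    int size() {
        return size;
    }

    // Logically empties the queue; the instances stay allocated for reuse.
    void clear() {
        size = 0;
    }
}
```

Because the backing store is a single array-backed list, the patch points also stay contiguous and in writing order, which is what removes the need for linked-list splicing.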
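The placeholder technique from change 1 could then sit on top of such a queue. The `PatchPoint` fields and the step-in/step-out flow below are hedged assumptions based on the description above, not the actual `IonRawBinaryWriter` code.

```java
// Sketch of a reusable patch point; field names are assumptions based on
// the PR description rather than the library's actual class.
final class PatchPoint {
    long oldPosition;      // stream position of the preallocated length field
    int oldLength;         // number of bytes preallocated for the length
    long length;           // actual encoded length, known at container step-out
    boolean unused = true; // placeholders stay 'unused' until overwritten

    void initialize(long oldPosition, int oldLength, long length) {
        this.oldPosition = oldPosition;
        this.oldLength = oldLength;
        this.length = length;
        this.unused = false;
    }
}

// Hypothetical usage:
//
// On container step-in, reserve a placeholder in the single writer-level
// queue and remember its index; child patch points appended later then
// naturally follow the parent's slot in sequence order:
//     int placeholderIndex = patchPoints.size();
//     patchPoints.pushAndGet();
//
// On container step-out, overwrite the placeholder only if the preallocated
// length bytes turned out to be too small; otherwise it remains 'unused'
// and is skipped when the patches are applied:
//     patchPoints.get(placeholderIndex)
//                .initialize(lengthFieldPosition, preallocatedBytes, actualLength);
```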

Here are the performance comparison results before and after the change. The benchmark writes data equivalent to a stream of 194617 nested binary Ion values using IonWriter (binary); the output is written to an in-memory buffer.
Note: this implementation is built on top of the patch-buffer removal, so we compare three configurations: the library without any changes, with the patch buffer removed, and with the recycling-queue implementation.

Preallocation 0:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4455.681 | ± 46.946 | ms/op |
| Bench.run:Heap usage | 3057.83 | ± 141.196 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 192.774 | ± 2.753 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 935822357 | ± 9234722.854 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 81.197 | ± 2.841 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 394264576 | ± 13591199.851 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 44.673 | ± 6.120 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 216132485 | ± 28801047.915 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.41 | ± 1.228 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 6912296.75 | ± 6001319.388 | B/op |
| Bench.run:·gc.count | 132 | | counts |
| Bench.run:·gc.time | 47874 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4353.683 | ± 54.590 | ms/op |
| Bench.run:Heap usage | 3079.419 | ± 183.491 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 192.337 | ± 2.324 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 912969966 | ± 7548262.441 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 75.176 | ± 4.809 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 356515840 | ± 21789150.616 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 45.859 | ± 9.148 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 215908789 | ± 41615974.370 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.59 | ± 1.329 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7668585.81 | ± 6380234.445 | B/op |
| Bench.run:·gc.count | 117 | | counts |
| Bench.run:·gc.time | 44103 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4395.697 | ± 24.232 | ms/op |
| Bench.run:Heap usage | 3300.885 | ± 140.748 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 177.847 | ± 1.499 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 851758953 | ± 7563247.530 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 48.045 | ± 4.109 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 230071555 | ± 19649371.957 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 29.546 | ± 5.450 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 141305409 | ± 25770652.675 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 2.169 | ± 0.972 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 10379929.3 | ± 4658612.210 | B/op |
| Bench.run:·gc.count | 154 | | counts |
| Bench.run:·gc.time | 13536 | | ms |

Preallocation 1:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3991.541 | ± 32.892 | ms/op |
| Bench.run:Heap usage | 3002.056 | ± 213.998 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 163.059 | ± 2.099 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 713031570 | ± 9233012.903 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 34.671 | ± 3.112 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 151596128 | ± 13636978.802 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 54.275 | ± 13.137 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 236207049 | ± 55791489.905 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 2 | ± 1.557 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 8884933.97 | ± 6974512.006 | B/op |
| Bench.run:·gc.count | 122 | | counts |
| Bench.run:·gc.time | 5626 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3986.978 | ± 49.577 | ms/op |
| Bench.run:Heap usage | 3133.488 | ± 205.444 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 161.223 | ± 1.708 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 702950841 | ± 9234603.357 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 29.084 | ± 3.111 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 126723905 | ± 13264318.356 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 49.225 | ± 12.427 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 213226636 | ± 52205386.688 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.789 | ± 1.172 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7797404.43 | ± 5079736.242 | B/op |
| Bench.run:·gc.count | 105 | | counts |
| Bench.run:·gc.time | 4323 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4246.027 | ± 32.952 | ms/op |
| Bench.run:Heap usage | 2625.181 | ± 74.201 | MB |
| Bench.run:Serialized size | 201.663 | | MB |
| Bench.run:·gc.alloc.rate | 152.277 | ± 1.611 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 706452187 | ± 9242052.284 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 35.609 | ± 1.482 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 165325483 | ± 7658126.924 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 76.571 | ± 4.320 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 355100037 | ± 19492617.525 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.766 | ± 2.022 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 8171577.81 | ± 9352188.561 | B/op |
| Bench.run:·gc.count | 135 | | counts |
| Bench.run:·gc.time | 1978 | | ms |

Preallocation 2:

No Change:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3991.828 | ± 25.325 | ms/op |
| Bench.run:Heap usage | 2587.51 | ± 105.583 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 161.335 | ± 2.048 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705458559 | ± 7538111.558 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 38.321 | ± 2.435 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 167660312 | ± 10798782.611 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 82.543 | ± 6.432 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 361075125 | ± 27841404.073 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.282 | ± 2.294 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 5631533.01 | ± 10088275.751 | B/op |
| Bench.run:·gc.count | 144 | | counts |
| Bench.run:·gc.time | 1562 | | ms |

Patch Buffer Removal:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 3932.218 | ± 35.715 | ms/op |
| Bench.run:Heap usage | 2641.364 | ± 99.636 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 163.452 | ± 2.999 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705397751 | ± 7537218.072 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 38.229 | ± 2.882 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 164947995 | ± 12079327.544 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 79.123 | ± 7.516 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 341289885 | ± 31607423.749 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.836 | ± 1.872 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 7931019.31 | ± 8002315.605 | B/op |
| Bench.run:·gc.count | 146 | | counts |
| Bench.run:·gc.time | 1502 | | ms |

Recycling Queue:

| Benchmark | Score | Error | Units |
|---|---|---|---|
| Bench.run | 4198.981 | ± 22.085 | ms/op |
| Bench.run:Heap usage | 2656.371 | ± 131.358 | MB |
| Bench.run:Serialized size | 204.76 | | MB |
| Bench.run:·gc.alloc.rate | 153.557 | ± 1.924 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 705398114 | ± 7536738.205 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 36.111 | ± 3.047 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 165940647 | ± 14012794.045 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 74.461 | ± 8.138 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 342194582 | ± 37304626.469 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 1.298 | ± 1.344 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 5963441.28 | ± 6187532.961 | B/op |
| Bench.run:·gc.count | 151 | | counts |
| Bench.run:·gc.time | 1618 | | ms |

For more benchmark results generated from other datasets, please visit here.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linlin-s marked this pull request as ready for review July 7, 2023 22:47

linlin-s (Contributor, Author) commented Jul 21, 2023

Here are more benchmark results from a different usage pattern:

Results from benchmarking a write of data equivalent to a stream of 59155 nested binary Ion values (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Patch Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 509.652 | 503.944 | 496.619 | ms/op |
| Bench.run:Heap usage | 326.793 | 355.376 | 384.838 | MB |
| Bench.run:Serialized size | 21.271 | 21.271 | 21.271 | MB |
| Bench.run:·gc.alloc.rate | 125.438 | 126.443 | 128.845 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 70369781.67 | 70038249.64 | 70368735.18 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 7.477 | 5.494 | 7.435 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 4194304 | 3043367.01 | 4061151.492 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 133.457 | 132.904 | 132.499 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 74868326.4 | 73639994.51 | 72368388.06 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 0.03 | 0.029 | 0.054 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 16604.8 | 16085.956 | 29435.111 | B/op |
| Bench.run:·gc.count | 78 | 73 | 77 | counts |
| Bench.run:·gc.time | 585 | 518 | 567 | ms |

Results from benchmarking a write of data equivalent to a single nested binary Ion value (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 0.089 | 0.082 | 0.089 | ms/op |
| Bench.run:Heap usage | 6.778 | 8.811 | 7.126 | MB |
| Bench.run:Serialized size | 0.005 | 0.005 | 0.005 | MB |
| Bench.run:·gc.alloc.rate | 93.704 | 96.721 | 93.916 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 9172.312 | 8701.888 | 9172.299 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 94.339 | 97.196 | 94.465 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 9234.427 | 8745.584 | 9225.896 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 0.004 | 0.004 | 0.004 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 0.387 | 0.317 | 0.421 | B/op |
| Bench.run:·gc.count | 574 | 592 | 575 | counts |
| Bench.run:·gc.time | 317 | 309 | 303 | ms |

Results from benchmarking a write of data equivalent to a stream of 194617 nested binary Ion values (3 forks, 2 warmups, 2 iterations, preallocation 1).

| Benchmark | No Change | Remove Patch Buffer | Optimize Patch List | Units |
|---|---|---|---|---|
| Bench.run | 3988.343 | 3974.707 | 4031.55 | ms/op |
| Bench.run:Heap usage | 2815.305 | 2654.546 | 2686.608 | MB |
| Bench.run:Serialized size | 201.663 | 201.663 | 201.663 | MB |
| Bench.run:·gc.alloc.rate | 160.884 | 160.583 | 159.961 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 703010831.1 | 713050404 | 715554289.3 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 32.785 | 40.088 | 37.906 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 143072369.8 | 178374428.4 | 169869312 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 58.898 | 73.473 | 74.752 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 255793294.2 | 326050275.6 | 334445226.7 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 6.423 | 4.629 | 1.77 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 28208808 | 20096732 | 7976842.667 | B/op |
| Bench.run:·gc.count | 16 | 20 | 15 | counts |
| Bench.run:·gc.time | 571 | 788 | 663 | ms |

linlin-s closed this Jul 28, 2023
linlin-s reopened this Jul 28, 2023
Base automatically changed from remove-buffer to master November 6, 2023 22:48
codecov bot commented Nov 7, 2023

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

| Files | Coverage | Δ |
|---|---|---|
| ...c/com/amazon/ion/impl/_Private_RecyclingQueue.java | 89.28% <89.28%> | (ø) |
| ...rc/com/amazon/ion/impl/bin/IonRawBinaryWriter.java | 92.06% <84.09%> | (+1.82%) ⬆️ |
| ...c/com/amazon/ion/impl/_Private_RecyclingStack.java | 75.60% <55.00%> | (-19.85%) ⬇️ |

... and 1 file with indirect coverage changes


tgregg (Contributor) left a comment

I think it would be interesting to do a performance comparison with the recycling queue both with and without auto-flush. I wonder if auto-flush amplifies the benefits of this change by limiting the size of the recycling queue.

linlin-s (Contributor, Author) commented Nov 7, 2023

> I think it would be interesting to do a performance comparison with the recycling queue both with and without auto-flush. I wonder if auto-flush amplifies the benefits of this change by limiting the size of the recycling queue.

The flush operation did help reuse the objects in the recycling queue. From the benchmark profiling results, we observed a 6% decrease in the size of patchPoints, which is implemented with the recycling queue.
No flush: [profiling screenshot]

Flush: [profiling screenshot]

linlin-s merged commit 67e38c4 into master Nov 7, 2023
17 of 34 checks passed
tgregg deleted the update-patches branch November 7, 2023 23:31
linlin-s added a commit that referenced this pull request May 9, 2024