Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable configurable write buffer block size. #650

Merged
merged 5 commits into from
Dec 6, 2023
Merged

Conversation

linlin-s
Copy link
Contributor

Issue #, if available:
N/A
Description of changes:

This Pull Request introduces a feature that allows users to configure the default block size for the write buffer. Adjusting the block size can lead to performance enhancements by reducing block allocations. To determine the optimal block size for their specific use case, users can utilize the ion-java-benchmark-cli tool. Once the ideal block size is identified, it can be set using IonWriter writer = IonBinaryWriterBuilder.standard().withDefaultBlockSize(size).build(out).
Additionally, this feature can be used on top of auto-flush mechanisms, further fine-tuning performance.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@linlin-s linlin-s marked this pull request as ready for review November 28, 2023 00:05
@@ -223,6 +223,13 @@ public _Private_IonBinaryWriterBuilder withLocalSymbolTableAppendDisabled()
return b;
}

@Override
public _Private_IonBinaryWriterBuilder withDefaultBlockSize(int size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the word "default" from the name of these methods. The user is setting the block size; the default block size is 32K.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the word "default" from the name of these methods. The user is setting the block size; the default block size is 32K.

Sounds good, will remove in the next commit.

@@ -194,6 +194,12 @@ public IvmMinimizing getIvmMinimizing()
*/
public abstract IonBinaryWriterBuilder withLocalSymbolTableAppendDisabled();

/**
* Specify the default block size for write buffer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Specify the default block size for write buffer.
* Specify the block size for write buffer.

@@ -275,6 +282,9 @@ public void setStreamCopyOptimized(final boolean optimized)

//=========================================================================

public void setBlockSize(int size) {
myBinaryWriterBuilder.withUserBlockSize(size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also set the symbol table block size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also set the symbol table block size?

Yes, thanks for catching this. Previously, we only flush the data at IonRawBinaryWriter level, and for now we implemented auto-flush at the IonManagedBinaryWriter level which will flush both symbol table buffer and user writer buffer. In this case, I'm wondering if we should set a separate API to set symbol table block size? The flush operation is decided by the user block boundary check and we might need a different block size for symbol table since when the writer is flushed the symbol table block might be only half occupied, which will lead to the waste of extra bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think exposing both is fine, but we need to be very clear in the documentation the differences between them and how you would decide whether to change them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think exposing both is fine, but we need to be very clear in the documentation the differences between them and how you would decide whether to change them.

Based on the offline discussion with @tgregg, we should set the symbol table block size matches with the user block size while the size is set smaller than 32K, and keep the symbol table block size as 32k when users set the block size larger than 32K.

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (5daea99) 67.01% compared to head (85c29ec) 67.05%.
Report is 2 commits behind head on master.

Files Patch % Lines
...azon/ion/impl/_Private_IonBinaryWriterBuilder.java 93.75% 0 Missing and 1 partial ⚠️
...va/com/amazon/ion/impl/bin/IonRawBinaryWriter.java 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #650      +/-   ##
============================================
+ Coverage     67.01%   67.05%   +0.04%     
- Complexity     5459     5470      +11     
============================================
  Files           159      159              
  Lines         22938    22970      +32     
  Branches       4113     4116       +3     
============================================
+ Hits          15371    15403      +32     
+ Misses         6289     6287       -2     
- Partials       1278     1280       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@linlin-s linlin-s merged commit c6acba4 into master Dec 6, 2023
19 of 34 checks passed
@linlin-s linlin-s deleted the set-block-size branch January 16, 2024 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants