
Add bug fixes and improvements to DDB source #3559

Merged

2 commits merged into opensearch-project:main on Oct 31, 2023

Conversation

@daixba (Contributor) commented Oct 30, 2023

Description

Fixes the following bugs:

  • An integer overflow: item counts could exceed the range of int (a sketch follows the lists below)
  • The shard lease expires when waiting a long time for the export to complete
  • Waiting for the export does not work if there are no change events in the shard

Makes the following improvements:

  • Optimize the factory and builder classes for the data file loader and shard consumer
  • Other miscellaneous improvements
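
For the first bug, a minimal sketch of the kind of fix involved (ExportProgress and its methods are illustrative names for this sketch, not the actual DDB source classes): a DynamoDB export can contain more items than an int can represent, so the running count must be widened.

public class ExportProgress {
    // Bug: tracking the item count in an int overflows past 2,147,483,647.
    // Fix: widen the running count to long.
    private long itemCount = 0L;

    public void addItems(final long loadedCount) {
        itemCount += loadedCount;
    }

    public long getItemCount() {
        return itemCount;
    }
}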

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Aiden Dai <[email protected]>
@@ -37,30 +46,37 @@ public class DataFileLoader implements Runnable {
*/
private static final int DEFAULT_CHECKPOINT_INTERVAL_MILLS = 2 * 60_000;

static final Duration BUFFER_TIMEOUT = Duration.ofSeconds(60);
static final int DEFAULT_BUFFER_BATCH_SIZE = 1_000;
Member:

How were these numbers chosen?

@daixba (Contributor, author):

Neither number makes much difference; there will be retries after a timeout anyway. 1,000 is the batch size we use for reading files and reading stream data, so we use the same value here.
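
For context, a minimal sketch of how constants like these typically drive buffer writes in a loader (hypothetical class and method names; the real code goes through Data Prepper's buffer API):

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class RecordBatcher<T> {
    static final Duration BUFFER_TIMEOUT = Duration.ofSeconds(60);
    static final int DEFAULT_BUFFER_BATCH_SIZE = 1_000;

    private final List<T> batch = new ArrayList<>(DEFAULT_BUFFER_BATCH_SIZE);

    void add(final T record) throws Exception {
        batch.add(record);
        if (batch.size() >= DEFAULT_BUFFER_BATCH_SIZE) {
            flush();
        }
    }

    void flush() throws Exception {
        // A write that does not finish within BUFFER_TIMEOUT throws and is
        // retried by the caller, which is why the exact values matter little.
        writeWithTimeout(batch, (int) BUFFER_TIMEOUT.toMillis());
        batch.clear();
    }

    private void writeWithTimeout(final List<T> records, final int timeoutMillis) throws Exception {
        // Placeholder for the actual buffer write (e.g. Buffer#writeAll in Data Prepper).
    }
}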

@@ -128,7 +150,9 @@ public void run() {
int lineCount = 0;
int lastLineProcessed = 0;

try (GZIPInputStream gzipInputStream = new GZIPInputStream(s3ObjectReader.readFile(bucketName, key))) {
try {
InputStream inputStream = objectReader.readFile(bucketName, key);
Member:

nit:

try (
  InputStream inputStream = objectReader.readFile(bucketName, key);
  GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
  BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))
) {
  ...
}

@daixba (Contributor, author):

I didn't know we could do that. I thought it could only be one resource.
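
For reference, try-with-resources has accepted multiple resources since Java 7; they are closed in reverse order of declaration, even when an exception is thrown. A self-contained sketch of the reviewer's suggestion (reading a local gzipped file instead of S3 so it runs standalone):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class GzipLineReader {
    public static void main(String[] args) throws IOException {
        // Three resources in one try block: closed in reverse order
        // (reader, then gzipInputStream, then inputStream).
        try (InputStream inputStream = new FileInputStream("data.gz");
             GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
             BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}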

LOG.info("Complete loading s3://{}/{}", bucketName, key);
} catch (Exception e) {
} catch (IOException e) {
Member:

Why is this exception scoped down? How are we handling other exceptions?

@daixba (Contributor, author) commented Nov 1, 2023:

You're right. There may be other exceptions. Since this is merged, I will update this in the next PR.
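
A sketch of the shape such a follow-up might take, reusing the names from the diff above (objectReader, LOG, bucketName, key): keep the narrow IOException handler but add a broader catch so unexpected runtime failures are not silently dropped. This is illustrative, not the merged code.

try (InputStream inputStream = objectReader.readFile(bucketName, key);
     GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
     BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {
    // process lines...
    LOG.info("Complete loading s3://{}/{}", bucketName, key);
} catch (IOException e) {
    // Expected I/O failures (network, corrupt gzip, etc.): log so the caller can retry.
    LOG.error("I/O error loading s3://{}/{}", bucketName, key, e);
} catch (Exception e) {
    // Anything else (e.g. buffer write failures) should still be surfaced, not swallowed.
    LOG.error("Unexpected error loading s3://{}/{}", bucketName, key, e);
}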

@graytaylor0 merged commit c560e1f into opensearch-project:main on Oct 31, 2023

33 checks passed

@daixba deleted the ddb-source-fix branch on November 2, 2023