Mirror node importer to support downloading stream files from a subpath of a storage bucket. #10026

Closed
JeffreyDallas opened this issue Dec 30, 2024 · 9 comments · Fixed by #10157
Labels
downloader Area: S3 downloader enhancement Type: New feature

Comments

@JeffreyDallas

JeffreyDallas commented Dec 30, 2024

Problem

Currently, the mirror node importer can only download stream files from the top level of a specific storage bucket before importing them into the database:

<bucket_name>/eventStream
<bucket_name>/recordStream

We request that the importer be able to download stream files from a subpath of a bucket:

<bucket_name>/test_run1/eventStream
<bucket_name>/test_run1/recordStream

or

<bucket_name>/test_run2/eventStream
<bucket_name>/test_run2/recordStream

This is because, when running different tests with a solo deployment and no subpath support, all of these tests write to the same bucket.
We could work around the issue by creating a separate bucket for each test run, but GCS limits how many buckets can be created and how quickly.
Subpath support in the importer is therefore required to run a relatively large number of tests in parallel.

Once the feature is implemented, please also update the Helm chart to expose any changes
so that the solo chart can use the newly implemented feature.

Solution

The mirror node importer should support a subpath of a given storage bucket.

  • add a new configuration property String pathPrefix to class CommonDownloaderProperties, defaulting to an empty string
  • in S3StreamFileProvider.getPrefix, conditionally prepend pathPrefix if it's not empty
  • update configuration.md
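
The first two bullets could be sketched roughly as follows. This is a minimal illustration only, not the importer's actual code: the class names DownloaderPropertiesSketch and PathPrefixSketch, the streamFolder parameter, and the slash-trimming behaviour are all assumptions made for the example, and the real S3StreamFileProvider.getPrefix may build its prefix from other inputs.

```java
import java.util.Objects;

/** Hypothetical stand-in for CommonDownloaderProperties; only the new property is shown. */
final class DownloaderPropertiesSketch {
    // Defaults to an empty string, meaning "read from the bucket root" (current behaviour).
    private String pathPrefix = "";

    String getPathPrefix() {
        return pathPrefix;
    }

    void setPathPrefix(String pathPrefix) {
        this.pathPrefix = pathPrefix == null ? "" : pathPrefix;
    }
}

/** Hypothetical helper showing how a prefix could be conditionally prepended. */
final class PathPrefixSketch {
    private static final String SEPARATOR = "/";

    /**
     * Builds the object-key prefix used to list stream files.
     *
     * @param properties   holder of the optional pathPrefix (assumed shape)
     * @param streamFolder stream folder name, e.g. "recordStream" (assumed input)
     */
    static String getPrefix(DownloaderPropertiesSketch properties, String streamFolder) {
        Objects.requireNonNull(properties, "properties");
        Objects.requireNonNull(streamFolder, "streamFolder");

        String base = streamFolder + SEPARATOR;
        String pathPrefix = properties.getPathPrefix();
        if (pathPrefix.isBlank()) {
            return base; // no subpath configured: prefix is unchanged
        }

        // Assumption: tolerate "test_run1", "/test_run1" and "test_run1/" alike.
        String trimmed = pathPrefix.strip().replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? base : trimmed + SEPARATOR + base;
    }

    public static void main(String[] args) {
        DownloaderPropertiesSketch properties = new DownloaderPropertiesSketch();
        System.out.println(getPrefix(properties, "recordStream"));

        properties.setPathPrefix("test_run1");
        System.out.println(getPrefix(properties, "recordStream"));
    }
}
```

With the default empty pathPrefix the sketch returns the same prefix as before, so existing deployments would be unaffected; setting it to test_run1 places both stream folders under that subpath of the same bucket.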

Alternatives

No response

@JeffreyDallas JeffreyDallas added the enhancement Type: New feature label Dec 30, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Mirror Node Jan 5, 2025
@steven-sheehy steven-sheehy moved this from 📋 Backlog to 🏃‍♀ Sprint backlog in Mirror Node Jan 5, 2025
@steven-sheehy steven-sheehy added the downloader Area: S3 downloader label Jan 6, 2025
@HarshSawarkar
Contributor

Hi, I would like to work on this issue if no one else is currently working on it.

@xin-hedera
Collaborator

@HarshSawarkar thanks. I have assigned the ticket to you and added a brief solution in the ticket. Let me know if you have any questions.

@xin-hedera xin-hedera moved this from 🏃‍♀ Sprint backlog to 👷 In progress in Mirror Node Jan 17, 2025
@HarshSawarkar
Contributor

@xin-hedera Thanks for your response. I have raised a PR with the changes mentioned in the ticket. Could you please review it?

@HarshSawarkar
Contributor

HarshSawarkar commented Jan 17, 2025

Also, I ran the S3StreamFileProviderTest, and two tests are failing with the following errors:

listError Test Failure:
The expectation expectError(Class) failed.

Expected: onError(RuntimeException)
Actual: onNext(2022-07-13T08_46_08.041986003Z.rcd_sig)

getError Test Failure:
The expectation expectError(Class) failed.

Expected: onError(RuntimeException)
Actual: onNext([B@1604e559)

@xin-hedera
Collaborator

> Also, I ran the S3StreamFileProviderTest, and two tests are failing with the following errors:
>
> listError Test Failure: The expectation expectError(Class) failed.
>
> Expected: onError(RuntimeException) Actual: onNext(2022-07-13T08_46_08.041986003Z.rcd_sig)
>
> getError Test Failure: The expectation expectError(Class) failed.
>
> Expected: onError(RuntimeException) Actual: onNext([B@1604e559)

perhaps they are flaky, can ignore.

@xin-hedera xin-hedera added this to the 0.122.0 milestone Jan 17, 2025
@HarshSawarkar
Contributor

> > Also, I ran the S3StreamFileProviderTest, and two tests are failing with the following errors:
> > listError Test Failure: The expectation expectError(Class) failed.
> > Expected: onError(RuntimeException) Actual: onNext(2022-07-13T08_46_08.041986003Z.rcd_sig)
> > getError Test Failure: The expectation expectError(Class) failed.
> > Expected: onError(RuntimeException) Actual: onNext([B@1604e559)
>
> perhaps they are flaky, can ignore.

Yeah, sure, will do.

@HarshSawarkar
Contributor

@xin-hedera I'm looking to work on some meaningful issues. Could you please guide me on where to start?

@steven-sheehy steven-sheehy moved this from 👷 In progress to 👀 In review in Mirror Node Jan 20, 2025
@steven-sheehy
Member

Hi @HarshSawarkar. Thanks for all your efforts thus far contributing to Hedera! Our two main focus areas right now that we need help on are #8834 and #8828. For block stream, help with one of the transformer tickets might be the easiest. If you're interested in learning more about EVMs, the modularized EVM epic could use help fixing the numerous tests broken when modularizedServices=true. Feel free to message me or Xin in discord if you'd like to discuss further.

@HarshSawarkar
Contributor

> Hi @HarshSawarkar. Thanks for all your efforts thus far contributing to Hedera! Our two main focus areas right now that we need help on are #8834 and #8828. For block stream, help with one of the transformer tickets might be the easiest. If you're interested in learning more about EVMs, the modularized EVM epic could use help fixing the numerous tests broken when modularizedServices=true. Feel free to message me or Xin in discord if you'd like to discuss further.

@steven-sheehy Thank you for your response! I will review the tickets mentioned and conduct some research around them. I will reach out to you or @xin-hedera on Discord if I come across any issues within either of the tickets that I would like to work on or if I have any doubts.

@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Mirror Node Jan 22, 2025
Projects
Status: Done
4 participants