Uploading exported data to s3 without storing at the local disk #20
@SergeVil I have a few questions to make sure we build the right solution. The biggest question I have is how big is the exported data in your case? As it is currently implemented, the export tool needs to buffer all of the data locally as it builds the schema and then reformats the CSVs according to the schema. Are you looking for a solution which would have all the results streamed to S3 with minimal local buffering, or would your use case allow for the full export to be buffered locally as long as it is buffered in memory instead of on disk?
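For illustration only, here is a minimal sketch (not Neptune Export code) of the second option mentioned above: building CSV output in an in-memory buffer and uploading it to S3 with the AWS SDK for Java v2, so nothing touches the local disk. The bucket name and object key are placeholders.

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryCsvUpload {
    public static void main(String[] args) {
        // Build the CSV content entirely in memory instead of writing it to disk.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.writeBytes("~id,name\n".getBytes(StandardCharsets.UTF_8));
        buffer.writeBytes("v1,alice\n".getBytes(StandardCharsets.UTF_8));

        try (S3Client s3 = S3Client.create()) {
            PutObjectRequest request = PutObjectRequest.builder()
                    .bucket("my-export-bucket")   // placeholder bucket name
                    .key("export/vertices.csv")   // placeholder object key
                    .build();
            // Upload directly from the in-memory buffer; no temporary file is created.
            s3.putObject(request, RequestBody.fromBytes(buffer.toByteArray()));
        }
    }
}
```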
@Cole-Greer Our current total data size is 14 GB. We may expect more, but this is the order of magnitude. The export folder on S3 currently contains 423 objects. We will be fine with either approach that you propose. I think that the first one is more generic, as it can support bigger databases, but the second one will also work for our use case.
Also, you can store the files in a compressed format on the local disk. That can save up to 60% for plain text.
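As an illustration of the compression suggestion, the sketch below writes CSV rows through a GZIPOutputStream so the file lands on the local disk already gzip-compressed. The file name is a placeholder, and this is not the export tool's actual writer code.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressedCsvWriter {
    public static void main(String[] args) throws IOException {
        // Wrapping the file stream in GZIPOutputStream compresses rows as they are
        // written, so only the compressed bytes ever reach the local disk.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(
                        new GZIPOutputStream(new FileOutputStream("vertices.csv.gz")),
                        StandardCharsets.UTF_8))) {
            writer.write("~id,name");
            writer.newLine();
            writer.write("v1,alice");
            writer.newLine();
        }
    }
}
```

For typical plain-text CSV, gzip commonly shrinks files by half or more, which is in line with the savings mentioned above.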
@SergeVil It's good to hear that both approaches will work for you. We are going to start by adding an option to buffer the results in memory, as it is likely to be the quickest solution for your issue. We are having early discussions about potentially changing the way we resolve the graph schema to remove the need for this buffering, but those changes would be extensive and won't be ready for some time.
Hi @Cole-Greer. Thank you for taking the quickest path! When are you planning a release we can try?
@SergeVil The next Neptune Export release is intended to come out later today, although unfortunately this fix has not yet been completed. It is being prioritized for the following release, which is scheduled to come out at the end of April.
@Cole-Greer Thank you for the update. Please keep me posted if anything changes; our production release depends on this.
@SergeVil I spent some time working on the in-memory buffering option mentioned above, and unfortunately it does require additional capabilities to implement in a performant manner. I believe there is a viable short-term workaround which you can make use of immediately and which covers your use case. Docker containers have a
I hope that this provides a short-term solution to your issue. Please let me know if you have any challenges related to this workaround.
Copying issue #309 from @SergeVil in the old amazon-neptune-tools repository.
We are running Neptune Export on AWS Batch with Fargate, which has a local disk limit of 20 GB, while the memory is quite large at 128 GB. Such a small disk volume prevents us from exporting the large database (we get an OS error). We are looking for an option to upload the exported CSV files directly from JVM memory to the S3 bucket. Thank you!
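For context, here is a hedged sketch of what the requested behaviour could look like: streaming export output from JVM memory to S3 as multipart-upload parts, so no complete file is ever written to the local disk. It uses the AWS SDK for Java v2; the bucket, key, and generateCsvChunk helper are hypothetical placeholders, not part of Neptune Export.

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse;
import software.amazon.awssdk.services.s3.model.UploadPartRequest;
import software.amazon.awssdk.services.s3.model.UploadPartResponse;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamingS3Export {
    public static void main(String[] args) {
        String bucket = "my-export-bucket";   // placeholder bucket name
        String key = "export/edges.csv";      // placeholder object key

        try (S3Client s3 = S3Client.create()) {
            CreateMultipartUploadResponse created = s3.createMultipartUpload(
                    CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build());

            List<CompletedPart> completedParts = new ArrayList<>();
            for (int partNumber = 1; partNumber <= 3; partNumber++) {
                // In a real export each part would be filled from query results;
                // note that S3 requires every part except the last to be at least 5 MB.
                byte[] chunk = generateCsvChunk(partNumber);
                UploadPartResponse part = s3.uploadPart(
                        UploadPartRequest.builder()
                                .bucket(bucket).key(key)
                                .uploadId(created.uploadId())
                                .partNumber(partNumber)
                                .build(),
                        RequestBody.fromBytes(chunk));
                completedParts.add(CompletedPart.builder()
                        .partNumber(partNumber).eTag(part.eTag()).build());
            }

            // Completing the upload assembles the parts into a single S3 object.
            s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                    .bucket(bucket).key(key)
                    .uploadId(created.uploadId())
                    .multipartUpload(CompletedMultipartUpload.builder()
                            .parts(completedParts).build())
                    .build());
        }
    }

    // Hypothetical helper producing one in-memory chunk of CSV data.
    private static byte[] generateCsvChunk(int partNumber) {
        return ("row-" + partNumber + ",value\n").getBytes(StandardCharsets.UTF_8);
    }
}
```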