-
-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: Fixes handing large number data files in Metadata Editor #2586
base: develop
Are you sure you want to change the base?
bug: Fixes handing large number data files in Metadata Editor #2586
Conversation
8f06c8f
to
72cceb6
Compare
@robyngit @rushirajnenuji let me know if you have any questions or concerns. This is a critical fix for us and I am happy to work with you to get it in. |
72cceb6
to
dacf926
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Val, thanks for the PR!!
Great idea to upload/fetch in batches. I tested with large data packages, nested packages, packages with atLocation
file hierarchies and found no issues, everything worked.
I expected to see some performance differences, but found no detectable difference between this PR (with batch sizes set to 20 or more) and the develop branch. I tested uploading 1000+ files at once, each file being ~8MB. The time taken to upload was the same for both branches, and I didn't run into any issues with files stalling at 100% or uploads failing on either branch. Watching the upload & network tab on the develop branch, it appears as though the browser is doing some work behind the scenes to upload files in batches already... So I'm wondering if you were able to consistently reproduce these issues on your end, and whether the changes in this PR truly fixed them?
Is it worth digging deeper into the root cause of the issue the user with 1000 files was experiencing, or are you confident that it was because of concurrent uploads? (vs. perhaps network instability)
If we move forward with this PR, I have just a few notes:
- I've merged some updates that include linting and formatting fixes to the DataPackage collection and the EML211EditorView, which have created conflicts with your branch. Please merge the latest develop branch into your PR branch to resolve these conflicts.
- Even though the codebase isn’t fully compliant yet with the new style guide, the goal is slowly to bring it in line by ensuring new contributions adhere to it. We use ESLint and prettier, see the Contribution Guide for instructions on how to run them. This shouldn't take long and I'm happy to help if you need it.
- We're working toward increasing our test coverage, and so I'm trying to enforce that all new methods have unit tests. The tests might be easier to write if the
uploadFilesInBatch
andfetchMemberModels
methods were broken down into smaller, more modular functions. If you feel that this will take too long, and given the urgency of this fix, I can merge your PR and create an issue to track the test writing. fetchMemberModels
method is really model/collection logic and should be moved out of the view.
Given that this fix is potentially critical for ESS-DIVE, and notes # 3 and # 4 may take some time to address, we could merge this PR as is and then create an issue to track the test writing and refactoring. What do you think?
Happy to discuss further! Thanks again for the PR!
PS: I'm currently refactoring the DataPackage collection & related models to address the submission errors, so hopefully identifying and fixing issues like this one will be more straightforward in the future.
* | ||
* @param {FileList} fileList - The list of files to be uploaded. | ||
* @param {number} [batchSize=10] - The number of files to upload concurrently. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/ | |
* @since 0.0.0 | |
*/ |
Hard to catch on video, but this shows files uploading on develop branch. It appeared that files began uploading from the bottom of the list, and once a batch of them were complete, the next batch began uploading. Is this different from what you've been seeing @vchendrix ? upload.on.develop.branch.mp4 |
Yes, that is what I am seeing. |
Thanks @robyngit. I will make the changes and resubmit. |
@robyngit I am still digesting the rest of your comments and may have questions about that in a little bit. First, I wanted to respond to this:
For me that is not the case, all fetches are submitted concurrently and this can cause errors 500 errors on metacat due to there being too many concurrent requests. I have serveral different cloud instances of metacat running and on google gloud I can see the container RAM going to the max during these large fetches.
I think that it is a confluence of issues but it is hard to pinpont. We have many users who do not have the greatest internest connection and they are consistently having problems. |
dacf926
to
c270e18
Compare
Enables Batch fetch of member models and updates save button control toggling - Adds `fetchMemberModels` method to `DataPackage` to fetch member models in batches. - Updates `fetch` method in `DataPackage` to use `fetchMemberModels`. - Adds listener for `numLoadingFileMetadata` change in `EML211EditorView`. - Updates `toggleEnableControls` in `EML211EditorView` to handle `numLoadingFileMetadata`. - Adds `fetchBatchSize` configuration to `AppModel` to control batch size for fetching member models. Closes NCEAS#2547
- Updated `fetchBatchSize` in `DataPackage.js` to replace `batchModeValue`. - Added `uploadBatchSize` configuration in `AppModel.js`. - Implemented `uploadFilesInBatch` method in `DataItemView.js` to handle batch file uploads using promises. - Ensured that the `_.each` loop completes before proceeding to batch processing. Closes NCEAS#2224
c270e18
to
dacf926
Compare
@robyngit I tried rebasing with the develop branch but there seem to be bugs in this branch and I can't seem to satisfactorily test the code. On save in the Metadata Editor, I get the following error. This is blocking me. Uncaught TypeError: errors?.forEach is not a function |
- new test spec for DataItemView. Specifically tests uploadFilesInBatch and addFiles in relation to NCEAS#2224 - Adds to test spec for DataPackage. Specifically tests fetchMemberModels functionality in relation to NCEAS#2547 - Ensures comments provide context and purpose for each action in the tests
dacf926
to
4fd253e
Compare
@robyngit I did my best to add tests and fix the lint errors for the js files you requested but it seems there are still issues with the checks passing. Something about a Github token not having write permission. I am going to sign off until the new year. I hope this suffices. My added tests all pass locally. NOTE that the develop branch has a regression issue with saving on the Metadata Editor page which has no relation to this PR. |
@robyngit Happy New Year! I am circling back on this to see if there is anything you need from me to get this merged. |
c1a9a2e
to
36f6e2c
Compare
This follows the suggestion as seen in the error message on the ghaction: Run npm test > [email protected] test > node test/server.js Failed to launch the browser process! [2359:2359:0107/175821.608019:FATAL:zygote_host_impl_linux.cc(126)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/main/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
36f6e2c
to
adf9e64
Compare
@robyngit I have updated this PR so that all the tests pass. Please note that I had to make a change to the test server configuration for puppeteer to fix an issue in which the tests were hanging on gh actions. Please let me know if there are any further issues you would like me to address in order to get this merged. Thanks! |
Summary
This pull request addresses two issue tickets related to handling large numbers of files when editing a dataset. One issue involves users uploading tens to hundreds of files at once, which inundates Metacat and leads to unrecoverable upload failures, rendering the dataset unsavable. The other issue pertains to loading the metadata editor when a dataset contains a large number of files. The former fix batches the upload and provides a configuration for
batchSizeUpload
. The latter batches the loading of the system metadata and prevents the user from saving until all data file metadata is loaded, with the batch size configurable viabatchSizeFetch
. Additionally, this pull request implements batch file upload functionality using promises and fixes issues related to loading a large number of data files in the Metadata Editor.This fix is a high priority for ESS-DIVE and Wildland Fire Science Intitiative as it significantly improves user experience and it will prevent data corruption while handling a large number of files in datasets
Features and Fixes
Batch File Upload with Promises
batchSizeUpload
Configuration inAppModel.js
: Introduced a new configuration to control the batch size for file uploads.uploadFilesInBatch
Method inDataItemView.js
: Added a method to handle batch file uploads using promises, ensuring that the_.each
loop completes before proceeding to batch processing.Fixes for Loading Large Number of Data Files
fetchMemberModels
Method toDataPackage
: Introduced a method to fetch member models in batches.fetch
Method inDataPackage
: Modified thefetch
method to use the newfetchMemberModels
method.numLoadingFileMetadata
Change inEML211EditorView
: Added a listener to handle changes in the number of loading file metadata.toggleEnableControls
inEML211EditorView
: Enhanced the method to handlenumLoadingFileMetadata
.fetchBatchSize
Configuration toAppModel
: Introduced a configuration to control the batch size for fetching member models.Testing
This fix has been tested on MacOS using the latest versions of Chrome, Firefox, and Safari.
Issues Closed