Allow configuration of the NDA_ORGINIZATION_ROOT_FOLDER #101
Comments
Hi Eric,
We had a two-part issue: 1) home folders are capped at 10 GB, so I changed the home (`~`) destination to a different location, and 2) we suspected that collisions were happening because the parallel download requests were firing too quickly, so we added a random jitter to each download request (a `sleep` at the start of the job script that samples from a uniform delay distribution of up to 10 minutes), and that seems to address it.
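For concreteness, a minimal sketch of that jitter, assuming a bash job script (600 seconds stands in for the 10-minute window):

```bash
# Stagger parallel jobs: sleep a random 0-599 seconds (roughly uniform
# over a 10-minute window) before issuing the download request.
sleep "$(( RANDOM % 600 ))"
```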
However, I agree that an actual fix to the nda-tools repo to allow user-specified folders would be best. How come you're using this strategy (installing nda-tools on each job instance and running the script to replace the `~`) vs. forking the repo and modifying it? Taking a quick look, adding a custom arg to change this path is probably something like: 1) adding the arg to `downloadcmd.py` so that it's available when the init is called at line 170, which 2) then routes to `__init__.py` under `init_and_create_configuration`, relevant since that's where the folder locations are specified.
Hope that helps!
JT
P.S. Apologies for the horrendous email auto-formatting.
Hi @ericearl / @j-tseng, is one of these two files involved in the file-name collisions that are being observed? If not, please provide some more information about the errors you're seeing. P.S. For the case where space in the ~ directory is limited, instead of modifying the codebase it should be possible to create a symbolic link from ~/NDA to wherever file-system space is not limited. Thanks
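For example, something along these lines (the target path is just illustrative):

```bash
# Illustrative paths: relocate the NDA folder to a filesystem without the quota,
# then leave a symlink behind so the tools still find ~/NDA.
mv ~/NDA /data/big_scratch/NDA
ln -s /data/big_scratch/NDA ~/NDA
```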
Hi @gregmagdits - apologies, I'm a little fuzzy on this since the time delay strategy fixed our collision issue... I think when we investigated, it was something to do with the *.partial files, but @ericearl might have better feedback since he's actively facing this issue. The symbolic link workaround is a clever temporary solution, but I would still love to see the ability to specify output paths for full transparency. Plus, this may not work for folks whose filesystems don't support symbolic links across quota domains. Thanks for your help!
@ericearl the next version of the tools is supposed to be released relatively soon. Can you provide more details about the file-name collisions you observed so that we can try to fit this into the next release? |
@gregmagdits I have since deleted the logs, but they were full of something like a CSV value error: one of the temporary NDA CSV files was being written to concurrently, and as soon as it was "double-written-to" once, every subsequent `downloadcmd` run failed with the same CSV issue. The solution was to `rm -rf` the `.download-progress` subfolder.
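For anyone hitting the same thing, the recovery step looked roughly like this (`${PACKAGE_ID}` is whichever package you were downloading):

```bash
# Remove the corrupted per-package progress folder so downloadcmd can start fresh
rm -rf ~/NDA/nda-tools/downloadcmd/packages/${PACKAGE_ID}/.download-progress
```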
Yeah, just launch 1,000 or so concurrent downloadcmd jobs targeting different S3 links for each one and it will likely happen. |
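A hypothetical way to reproduce on SLURM; the `downloadcmd` flags and file names here are assumptions, not taken verbatim from my setup:

```bash
# Assumed flags: launch ~1,000 array tasks, each downloading a different
# list of S3 links from the same package, to provoke the collision.
sbatch --array=1-1000 --wrap \
  'downloadcmd -dp "$PACKAGE_ID" -t "links_${SLURM_ARRAY_TASK_ID}.txt"'
```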
I have an issue where multiple highly-parallel `downloadcmd` jobs collide on temporary files in the `~/NDA/nda-tools/downloadcmd/packages/${PACKAGE_ID}/.download-progress` folder. I'm thinking this issue could be resolved by allowing configurable or user-input global variables in https://github.com/NDAR/nda-tools/blob/main/NDATools/__init__.py#L57-L76. Anyway, my current plan feels convoluted because of this need:

1. `pip install nda-tools -t /scratch/$SLURM_JOB_ID`
2. Adjust the `PATH` environment variable to prioritize use of this new `nda-tools` instance
3. Replace `os.path.expanduser('~')` with the new `/scratch/$SLURM_JOB_ID` in https://github.com/NDAR/nda-tools/blob/main/NDATools/__init__.py#L57
4. Copy `/scratch/$SLURM_JOB_ID/bin/downloadcmd` to `/scratch/$SLURM_JOB_ID/downloadcmd` (because of relative import problems)
5. Run `downloadcmd` safely in parallel without all the default files going to my home directory

I wonder if this is basically what you described having done on #84, @j-tseng?
P.S. Below is my blurb of code that does all that. It comes from https://github.com/nimh-dsst/abcd-fasttrack2bids/blob/main/swarm.sh#L70 and you can read my `fix_downloadcmd.py` script here: https://github.com/nimh-dsst/abcd-fasttrack2bids/blob/main/fix_downloadcmd.py.
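For readers who don't want to follow those links, here is a rough sketch of what the five steps above might look like in a single job script. It is not the actual swarm.sh; the `sed` one-liner stands in for the `fix_downloadcmd.py` script, and paths are illustrative:

```bash
#!/bin/bash
# Rough, illustrative sketch of the per-job workaround described above.

# 1) Install a private copy of nda-tools into node-local scratch
pip install nda-tools -t "/scratch/$SLURM_JOB_ID"

# 2) Prefer this private copy over any system-wide install
export PATH="/scratch/$SLURM_JOB_ID/bin:$PATH"
export PYTHONPATH="/scratch/$SLURM_JOB_ID:${PYTHONPATH:-}"

# 3) Point the hard-coded home-folder root at scratch instead of ~
#    (stand-in for fix_downloadcmd.py, which rewrites NDATools/__init__.py)
sed -i "s|os.path.expanduser('~')|'/scratch/$SLURM_JOB_ID'|g" \
    "/scratch/$SLURM_JOB_ID/NDATools/__init__.py"

# 4) Copy the entry point next to the package to dodge the relative-import problem
cp "/scratch/$SLURM_JOB_ID/bin/downloadcmd" "/scratch/$SLURM_JOB_ID/downloadcmd"

# 5) Run the relocated downloadcmd; its temporary files now land under scratch
"/scratch/$SLURM_JOB_ID/downloadcmd" "$@"
```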