
Allow configuration of the NDA_ORGINIZATION_ROOT_FOLDER #101

Open
ericearl opened this issue Jul 15, 2024 · 7 comments

Comments

@ericearl

I have an issue where multiple highly parallel downloadcmd jobs collide on temporary files in the ~/NDA/nda-tools/downloadcmd/packages/${PACKAGE_ID}/.download-progress folder. I think this could be resolved by making the module-level globals at https://github.com/NDAR/nda-tools/blob/main/NDATools/__init__.py#L57-L76 configurable or user-suppliable. In the meantime, my workaround feels convoluted because of this need:

  1. Get an HPC instance with a local scratch space
  2. pip install nda-tools -t /scratch/$SLURM_JOB_ID
  3. Update the PATH environment variable to prioritize use of this new nda-tools instance
  4. Run a script to replace os.path.expanduser('~') with the new /scratch/$SLURM_JOB_ID in https://github.com/NDAR/nda-tools/blob/main/NDATools/__init__.py#L57
  5. Copy back the /scratch/$SLURM_JOB_ID/bin/downloadcmd to /scratch/$SLURM_JOB_ID/downloadcmd (because of relative import problems)
  6. Then I can run downloadcmd safely in parallel without all the default files going to my home directory

I wonder if this is basically what you described having done on #84, @j-tseng?

P.S. Below is the snippet of code that does all of that. It comes from https://github.com/nimh-dsst/abcd-fasttrack2bids/blob/main/swarm.sh#L70, and you can read my fix_downloadcmd.py script here: https://github.com/nimh-dsst/abcd-fasttrack2bids/blob/main/fix_downloadcmd.py.

# Install nda-tools into node-local scratch and patch its hardcoded root folder
DOWNLOADCMD_PATH=/lscratch/${SLURM_JOB_ID}/pip_install
mkdir -p ${DOWNLOADCMD_PATH}
poetry run --directory ${CODE_DIR} python -m pip install nda-tools -t ${DOWNLOADCMD_PATH}
poetry run --directory ${CODE_DIR} python ${CODE_DIR}/fix_downloadcmd.py ${DOWNLOADCMD_PATH}
# Copy the entry point up a level to work around relative-import problems
cp ${DOWNLOADCMD_PATH}/bin/downloadcmd ${DOWNLOADCMD_PATH}/
export PATH=${DOWNLOADCMD_PATH}:${PATH}
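For reference, a patch in this spirit might look something like the sketch below. The function name and exact behavior here are my guesses for illustration; the actual fix_downloadcmd.py linked above is authoritative.

```python
# Hypothetical sketch: redirect the hardcoded home-directory root in an
# installed copy of NDATools/__init__.py to a scratch path.
from pathlib import Path


def redirect_nda_root(install_dir: str, new_root: str) -> bool:
    """Replace os.path.expanduser('~') with a literal path in NDATools/__init__.py.

    Returns True if the file was patched, False if the pattern was not found
    (e.g. the upstream code has changed).
    """
    init_file = Path(install_dir) / "NDATools" / "__init__.py"
    source = init_file.read_text()
    patched = source.replace("os.path.expanduser('~')", repr(new_root))
    if patched == source:
        return False  # nothing matched; leave the file untouched
    init_file.write_text(patched)
    return True
```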
@j-tseng

j-tseng commented Jul 15, 2024 via email

@gregmagdits
Contributor

gregmagdits commented Jul 15, 2024

Hi @ericearl / @j-tseng
I am looking into this now. I can see how changing the location of the root directory would be beneficial when space in the user's home directory is limited; however, I do not see how that would resolve file-name collisions, so please help me understand the trouble you are seeing.
The only temporary files that the program produces are:
1) .partial files (while each file is downloading it is temporarily given a .partial suffix, which is removed when the download completes)
2) failed-s3-links files (the name starts with failed_s3_links_file_, followed by a timestamp, a unique suffix, and a .csv extension; a typical file name is failed_s3_links_file_20240715T113932puj1d1yf.csv)
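As a hedged illustration of the .partial pattern described in (1), a simple write-then-rename flow might look like the following. The helper name and flow are illustrative, not the actual nda-tools implementation.

```python
# Sketch of the ".partial then rename" pattern: the in-progress file carries
# a .partial suffix and is atomically renamed once the download completes.
import os


def write_then_commit(final_path: str, data: bytes) -> None:
    partial = final_path + ".partial"
    with open(partial, "wb") as f:
        f.write(data)  # partially downloaded content lives under the .partial name
    os.replace(partial, final_path)  # atomic rename on completion
```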

Is one of these two file types involved in the file-name collisions you are observing? If not, please provide some more information about the errors.

P.S. For the case where space on the ~ directory is limited, instead of modifying the codebase it should be possible to create a symbolic link from ~/NDA to wherever file-system space is not limited.
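A minimal sketch of that symlink workaround, assuming a POSIX filesystem (the relocate_nda_root helper and paths are hypothetical, not part of nda-tools):

```python
# Move an existing ~/NDA tree onto a roomier filesystem and leave a
# symbolic link behind so nda-tools' hardcoded path still resolves.
import os
import shutil


def relocate_nda_root(home_nda: str, big_disk_dir: str) -> None:
    os.makedirs(os.path.dirname(big_disk_dir) or ".", exist_ok=True)
    if os.path.isdir(home_nda) and not os.path.islink(home_nda):
        shutil.move(home_nda, big_disk_dir)  # preserve any downloads already there
    else:
        os.makedirs(big_disk_dir, exist_ok=True)
    os.symlink(big_disk_dir, home_nda)  # ~/NDA now points at the big disk
```

Note the caveat below about filesystems that disallow symbolic links across quota domains; this sketch assumes they are permitted.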

Thanks

@j-tseng

j-tseng commented Jul 15, 2024

Hi @gregmagdits - apologies, I'm a little fuzzy on this since the time delay strategy fixed our collision issue... I think when we investigated, it was something to do with the *.partial files but @ericearl might have better feedback since he's actively facing this issue.

The symbolic-link workaround is a clever temporary solution, but I would still love to see the ability to specify output paths for full transparency. Plus, it may not work for folks whose filesystems don't support symbolic links across quota domains.

Thanks for your help!

@gregmagdits
Contributor

@ericearl the next version of the tools is supposed to be released relatively soon. Can you provide more details about the file-name collisions you observed so that we can try to fit this into the next release?

@ericearl
Author

@gregmagdits I have since deleted the logs, but they were full of something like a CSV value error: one of the temporary NDA CSV files was being written to concurrently, and as soon as it was "double-written-to" once, every subsequent downloadcmd failed with the same CSV issue. The fix was to rm -rf the .download-progress subfolder.
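One possible mitigation for that kind of concurrent write, sketched here with an advisory fcntl lock (Unix-only; append_row_locked is a hypothetical helper for illustration, not nda-tools code):

```python
# Serialize appends to a shared progress CSV with an advisory file lock so
# concurrent jobs cannot interleave partial rows.
import fcntl


def append_row_locked(csv_path: str, row: str) -> None:
    with open(csv_path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block until we hold the exclusive lock
        f.write(row + "\n")
        f.flush()
        fcntl.flock(f, fcntl.LOCK_UN)  # release so the next job can append
```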

@gregmagdits
Contributor

Right, that makes sense. @j-tseng found another open issue that might be a duplicate of this one (#94) . I think we have enough information to get started.

@ericearl
Author

Yeah, just launch 1,000 or so concurrent downloadcmd jobs, each targeting different S3 links, and it will likely happen.
