Not all packages will have a manifest file; the README.md should address this and provide suggestions for how to work with S3 URLs more generally #1

Open
obenshaindw opened this issue Apr 2, 2021 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@obenshaindw
Contributor

An NDA Data Package will only include the data_structure_manifest.txt file if the packaged data were submitted using manifest files (a many-to-one relationship between the metadata and a collection of related files).

The README.md should describe that S3 URLs are present not only in data_structure_manifest.txt, but also in individual data structures included in the data package. A developer can request pre-signed URLs or access tokens for any of the S3 URLs provided in the data package, so long as the package was created with the option to include associated files.
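To illustrate the point that S3 URLs can live in any data structure file and not only in the manifest, here is a minimal sketch that scans every column of a tab-delimited file and collects the values that look like S3 URLs. It assumes the data structure files are tab-delimited text and that the links start with `s3://`; the function name `find_s3_urls` is hypothetical and is not part of nda-tools.

```python
import csv


def find_s3_urls(path):
    """Return the unique S3 URLs found in any column of a tab-delimited file.

    Assumption: links are stored as cell values beginning with "s3://".
    Header and description rows are skipped naturally because their cells
    do not start with that prefix.
    """
    seen = {}  # a dict keeps first-seen order while removing duplicates
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            for cell in row:
                value = cell.strip()
                if value.lower().startswith("s3://"):
                    seen.setdefault(value, None)
    return list(seen)
```

Because the scan does not depend on column names, the same call should work on data_structure_manifest.txt or on an individual data structure file, provided the assumptions above hold for that file.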

@obenshaindw added the documentation label on Apr 2, 2021
@petralenzini

Hope you don't mind me adding my history and two cents, since I'm not a developer and am only vaguely familiar with the context for this new issue. HCP data are submitted using manifest files (the .json manifests, not to be confused with the datastructure_manifest.txt files that come with a download). HCP users are given tips on how to locate a datastructure_manifest.txt, subset it based on HCP-style naming conventions in the 'manifest_name' (.json) column, and then capture the corresponding list of S3 links to send to the downloadcmd tool, so that it is not necessary to download an ENTIRE package created with the option to include associated files. It sounds like this isn't the only way to get a list of S3 links for a package of HCP data.
Depending on where and when this README.md shows up in a user's interface with NDA tools and data (and especially if you're referring to the main nda-tools GitHub README.md), a key point might be to clarify how/where to obtain just the meta-level 'imagingcollection01.txt' file from a package structure that had been packaged WITH the option to include associated files (but without actually downloading these files). That is, beyond knowing that S3 links exist in datastructure_manifest.txt or some other structure, HOW would a user obtain this list of S3 links, wherever they may reside, without downloading everything in the entire package?
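The subsetting workflow described in this comment could look roughly like the sketch below: read datastructure_manifest.txt, keep the rows whose 'manifest_name' contains a chosen fragment, and write the matching S3 links to a text file. The 'manifest_name' column comes from this thread; the link column name `associated_file`, the tab delimiter, and both function names are assumptions made for illustration and should be checked against a real manifest.

```python
import csv


def s3_links_for_manifests(manifest_path, name_fragment,
                           name_column="manifest_name",
                           link_column="associated_file"):
    """Collect S3 links from rows whose manifest name contains name_fragment.

    Assumptions: the file is tab-delimited with a header row, and
    link_column ("associated_file" here) holds the S3 link for each row.
    """
    links = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row.get(name_column) or ""
            link = (row.get(link_column) or "").strip()
            # Rows without a real link (e.g. a description row) are skipped.
            if name_fragment in name and link.startswith("s3://"):
                links.append(link)
    return links


def write_links(links, out_path):
    """Write one S3 link per line to a plain-text file."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for link in links:
            handle.write(link + "\n")
```

For example, `write_links(s3_links_for_manifests("datastructure_manifest.txt", "T1w"), "s3_links.txt")` would produce a list of links for manifests whose names contain "T1w" (a made-up fragment), which could then be handed to the downloadcmd tool instead of downloading the entire package.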

@aburr-nimh
Contributor

@obenshaindw wrote:

> The README.md should describe that S3 URLs are present not only in data_structure_manifest.txt, but also in individual data structures included in the data package. A developer can request pre-signed URLs or access tokens for any of the S3 URLs provided in the data package, so long as the package was created with the option to include associated files.

It is my understanding that this issue is about broadening the 'manifest file approach' into a more general 'getting package service files from s3 urls' approach, because not all packages will have a manifest file (as mentioned above), but all packages are able to resolve their specific files from S3 paths. I think this can be fulfilled by tweaking some of the code examples to be more widely applicable (e.g. removing specific row mentions from the manifest file loading segments), while also adding to the comments or explanation that this is just one implementation and that one can use whatever format they like, as long as they are able to provide S3 URLs.
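As a rough sketch of the "any format, as long as it yields S3 URLs" idea, the helpers below accept any iterable of strings (lines of a text file, a spreadsheet column, a hand-written list) and return a clean, de-duplicated list of S3 URLs. Both function names are hypothetical. Whether a downstream request needs the full URL or the bucket and key separately is not specified in this thread, so the split is offered only as a convenience.

```python
def split_s3_url(url):
    """Split "s3://bucket/key" into (bucket, key); raise ValueError otherwise."""
    prefix = "s3://"
    text = url.strip()
    if not text.lower().startswith(prefix):
        raise ValueError(f"not an S3 URL: {url!r}")
    bucket, _, key = text[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URL is missing a bucket or key: {url!r}")
    return bucket, key


def unique_s3_urls(candidates):
    """Return validated S3 URLs in first-seen order, skipping blanks and repeats."""
    seen = set()
    result = []
    for candidate in candidates:
        text = candidate.strip()
        if not text or text in seen:
            continue
        split_s3_url(text)  # raises ValueError on malformed input
        seen.add(text)
        result.append(text)
    return result
```

Keeping the input as a plain iterable of strings means the source of the URLs (a manifest, a data structure file, or something else) stays a separate concern from whatever is done with them afterwards.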

@petralenzini wrote:

> a key point might be to clarify how/where to obtain just the meta-level 'imagingcollection01.txt' file from a package structure that had been packaged WITH the option to include associated files (but without actually downloading these files).

There is documentation on this, but it is not captured as a code example; it is only mentioned as an advanced usage of the API inside the note on manifest files. However, this comment makes it clear that we should bring this example front and center rather than burying it in a cryptically named file that only some users will think applies to them.
