Home
Data for each project is stored in a Terra workspace, the BIOM-Mass Google Cloud buckets, and the local database. A project can include both metadata and data files, only metadata, or only data files. In every case there will be a Terra workspace.
Follow these steps to deposit a new project's data to the BIOM-Mass site.
- Create a new workspace in Terra (under the billing space terra-biom-mass). The workspace name should be the project name.
  - Add as reader: [email protected] (this is the service account for Firecloud API queries).
    - This is a service account in the biom-mass project. It does not have any access permissions to the biom-mass project. It has been registered with Terra so we can use it to query the API for information about workspaces it has read access to.
- If the workspace is to be made public, contact the Terra Helpdesk to change the permissions so it is publicly accessible.
- Upload the data tables. First upload a table of all of the participant ids. Select the button to associate sample and participant ids. Next upload a table of raw files by sample id. Columns should be sample_id (the sample id, plus a string indicating the file type if the project includes more than one raw file per sample), file_id (full URL to the file in the cloud bucket), file size (KB), participant (participant id), and sample (sample id). The data tables do not contain metadata for the samples; they only contain information about the project data files and the sample and participant ids.
  a. Please note: The site is set up to allow for projects with metadata that do not currently have data files (e.g. fastq files, taxonomic profiles, etc.). When adding a project that does not currently include data files, still create the workspace in Terra and upload a file of all the participant ids. When uploading the sample ids, the columns should be sample_id, "NA", "NA", participant_id, sample_id (where file_id and file_size are both NA). Once the project has data files, replace the existing sample id data in the Terra workspace with data that includes the new file URLs and sizes. Then reload the local database so these files show up in the portal UI.
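The sample table described above can be generated rather than typed by hand. The following is a minimal sketch, assuming the column order given in the text; the header names written here, the helper names sample_rows and to_tsv, and the example ids are illustrative assumptions, and the exact header format Terra expects on upload should be checked against the Terra documentation.

```python
import csv
import io

# Column order taken from the text above. The header spellings are an
# assumption; confirm what Terra expects before uploading.
COLUMNS = ["sample_id", "file_id", "file_size", "participant", "sample"]


def sample_rows(samples, files=None):
    """Build one row per raw file, or one NA row per sample without files.

    samples: dict mapping sample id -> participant id
    files: optional dict mapping sample id -> list of (suffix, url, size_kb)
    """
    files = files or {}
    rows = []
    for sample, participant in sorted(samples.items()):
        entries = files.get(sample)
        if not entries:
            # Metadata-only sample: file_id and file_size are both NA.
            rows.append([sample, "NA", "NA", participant, sample])
            continue
        for suffix, url, size_kb in entries:
            # Add a file-type string only when a sample has several raw files.
            row_id = f"{sample}_{suffix}" if len(entries) > 1 else sample
            rows.append([row_id, url, str(size_kb), participant, sample])
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling sample_rows with no files argument produces the NA placeholder rows for a metadata-only project; calling it again later with file URLs and sizes produces the replacement table.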
- Upload the fastq files (QCed), taxonomic profiles, gene families, and pathway abundance files (and any other sequencing data and data products) to the biom-mass buckets.
  - $ gsutil cp -r . $BUCKET (replacing $BUCKET with the workspace Google Cloud bucket)
    - Include the individual abundance files and also a set of merged files for each project.
    - Large files (like fastq files) should be placed in a bucket with requester-pays in the name, as these require the user to pay for downloads.
    - Files that are not for public access should be placed in a bucket with restricted in the name.
    - The folder naming convention for the buckets is gs:// sequencing type / data type (e.g. MTX) / data format (e.g. Raw_reads) / project name (to match the Terra workspace name) / .
    - If downloading files from SRA, converting them to fastq, and renaming them, the script download_sra_rename_upload.py can be used. This script was used for the HPFS workspace.
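To keep destination paths consistent with the folder naming convention, the path can be assembled in code before running gsutil. This is a rough sketch, assuming the first path component is the bucket name; the function names and the bucket name in the usage note are hypothetical and not part of the BIOM-Mass tooling.

```python
def destination_prefix(bucket, data_type, data_format, project):
    """Join the path components into a gs:// prefix ending in a slash.

    bucket is assumed to be the first component of the convention above;
    project should match the Terra workspace name.
    """
    parts = [bucket, data_type, data_format, project]
    cleaned = [part.strip().strip("/") for part in parts]
    if not all(cleaned):
        raise ValueError("every path component must be non-empty")
    return "gs://" + "/".join(cleaned) + "/"


def upload_command(source_dir, prefix):
    """Return the gsutil argument list for a recursive copy (not executed)."""
    return ["gsutil", "cp", "-r", source_dir, prefix]
```

For example, destination_prefix("example-bucket", "MTX", "Raw_reads", "HPFS") gives gs://example-bucket/MTX/Raw_reads/HPFS/, and the list returned by upload_command can be passed to subprocess.run once the path has been checked.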
- Load the two metadata files into the local database. One file holds the participant metadata and the other holds the sample metadata. The file metadata will be queried from the Terra workspace using the Firecloud API. These files should be tab-delimited and include the sample and participant ids for each row of data. The sample and participant ids should match those found in the corresponding Terra workspace.
  - The script load_local_database.py will add the new data into the databases from the local files when given the options --file-participant <participant_file.tsv> --file-sample <sample_file.tsv>.
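Before loading, it can help to confirm that the ids in the metadata files match those in the Terra workspace. The sketch below is one way to do that check, assuming the id column names are passed in by the caller (the source does not name them) and that the workspace ids have already been exported to a list; read_ids and missing_ids are hypothetical helpers, not part of load_local_database.py.

```python
import csv


def read_ids(path, column):
    """Collect the non-empty values of one column from a tab-delimited file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{path} has no '{column}' column")
        return {row[column] for row in reader if row[column]}


def missing_ids(metadata_ids, workspace_ids):
    """Return (ids only in the metadata, ids only in the workspace), sorted."""
    metadata_ids = set(metadata_ids)
    workspace_ids = set(workspace_ids)
    only_in_metadata = sorted(metadata_ids - workspace_ids)
    only_in_workspace = sorted(workspace_ids - metadata_ids)
    return only_in_metadata, only_in_workspace
```

If both returned lists are empty, the metadata file and the workspace agree on that id column.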
If the project has already been deposited to the BIOM-Mass site, follow these steps to change the access permissions to the data.
To make a restricted project open access:
- Make the Terra workspace associated with the project open access by contacting the Terra Helpdesk.
- Move the data files for the project in the BIOM-Mass Google Cloud buckets from restricted buckets to open buckets.
- Reload the local database, providing the metadata files for all projects, to update the links on the site to the new URLs for the project.
To make an open access project restricted:
- Make the Terra workspace associated with the project closed access by contacting the Terra Helpdesk.
- Move the data files for the project in the BIOM-Mass Google Cloud buckets from open buckets to restricted buckets.
- Reload the local database, providing the metadata files for all projects, to update the links on the site to the new URLs for the project.
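Moving files between buckets changes every file URL, which is why the database reload is needed. The following sketch shows how the new URLs and the matching gsutil mv commands might be derived, assuming the folder layout is kept identical in the destination bucket; the bucket names used in the usage note are placeholders, and swap_bucket and move_commands are hypothetical helpers.

```python
def swap_bucket(url, old_bucket, new_bucket):
    """Rewrite a gs:// URL from one bucket to another, keeping the path."""
    prefix = f"gs://{old_bucket}/"
    if not url.startswith(prefix):
        raise ValueError(f"{url} is not in bucket {old_bucket}")
    return f"gs://{new_bucket}/" + url[len(prefix):]


def move_commands(urls, old_bucket, new_bucket):
    """Return one gsutil mv argument list per file (commands are not run)."""
    return [
        ["gsutil", "mv", url, swap_bucket(url, old_bucket, new_bucket)]
        for url in urls
    ]
```

For example, move_commands(urls, "example-restricted", "example-open") covers the restricted-to-open direction, and swapping the last two arguments covers the reverse. The rewritten URLs are the ones that should appear in the Terra data table before the local database is reloaded.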