Add setup script and updates for wilms data #818

Merged: 9 commits, Dec 2, 2024
1 change: 1 addition & 0 deletions components/dictionary.txt
@@ -485,6 +485,7 @@ VST
Wattenberg
Wickham
Wickham's
Wilms
WIPO
WNT
xenograft
2 changes: 1 addition & 1 deletion scRNA-seq-advanced/setup/ewing-sarcoma/README.md
@@ -2,7 +2,7 @@ The shell script in this directory downloads processed `SingleCellExperiment` ob

You may want to run it with your OpenScPCA conda environment activated.

By default, it will use an AWS profile called `openscpca` and download data from the `2024-08-22` OpenScPCA release.
By default, it will use an AWS profile called `openscpca` and download data from the `2024-11-25` OpenScPCA release.

You can alter the AWS profile or release with the following:

13 changes: 13 additions & 0 deletions scRNA-seq-advanced/setup/wilms-tumor/README.md
@@ -0,0 +1,13 @@
The shell script in this directory downloads the processed `SingleCellExperiment` object for sample `SCPCS000203` from project `SCPCP000006` (Wilms tumor) using the OpenScPCA data download mechanism.

You may want to run it with your OpenScPCA conda environment activated.

By default, it will use an AWS profile called `openscpca` and download data from the `2024-11-25` OpenScPCA release.

You can alter the AWS profile or release with the following:

```sh
PROFILE={profile} RELEASE={release} ./download-openscpca-data.sh
```

Replace `{profile}` and `{release}` with a profile that has OpenScPCA access and a valid release, respectively.
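These overrides work because the script only falls back to its defaults when `PROFILE` or `RELEASE` is unset or empty, using `${VAR:-default}` parameter expansion. A minimal sketch of that pattern, with a hypothetical `show_settings` function standing in for the download script:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for download-openscpca-data.sh (not part of this PR).
# ${VAR:-default} expands to VAR when it is set and non-empty, else to default.
show_settings() {
  local profile="${PROFILE:-openscpca}"
  local release="${RELEASE:-2024-11-25}"
  echo "profile=${profile} release=${release}"
}

# With nothing exported: profile=openscpca release=2024-11-25
show_settings

# A one-off override, like the README's PROFILE=... RELEASE=... usage line;
# the assignment applies only for this call
RELEASE=2024-08-22 show_settings
```

Because the assignment is prefixed to a single command, it does not leak into the rest of the shell session.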
50 changes: 50 additions & 0 deletions scRNA-seq-advanced/setup/wilms-tumor/download-openscpca-data.sh
@@ -0,0 +1,50 @@
#!/bin/bash

set -euo pipefail

# Profile to use with the download script
PROFILE=${PROFILE:-openscpca}
# Release to download
RELEASE=${RELEASE:-2024-11-25}

# Set the working directory to the directory of this file
cd "$(dirname "${BASH_SOURCE[0]}")"

# Define ScPCA project id
project_id="SCPCP000006"

# Set up directories
wilms_data_dir="../../data/wilms-tumor"
processed_dir="${wilms_data_dir}/processed"

# Create directories if they don't exist yet
mkdir -p "${processed_dir}"

# Get download data script from OpenScPCA for convenience
curl -O \
-L https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/a3d8a2c9144e8edb3894a7beeb89cdc6c3e6d681/download-data.py
# Make executable
chmod +x download-data.py

# Download Wilms tumor samples
./download-data.py \
--samples 'SCPCS000203' \
--format SCE \
--release ${RELEASE} \
--data-dir ${wilms_data_dir} \
--profile ${PROFILE}

# Remove existing files from processed directory
if [ -z "$( ls -A ${processed_dir} )" ]; then
echo "No processed files yet!"
else
rm -r ${processed_dir}/*
fi

# Move processed files from release folder into processed_dir
mv ${wilms_data_dir}/${RELEASE}/${project_id}/* ${processed_dir}

# Clean up download data script
rm download-data.py
# Clean up the remnants of download structure
rm -r ${wilms_data_dir}/${RELEASE}
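A minimal sketch of the clear-then-move steps above as a reusable function, assuming the release folder has already been downloaded. The function name `refresh_processed_dir` is hypothetical. It quotes every path, uses `${var:?}` so an empty variable aborts instead of pointing the delete at `/`, and swaps the `rm -r .../*` glob for `find -mindepth 1 -delete`, which also removes hidden files:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of download-openscpca-data.sh): replace the
# contents of processed_dir with everything in source_dir.
refresh_processed_dir() {
  local source_dir="$1"     # e.g. "${wilms_data_dir}/${RELEASE}/${project_id}"
  local processed_dir="$2"  # e.g. "${wilms_data_dir}/processed"

  mkdir -p "${processed_dir}"

  if [ -z "$(ls -A "${processed_dir}")" ]; then
    echo "No processed files yet!"
  else
    # ${processed_dir:?} aborts if the variable is empty;
    # find -mindepth 1 -delete removes everything inside, including dotfiles
    find "${processed_dir:?}" -mindepth 1 -delete
  fi

  # Quote the directory but leave * unquoted so the shell still expands it
  mv "${source_dir}"/* "${processed_dir}"
}
```

Called as `refresh_processed_dir "${wilms_data_dir}/${RELEASE}/${project_id}" "${processed_dir}"`, it performs the same clear-and-move steps as the script.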
2 changes: 2 additions & 0 deletions scripts/link-data.sh
@@ -42,6 +42,7 @@ mkdir -p scRNA-seq-advanced/data/rms/integrated
mkdir -p scRNA-seq-advanced/data/rms/annotations
mkdir -p scRNA-seq-advanced/data/pancreas
mkdir -p scRNA-seq-advanced/data/hodgkins
mkdir -p scRNA-seq-advanced/data/wilms-tumor/processed

# Machine learning module directory
mkdir -p machine-learning/data
@@ -94,6 +95,7 @@ link_locs=(
scRNA-seq-advanced/data/rms/annotations/rms_sample_metadata.tsv
scRNA-seq-advanced/data/pancreas/processed
scRNA-seq-advanced/gene-sets
scRNA-seq-advanced/data/wilms-tumor/processed
machine-learning/data/open-pbta
pathway-analysis/data/leukemia
pathway-analysis/data/medulloblastoma
3 changes: 2 additions & 1 deletion scripts/syncup-s3.sh
@@ -37,14 +37,14 @@ sync_dirs=(
scRNA-seq/data/tabula-muris/alevin-quant/10X_P7_12
scRNA-seq/data/reference
scRNA-seq/index/Mus_musculus
scRNA-seq-advanced/data/ewing-sarcoma/processed

Member: I don't think you meant to remove this?

Member Author: Copy/paste gone wrong indeed!

scRNA-seq-advanced/data/glioblastoma-10x/raw_feature_bc_matrix
scRNA-seq-advanced/data/hodgkins/markers
scRNA-seq-advanced/data/PBMC-TotalSeqB/raw_feature_bc_matrix
scRNA-seq-advanced/data/rms/processed
scRNA-seq-advanced/data/rms/integrated
scRNA-seq-advanced/data/pancreas/processed
scRNA-seq-advanced/gene-sets
scRNA-seq-advanced/data/wilms-tumor/processed
machine-learning/data/open-pbta/processed
pathway-analysis/data/leukemia
pathway-analysis/data/medulloblastoma
@@ -63,6 +63,7 @@ sync_files=(
scRNA-seq-advanced/data/ewing-sarcoma/annotations/ewing_sarcoma_sample_metadata.tsv
scRNA-seq-advanced/data/rms/annotations/rms_sample_metadata.tsv
scRNA-seq-advanced/data/reference/hs_mitochondrial_genes.tsv
scRNA-seq-advanced/data/wilms-tumor/processed/SCPCS000203/SCPCL000240_processed.rds

Member: We should not need to include the individual file here. An individual file is only needed when we don't want to upload the entire folder, and nothing nested in a directory already listed above should need its own entry.

Member Author: Done in 6422de7.

)

output_files=(
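The review point about `sync_files` is that a file inside a directory already listed in `sync_dirs` is uploaded along with that directory, so listing it again is redundant. A minimal sketch of a check for such entries, as a hypothetical `find_redundant_files` helper (not part of `syncup-s3.sh`) that compares the two arrays by path prefix, using paths from this PR as example data:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print each file in sync_files that already lives
# under a directory in sync_dirs, since syncing that directory covers it.
find_redundant_files() {
  local file dir
  for file in "${sync_files[@]}"; do
    for dir in "${sync_dirs[@]}"; do
      # Match on "dir/" so "data/rms" does not also match "data/rms-extra"
      if [[ "${file}" == "${dir%/}/"* ]]; then
        echo "${file} (covered by ${dir})"
        break
      fi
    done
  done
}

# Example arrays (a subset of paths from this PR, for illustration)
sync_dirs=(
  scRNA-seq-advanced/data/rms/processed
  scRNA-seq-advanced/data/wilms-tumor/processed
)
sync_files=(
  scRNA-seq-advanced/data/rms/annotations/rms_sample_metadata.tsv
  scRNA-seq-advanced/data/wilms-tumor/processed/SCPCS000203/SCPCL000240_processed.rds
)

# Flags only the .rds file, which sits under wilms-tumor/processed
find_redundant_files
```

The metadata file is not flagged because `rms/annotations` is not itself listed in `sync_dirs`, which is exactly the case where an individual `sync_files` entry is still needed.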