Add setup script and updates for wilms data #818

Merged Dec 2, 2024 (9 commits)
1 change: 1 addition & 0 deletions components/dictionary.txt
@@ -485,6 +485,7 @@ VST
Wattenberg
Wickham
Wickham's
+Wilms
WIPO
WNT
xenograft
4 changes: 2 additions & 2 deletions scRNA-seq-advanced/setup/ewing-sarcoma/README.md
@@ -2,12 +2,12 @@ The shell script in this directory downloads processed `SingleCellExperiment` ob

You may want to run it with your OpenScPCA conda environment activated.

-By default, it will use an AWS profile called `openscpca` and download data from the `2024-08-22` OpenScPCA release.
+By default, it will use your currently active AWS profile (falling back to one called `openscpca` if not set) and download data from the `2024-11-25` OpenScPCA release.

You can alter the AWS profile or release with the following:

```sh
-PROFILE={profile} RELEASE={release} ./download-openscpca-data.sh
+AWS_PROFILE={profile} RELEASE={release} ./download-openscpca-data.sh
```

Replace `{profile}` and `{release}` with a profile that has OpenScPCA access and a valid release, respectively.
Original file line number Diff line number Diff line change
@@ -2,8 +2,8 @@

set -euo pipefail

-# Profile to use with the download script
-PROFILE=${PROFILE:-openscpca}
+# AWS profile to use with the download script
+AWS_PROFILE=${AWS_PROFILE:-openscpca}
# Release to download
RELEASE=${RELEASE:-2024-11-25}

@@ -31,15 +31,15 @@ chmod +x download-data.py
  --format SCE \
  --release ${RELEASE} \
  --data-dir ${ewing_data_dir} \
-  --profile ${PROFILE}
+  --profile ${AWS_PROFILE}

# Download Ewing sarcoma metadata
./download-data.py \
  --projects SCPCP000015 \
  --metadata-only \
  --release ${RELEASE} \
  --data-dir ${ewing_data_dir} \
-  --profile ${PROFILE}
+  --profile ${AWS_PROFILE}

# Remove existing files from processed directory
if [ -z "$( ls -A ${processed_dir} )" ]; then
13 changes: 13 additions & 0 deletions scRNA-seq-advanced/setup/wilms-tumor/README.md
@@ -0,0 +1,13 @@
The shell script in this directory downloads the processed `SingleCellExperiment` object for the `SCPCS000203` sample from `SCPCP000006` (Wilms tumor) using the data download mechanism from OpenScPCA.

You may want to run it with your OpenScPCA conda environment activated.

By default, it will use your currently active AWS profile (falling back to one called `openscpca` if not set) and download data from the `2024-11-25` OpenScPCA release.

You can alter the AWS profile or release with the following:

```sh
AWS_PROFILE={profile} RELEASE={release} ./download-openscpca-data.sh
```

Replace `{profile}` and `{release}` with a profile that has OpenScPCA access and a valid release, respectively.
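
As a rough illustration of how these defaults resolve, the sketch below uses the same `${VAR:-default}` expansion as the download script. It is not part of the repository, and `resolve_settings` is a hypothetical helper name chosen only for this example.

```sh
#!/bin/bash
set -eu

# Hypothetical helper (not in the repository): report the profile and
# release the download script would use for the current environment.
# ${VAR:-default} yields the default when VAR is unset or empty.
resolve_settings() {
  echo "profile=${AWS_PROFILE:-openscpca} release=${RELEASE:-2024-11-25}"
}

resolve_settings
```

Setting only one of the two variables overrides that value and leaves the other at its default.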
50 changes: 50 additions & 0 deletions scRNA-seq-advanced/setup/wilms-tumor/download-openscpca-data.sh
@@ -0,0 +1,50 @@
#!/bin/bash

set -euo pipefail

# AWS profile to use with the download script
AWS_PROFILE=${AWS_PROFILE:-openscpca}
# Release to download
RELEASE=${RELEASE:-2024-11-25}

# Set the working directory to the directory of this file
cd "$(dirname "${BASH_SOURCE[0]}")"

# Define ScPCA project id
project_id="SCPCP000006"

# Set up directories
wilms_data_dir="../../data/wilms-tumor"
processed_dir="${wilms_data_dir}/processed"

# Create directories if they don't exist yet
mkdir -p "${processed_dir}"

# Get download data script from OpenScPCA for convenience
curl -O \
  -L https://raw.githubusercontent.com/AlexsLemonade/OpenScPCA-analysis/a3d8a2c9144e8edb3894a7beeb89cdc6c3e6d681/download-data.py
# Make executable
chmod +x download-data.py

# Download Wilms tumor samples
./download-data.py \
  --samples 'SCPCS000203' \
  --format SCE \
  --release "${RELEASE}" \
  --data-dir "${wilms_data_dir}" \
  --profile "${AWS_PROFILE}"

# Remove existing files from processed directory
if [ -z "$( ls -A "${processed_dir}" )" ]; then
  echo "No processed files yet!"
else
  rm -r "${processed_dir}"/*
fi

# Move processed files from release folder into processed_dir
mv "${wilms_data_dir}/${RELEASE}/${project_id}"/* "${processed_dir}"

# Clean up download data script
rm download-data.py
# Clean up the remnants of download structure
rm -r "${wilms_data_dir}/${RELEASE}"
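
The clear, move, and clean-up steps can be tried in isolation with a sketch like the one below. It assumes the download lands in `<data_dir>/<release>/<project_id>/`, as the `mv` step implies. The temporary directory and the `placeholder.rds` file are stand-ins invented for this example. Nothing is downloaded.

```sh
#!/bin/bash
set -eu

# Throwaway directory standing in for ../../data/wilms-tumor (assumption)
data_dir="$(mktemp -d)"
release="2024-11-25"
project_id="SCPCP000006"
processed_dir="${data_dir}/processed"

# Mimic the layout the mv step expects; placeholder.rds is a made-up file
mkdir -p "${processed_dir}" "${data_dir}/${release}/${project_id}/SCPCS000203"
touch "${data_dir}/${release}/${project_id}/SCPCS000203/placeholder.rds"

# Clear anything left over from a previous run
if [ -n "$(ls -A "${processed_dir}")" ]; then
  rm -r "${processed_dir:?}"/*
fi

# Move the project contents up into processed/
mv "${data_dir}/${release}/${project_id}"/* "${processed_dir}"

# What remains under the release directory is an empty project folder
rm -r "${data_dir:?}/${release}"

ls -R "${data_dir}"
```

The `${var:?}` form makes the shell abort if the variable is empty, which guards the `rm -r` calls against expanding to an unintended path.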
2 changes: 2 additions & 0 deletions scripts/link-data.sh
@@ -42,6 +42,7 @@ mkdir -p scRNA-seq-advanced/data/rms/integrated
mkdir -p scRNA-seq-advanced/data/rms/annotations
mkdir -p scRNA-seq-advanced/data/pancreas
mkdir -p scRNA-seq-advanced/data/hodgkins
+mkdir -p scRNA-seq-advanced/data/wilms-tumor/processed

# Machine learning module directory
mkdir -p machine-learning/data
@@ -94,6 +95,7 @@ link_locs=(
scRNA-seq-advanced/data/rms/annotations/rms_sample_metadata.tsv
scRNA-seq-advanced/data/pancreas/processed
scRNA-seq-advanced/gene-sets
+scRNA-seq-advanced/data/wilms-tumor/processed
machine-learning/data/open-pbta
pathway-analysis/data/leukemia
pathway-analysis/data/medulloblastoma
1 change: 1 addition & 0 deletions scripts/syncup-s3.sh
@@ -45,6 +45,7 @@ sync_dirs=(
scRNA-seq-advanced/data/rms/integrated
scRNA-seq-advanced/data/pancreas/processed
scRNA-seq-advanced/gene-sets
+scRNA-seq-advanced/data/wilms-tumor/processed
machine-learning/data/open-pbta/processed
pathway-analysis/data/leukemia
pathway-analysis/data/medulloblastoma