Commit

Merge pull request #17 from sib-swiss/16-use-project-as-name-for-project-directory

fixes #16
GeertvanGeest authored Nov 20, 2023
2 parents 45dc947 + 72f588e commit 4845323
Showing 32 changed files with 115 additions and 115 deletions.
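A rename of this kind (every `workdir` path becoming a `project` path) can in principle be scripted. The sketch below is only an illustration on a throwaway directory, not a record of how this commit was made; the scratch directory, the example file and the `sed` pattern are all assumptions. It would also not cover variable renames such as `WORKDIR` to `PROJDIR`, which need a separate pattern.

```sh
#!/usr/bin/env bash
# Sketch only: bulk path rename on a scratch copy, never on a real repository.
set -euo pipefail

demo_dir=$(mktemp -d)                      # hypothetical scratch directory
mkdir -p "$demo_dir/docs"
printf 'cd ~/workdir\nmkdir reads\n' > "$demo_dir/docs/example.md"

# List the files that mention the old name, then rewrite them in place.
# `-i.bak` keeps a backup copy of each file and works with GNU and BSD sed.
grep -rl 'workdir' "$demo_dir" | xargs sed -i.bak 's#/workdir#/project#g'

cat "$demo_dir/docs/example.md"
```

Reviewing the result with `git diff` before committing would show whether the pattern touched anything it should not have.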
20 changes: 10 additions & 10 deletions docs/day1/quality_control.md
@@ -47,12 +47,12 @@ Now we will use some bioinformatics tools to download reads and perform quali
conda activate ngs-tools
```

Make a directory `reads` in `~/workdir` and download the reads from the SRA database using `prefetch` and `fastq-dump` from [SRA-Tools](https://ncbi.github.io/sra-tools/) into the `reads` directory. Use the code snippet below to create a script called `01_download_reads.sh`. Store it in `~/workdir/scripts/`, and run it.
Make a directory `reads` in `~/project` and download the reads from the SRA database using `prefetch` and `fastq-dump` from [SRA-Tools](https://ncbi.github.io/sra-tools/) into the `reads` directory. Use the code snippet below to create a script called `01_download_reads.sh`. Store it in `~/project/scripts/`, and run it.

```sh title="01_download_reads.sh"
#!/usr/bin/env bash

cd ~/workdir
cd ~/project
mkdir reads
cd reads
prefetch SRR519926
@@ -83,11 +83,11 @@ fastq-dump --split-files SRR519926
`fastqc` accepts multiple files as input, so you can use a [wildcard](https://en.wikipedia.org/wiki/Glob_(programming)) to run `fastqc` on all the files in one line of code. Use it like this: `*.fastq`.

??? done "Answer"
Your script `~/workdir/scripts/02_run_fastqc.sh` should look like:
Your script `~/project/scripts/02_run_fastqc.sh` should look like:

```sh title="02_run_fastqc.sh"
#!/usr/bin/env bash
cd ~/workdir/reads
cd ~/project/reads

fastqc *.fastq
```
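To make the wildcard step concrete, here is a minimal sketch of what the shell does with `*.fastq` before the command runs. It uses empty placeholder files in a scratch directory (an assumption for illustration) instead of the real reads, and it does not call `fastqc`.

```sh
#!/usr/bin/env bash
# Sketch: show what `*.fastq` expands to, using empty placeholder files.
set -euo pipefail

demo_dir=$(mktemp -d)   # hypothetical scratch directory, not ~/project/reads
cd "$demo_dir"
touch SRR519926_1.fastq SRR519926_2.fastq notes.txt

# The shell expands the pattern before the command starts,
# so the command receives one argument per matching file.
matches=(*.fastq)
echo "${#matches[@]} files: ${matches[*]}"
```

Because the expansion happens in the shell, `fastqc *.fastq` sees the two file names as separate arguments; `notes.txt` is never passed.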
@@ -127,13 +127,13 @@ We will use [fastp](https://github.com/OpenGene/fastp) for trimming adapters and
- The minimum required length is also 15: `reads shorter than length_required will be discarded, default is 15. (int [=15])`
- If one of the reads does not meet the required length, the pair is discarded if `--unpaired1` and/or `--unpaired2` are not specified: `for PE input, if read1 passed QC but read2 not, it will be written to unpaired1. Default is to discard it. (string [=])`.

**Exercise:** Complete the script below called `03_trim_reads.sh` (replace everything in between brackets `[]`) to run `fastp` to trim the data. The quality of our dataset is not great, so we will override the defaults. Use a minimum qualified base quality of 10, set the maximum percentage of unqualified bases to 80% and a minimum read length of 25. Note that a new directory called `~/workdir/results/trimmed/` is created to write the trimmed reads.
**Exercise:** Complete the script below called `03_trim_reads.sh` (replace everything in between brackets `[]`) to run `fastp` to trim the data. The quality of our dataset is not great, so we will override the defaults. Use a minimum qualified base quality of 10, set the maximum percentage of unqualified bases to 80% and a minimum read length of 25. Note that a new directory called `~/project/results/trimmed/` is created to write the trimmed reads.

```sh title="03_trim_reads.sh"
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
READS_DIR=~/workdir/reads
TRIMMED_DIR=~/project/results/trimmed
READS_DIR=~/project/reads

mkdir -p $TRIMMED_DIR

@@ -156,13 +156,13 @@ fastp \
Note that we have set the options `--cut_front` and `--cut_tail` that will ensure low quality bases are trimmed in a sliding window from both the 5' and 3' ends. Also `--detect_adapter_for_pe` is set, which ensures that adapters are detected automatically for both R1 and R2.

??? done "Answer"
Your script (`~/workdir/scripts/03_trim_reads.sh`) should look like this:
Your script (`~/project/scripts/03_trim_reads.sh`) should look like this:

```sh title="03_trim_reads.sh"
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
READS_DIR=~/workdir/reads
TRIMMED_DIR=~/project/results/trimmed
READS_DIR=~/project/reads

mkdir -p $TRIMMED_DIR

2 changes: 1 addition & 1 deletion docs/day1/reproducibility.md
@@ -21,7 +21,7 @@ During today and tomorrow we will work with a small *E. coli* dataset to practic

By adhering to these simple principles it will be relatively straightforward to re-do your analysis steps only based on the scripts, and will get you started to adhere to the [Ten Simple Rules for Reproducible Computational Research](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285).

By the end of day 2 `~/workdir` should look (something) like this:
By the end of day 2 `~/project` should look (something) like this:

```
.
16 changes: 8 additions & 8 deletions docs/day1/server_login.md
@@ -80,14 +80,14 @@
-p 8443:8443 \
-e PUID=1000 \
-e PGID=1000 \
-e DEFAULT_WORKSPACE=/config/workdir \
-v $PWD:/config/workdir \
-e DEFAULT_WORKSPACE=/config/project \
-v $PWD:/config/project \
geertvangeest/ngs-introduction-vscode:latest
```

If this command has run successfully, navigate in your browser to [http://localhost:8443](http://localhost:8443).

The option `-v` mounts a local directory on your computer to the directory `/config/workdir` in the docker container. In that way, you have files available both in the container and on your computer. Use this directory on your computer to e.g. visualise data with IGV. Change the first path to a path on your computer that you want to use as a working directory.
The option `-v` mounts a local directory on your computer to the directory `/config/project` in the docker container. In that way, you have files available both in the container and on your computer. Use this directory on your computer to e.g. visualise data with IGV. Change the first path to a path on your computer that you want to use as a working directory.

!!! note "Don't mount directly in the home dir"
Don't directly mount your local directory to the home directory (`/root`). This will lead to unexpected behaviour.
@@ -177,18 +177,18 @@ If you need some reminders of the commands, here's a link to a UNIX command line

#### Make a new directory

Make a directory `scripts` within `~/workdir` and make it your current directory.
Make a directory `scripts` within `~/project` and make it your current directory.

??? done "Answer"
```sh
cd ~/workdir
cd ~/project
mkdir scripts
cd scripts
```

#### File permissions

Generate an empty script in your newly made directory `~/workdir/scripts` like this:
Generate an empty script in your newly made directory `~/project/scripts` like this:

```sh
touch new_script.sh
@@ -266,7 +266,7 @@ In the root directory (go there like this: `cd /`) there are a range of system d

??? done "Answer"
```sh
ls / > ~/workdir/system_dirs.txt
ls / > ~/project/system_dirs.txt
```

The command `wc -l` counts the number of lines, and can read from stdin. Make a one-liner with a pipe `|` symbol to find out how many system directories and files there are.
@@ -282,7 +282,7 @@ Store `system_dirs.txt` as variable (like this: `VAR=variable`), and use `wc -l`

??? done "Answer"
```sh
FILE=~/workdir/system_dirs.txt
FILE=~/project/system_dirs.txt
wc -l $FILE
```
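The variable exercise can be tried safely on a scratch file. The sketch below assumes a three-line stand-in for `system_dirs.txt` in a temporary directory; the names `demo_dir` and `n_lines` are illustrative and not part of the course material.

```sh
#!/usr/bin/env bash
# Sketch: store a path in a variable and count the lines of that file.
set -euo pipefail

demo_dir=$(mktemp -d)                 # hypothetical scratch directory
FILE="$demo_dir/system_dirs.txt"
printf 'bin\netc\nhome\n' > "$FILE"   # stand-in for the output of `ls /`

# Quoting "$FILE" keeps the path intact even if it contains spaces.
# Reading from stdin (<) makes wc print only the count, without the file name.
n_lines=$(wc -l < "$FILE")
echo "$n_lines"
```

Unquoted `$FILE` works for the paths used in the course, but quoting is the safer habit once paths come from user input.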

10 changes: 5 additions & 5 deletions docs/day2/read_alignment.md
@@ -26,7 +26,7 @@ Make a script called `05_download_ecoli_reference.sh`, and paste in the code sni
```sh title="05_download_ecoli_reference.sh"
#!/usr/bin/env bash

REFERENCE_DIR=~/workdir/ref_genome/
REFERENCE_DIR=~/project/ref_genome/

mkdir $REFERENCE_DIR
cd $REFERENCE_DIR
@@ -41,7 +41,7 @@ esearch -db nuccore -query 'U00096' \
```sh title="06_build_bowtie_index.sh"
#!/usr/bin/env bash

cd ~/workdir/ref_genome
cd ~/project/ref_genome

bowtie2-build ecoli-strK12-MG1655.fasta ecoli-strK12-MG1655.fasta
```
@@ -66,9 +66,9 @@ esearch -db nuccore -query 'U00096' \
```sh title="07_align_reads.sh"
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
REFERENCE_DIR=~/workdir/ref_genome/
ALIGNED_DIR=~/workdir/results/alignments
TRIMMED_DIR=~/project/results/trimmed
REFERENCE_DIR=~/project/ref_genome/
ALIGNED_DIR=~/project/results/alignments

mkdir -p $ALIGNED_DIR

24 changes: 12 additions & 12 deletions docs/day2/samtools.md
@@ -27,7 +27,7 @@
??? done "Answer"
Code:
```sh
cd ~/workdir/results/alignments/
cd ~/project/results/alignments/
samtools flagstat SRR519926.sam > SRR519926.sam.stats
```

@@ -74,7 +74,7 @@ The command `samtools view` is very versatile. It takes an alignment file and wr
```sh title="08_compress_sort.sh"
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh SRR519926.sam > SRR519926.bam
```
@@ -96,7 +96,7 @@ samtools index SRR519926.sorted.bam
```sh title="08_compress_sort.sh"
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh SRR519926.sam > SRR519926.bam
samtools sort SRR519926.bam > SRR519926.sorted.bam
@@ -108,7 +108,7 @@ samtools index SRR519926.sorted.bam
```
@HD VN:1.0 SO:unsorted
@SQ SN:U00096.3 LN:4641652
@PG ID:bowtie2 PN:bowtie2 VN:2.4.2 CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/workdir/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/workdir/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/workdir/trimmed_data/trimmed_SRR519926_2.fastq"
@PG ID:bowtie2 PN:bowtie2 VN:2.4.2 CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/project/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/project/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/project/trimmed_data/trimmed_SRR519926_2.fastq"
@PG ID:samtools PN:samtools PP:bowtie2 VN:1.12 CL:samtools view -bh SRR519926.sam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.12 CL:samtools view -H SRR519926.bam
```
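Each `@PG` line in the header above records one program, and its `PP` tag names the `ID` of the program that ran before it. As a rough sketch, that chain can be pulled out with `awk`. The header text below is a shortened, tab-separated copy of the lines shown (the `CL` fields are left out for brevity), and the `(start)` label is an invented placeholder for entries without a `PP` tag.

```sh
#!/usr/bin/env bash
# Sketch: list "previous -> current" for every @PG line of a SAM header.
set -euo pipefail

# Shortened copy of the @PG lines, in the tab-separated layout of a SAM header.
header=$(printf '@PG\tID:bowtie2\tPN:bowtie2\tVN:2.4.2\n@PG\tID:samtools\tPN:samtools\tPP:bowtie2\tVN:1.12\n@PG\tID:samtools.1\tPN:samtools\tPP:samtools\tVN:1.12\n')

chain=$(printf '%s\n' "$header" | awk -F'\t' '
  $1 == "@PG" {
    id = ""; pp = "(start)"              # the first program has no PP tag
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ID:/) id = substr($i, 4)
      if ($i ~ /^PP:/) pp = substr($i, 4)
    }
    print pp " -> " id
  }')
echo "$chain"
```

On a real file, the same `awk` program could read from `samtools view -H SRR519926.bam` instead of the hard-coded text.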
@@ -118,7 +118,7 @@ samtools index SRR519926.sorted.bam
```
@HD VN:1.0 SO:coordinate
@SQ SN:U00096.3 LN:4641652
@PG ID:bowtie2 PN:bowtie2 VN:2.4.2 CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/workdir/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/workdir/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/workdir/trimmed_data/trimmed_SRR519926_2.fastq"
@PG ID:bowtie2 PN:bowtie2 VN:2.4.2 CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/project/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/project/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/project/trimmed_data/trimmed_SRR519926_2.fastq"
@PG ID:samtools PN:samtools PP:bowtie2 VN:1.12 CL:samtools view -bh SRR519926.sam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.12 CL:samtools sort SRR519926.bam
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.12 CL:samtools view -H SRR519926.sorted.bam
@@ -163,7 +163,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
```sh title="09_extract_unmapped.sh"
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh -f 0x4 SRR519926.sorted.bam > SRR519926.sorted.unmapped.bam
```
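The `-f 0x4` and `-F 4` options both test the same bit of the SAM flag: `-f` keeps reads that have it set, `-F` drops them. The sketch below mimics that bit test with plain shell arithmetic so it runs without `samtools`; the function name `classify` and the two example flag values are chosen for illustration.

```sh
#!/usr/bin/env bash
# Sketch: the bit test behind `samtools view -f 0x4` and `-F 4`.
set -euo pipefail

UNMAPPED=0x4   # SAM flag bit "read unmapped"

# Report whether the unmapped bit is set in a flag value.
# `-f 0x4` keeps reads for which this is true; `-F 4` keeps the rest.
classify() {
  local flag=$1
  if (( flag & UNMAPPED )); then
    echo "unmapped"
  else
    echo "mapped"
  fi
}

classify 77   # 77 = 1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair
classify 99   # 99 = 1 + 2 + 32 + 64: paired, proper pair, mate on reverse strand, first in pair
```

Hexadecimal `0x4` and decimal `4` are the same value, which is why the two scripts can write the flag either way.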
@@ -183,7 +183,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
Our E. coli genome has only one chromosome, because only one line starts with `>` in the fasta file

```sh
cd ~/workdir/ref_genome
cd ~/project/ref_genome
grep ">" ecoli-strK12-MG1655.fasta
```

@@ -200,7 +200,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
```sh title="10_extract_region.sh"
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh \
SRR519926.sorted.bam \
@@ -229,9 +229,9 @@ my_alignment_command \
```sh title="11_align_sort.sh"
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
REFERENCE_DIR=~/workdir/ref_genome
ALIGNED_DIR=~/workdir/results/alignments
TRIMMED_DIR=~/project/results/trimmed
REFERENCE_DIR=~/project/ref_genome
ALIGNED_DIR=~/project/results/alignments

bowtie2 \
-x $REFERENCE_DIR/ecoli-strK12-MG1655.fasta \
@@ -250,4 +250,4 @@ my_alignment_command \

The software [MultiQC](https://multiqc.info/) is great for creating summaries out of log files and reports from many different bioinformatic tools (including `fastqc`, `fastp`, `samtools` and `bowtie2`). You can specify a directory that contains any log files, and it will automatically search it for you.

**Exercise**: Run the command `multiqc .` in `~/workdir` and check out the generated report.
**Exercise**: Run the command `multiqc .` in `~/project` and check out the generated report.
4 changes: 2 additions & 2 deletions docs/day3/igv_visualisation.md
@@ -20,13 +20,13 @@ The exercises below are partly based on [this tutorial](https://github.com/griff
Index the alignment that was filtered for the region between 2000 and 2500 kb:

```sh
cd ~/workdir/results/alignments
cd ~/project/results/alignments
samtools index SRR519926.sorted.region.bam
```
Download it together with its index file (`SRR519926.sorted.region.bam.bai`) and the reference genome (`ecoli-strK12-MG1655.fasta`) to your desktop.

!!! note "If working with Docker"
If you are working with Docker, you can find the files in the working directory that you mounted to the docker container (with the `-v` option). So if you have used `-v C:\Users\myusername\ngs-course:/root/workdir`, your files will be in `C:\Users\myusername\ngs-course`.
If you are working with Docker, you can find the files in the working directory that you mounted to the docker container (with the `-v` option). So if you have used `-v C:\Users\myusername\ngs-course:/root/project`, your files will be in `C:\Users\myusername\ngs-course`.

* Load the genome (`.fasta`) into IGV: **Genomes > Load Genome from File...**
* Load the alignment file (`.bam`): **File > Load from File...**
2 changes: 1 addition & 1 deletion docs/group_work.md
@@ -24,7 +24,7 @@ In the afternoon of day 1, you will start on the project. On day 3, you can work
Each group has access to a shared working directory. It is mounted in the root directory (`/`). Make a soft link in your home directory:

```sh
cd ~/workdir
cd ~/project
ln -s /group_work/GROUP_NAME/ ./
# replace GROUP_NAME with your group directory
```
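The effect of the soft link can be checked without access to the shared `/group_work` mount. The sketch below rebuilds the same layout inside a scratch directory; `group1`, `shared.txt` and the scratch paths are made-up stand-ins for the real group directory and its contents.

```sh
#!/usr/bin/env bash
# Sketch: create a soft link to a shared directory and look through it.
set -euo pipefail

demo_dir=$(mktemp -d)                      # hypothetical scratch directory
mkdir -p "$demo_dir/group_work/group1"     # stands in for /group_work/GROUP_NAME
mkdir -p "$demo_dir/project"               # stands in for ~/project
touch "$demo_dir/group_work/group1/shared.txt"

cd "$demo_dir/project"
# Linking into ./ creates a link named after the target directory (group1).
ln -s "$demo_dir/group_work/group1" ./

# The link is only a pointer: files listed through it live in the shared directory.
link_target=$(readlink group1)
echo "$link_target"
ls group1
```

Removing the link later with `rm group1` deletes only the pointer, not the shared files.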
2 changes: 1 addition & 1 deletion scripts/exercises/01_download_reads.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

cd ~/workdir
cd ~/project
mkdir reads
cd reads
prefetch SRR519926
2 changes: 1 addition & 1 deletion scripts/exercises/02_run_fastqc.sh
@@ -1,4 +1,4 @@
#!/usr/bin/env bash
cd ~/workdir/reads
cd ~/project/reads

fastqc *.fastq
4 changes: 2 additions & 2 deletions scripts/exercises/03_trim_reads.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
READS_DIR=~/workdir/reads
TRIMMED_DIR=~/project/results/trimmed
READS_DIR=~/project/reads

mkdir -p $TRIMMED_DIR

2 changes: 1 addition & 1 deletion scripts/exercises/04_run_fastqc_trimmed.sh
@@ -1,4 +1,4 @@
#!/usr/bin/env bash

cd ~/workdir/results/trimmed
cd ~/project/results/trimmed
fastqc trimmed*.fastq
2 changes: 1 addition & 1 deletion scripts/exercises/05_download_ecoli_reference.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

REFERENCE_DIR=~/workdir/ref_genome/
REFERENCE_DIR=~/project/ref_genome/

mkdir $REFERENCE_DIR
cd $REFERENCE_DIR
2 changes: 1 addition & 1 deletion scripts/exercises/06_build_bowtie_index.sh
@@ -1,5 +1,5 @@
#!/usr/bin/env bash

cd ~/workdir/ref_genome
cd ~/project/ref_genome

bowtie2-build ecoli-strK12-MG1655.fasta ecoli-strK12-MG1655.fasta
6 changes: 3 additions & 3 deletions scripts/exercises/07_align_reads.sh
@@ -1,8 +1,8 @@
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
REFERENCE_DIR=~/workdir/ref_genome/
ALIGNED_DIR=~/workdir/results/alignments
TRIMMED_DIR=~/project/results/trimmed
REFERENCE_DIR=~/project/ref_genome/
ALIGNED_DIR=~/project/results/alignments

mkdir -p $ALIGNED_DIR

2 changes: 1 addition & 1 deletion scripts/exercises/08_compress_sort.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh SRR519926.sam > SRR519926.bam
samtools sort SRR519926.bam > SRR519926.sorted.bam
2 changes: 1 addition & 1 deletion scripts/exercises/09_extract_unmapped.sh
@@ -1,5 +1,5 @@
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh -f 0x4 SRR519926.sorted.bam > SRR519926.sorted.unmapped.bam
2 changes: 1 addition & 1 deletion scripts/exercises/10_extract_region.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

cd ~/workdir/results/alignments
cd ~/project/results/alignments

samtools view -bh \
SRR519926.sorted.bam \
6 changes: 3 additions & 3 deletions scripts/exercises/11_align_sort_filter.sh
@@ -1,8 +1,8 @@
#!/usr/bin/env bash

TRIMMED_DIR=~/workdir/results/trimmed
REFERENCE_DIR=~/workdir/ref_genome
ALIGNED_DIR=~/workdir/results/alignments
TRIMMED_DIR=~/project/results/trimmed
REFERENCE_DIR=~/project/ref_genome
ALIGNED_DIR=~/project/results/alignments

bowtie2 \
-x $REFERENCE_DIR/ecoli-strK12-MG1655.fasta \
6 changes: 3 additions & 3 deletions scripts/project1/01_download_reads.sh
@@ -1,8 +1,8 @@
#!/usr/bin/env bash

WORKDIR=/config/workdir/projects/project1
mkdir -p "$WORKDIR"
cd "$WORKDIR"
PROJDIR=/config/project/projects/project1
mkdir -p "$PROJDIR"
cd "$PROJDIR"

wget https://ngs-introduction-training.s3.eu-central-1.amazonaws.com/project1.tar.gz
tar -xvf project1.tar.gz
4 changes: 2 additions & 2 deletions scripts/project1/02_run_fastqc.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

WORKDIR=/config/workdir/projects/project1
cd "$WORKDIR"/data/fastq
PROJDIR=/config/project/projects/project1
cd "$PROJDIR"/data/fastq

fastqc *.fastq.gz