Merge pull request #17 from sib-swiss/16-use-project-as-name-for-project-directory

fixes #16
sib-swiss · Nov 20, 2023 · 4845323
2 parents 45dc947 + 72f588e
commit 4845323
Showing 32 changed files with 115 additions and 115 deletions.
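
The diff below is a uniform rename of the `workdir` path component to `project`. As a minimal sketch only, a bulk rename of this kind could be scripted as follows, assuming plain-text files; the helper name `rename_path_component` is hypothetical and is not part of this repository, and nothing here implies the commit was produced this way. The match is case-sensitive, so shell variable names such as `WORKDIR` would still need a manual edit.

```sh
#!/usr/bin/env bash

# Hypothetical helper (not from the repository): replace every occurrence
# of a literal string in all text files below a directory.
# Usage: rename_path_component ROOT_DIR OLD NEW
rename_path_component() {
    local root="$1" old="$2" new="$3"

    # -r recurse, -l list matching file names only, -I skip binary files,
    # -F treat OLD as a fixed string rather than a regular expression.
    grep -rlIF -- "$old" "$root" | while IFS= read -r file; do
        # Writing a .bak backup and deleting it keeps the call portable
        # between GNU and BSD sed. OLD and NEW are assumed not to contain
        # '|' or regex metacharacters.
        sed -i.bak "s|$old|$new|g" "$file" && rm -f "$file.bak"
    done
}
```

For example, `rename_path_component docs workdir project` would rewrite the Markdown files under `docs`; reviewing the result with `git diff` before committing is advisable.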
diff --git a/docs/day1/quality_control.md b/docs/day1/quality_control.md
@@ -47,12 +47,12 @@ Now we will use some bioinformatics tools to do download reads and perform quali
 conda activate ngs-tools
 ```
 
-Make a directory `reads` in `~/workdir` and download the reads from the SRA database using `prefetch` and `fastq-dump` from [SRA-Tools](https://ncbi.github.io/sra-tools/) into the `reads` directory. Use the code snippet below to create a scripts called `01_download_reads.sh`. Store it in `~/workdir/scripts/`, and run it.
+Make a directory `reads` in `~/project` and download the reads from the SRA database using `prefetch` and `fastq-dump` from [SRA-Tools](https://ncbi.github.io/sra-tools/) into the `reads` directory. Use the code snippet below to create a scripts called `01_download_reads.sh`. Store it in `~/project/scripts/`, and run it.
 
 ```sh title="01_download_reads.sh"
 #!/usr/bin/env bash
 
-cd ~/workdir
+cd ~/project
 mkdir reads
 cd reads
 prefetch SRR519926
@@ -83,11 +83,11 @@ fastq-dump --split-files SRR519926
     `fastqc` accepts multiple files as input, so you can use a [wildcard](https://en.wikipedia.org/wiki/Glob_(programming)) to run `fastqc` on all the files in one line of code. Use it like this: `*.fastq`.  
 
 ??? done "Answer"
-    Your script `~/workdir/scripts/02_run_fastqc.sh` should look like:
+    Your script `~/project/scripts/02_run_fastqc.sh` should look like:
 
     ```sh title="02_run_fastqc.sh"
     #!/usr/bin/env bash
-    cd ~/workdir/reads
+    cd ~/project/reads
 
     fastqc *.fastq
     ```
@@ -127,13 +127,13 @@ We will use [fastp](https://github.com/OpenGene/fastp) for trimming adapters and
     - The minimum required length is also 15: `reads shorter than length_required will be discarded, default is 15. (int [=15])`
     - If one of the reads does not meet the required length, the pair is discarded if `--unpaired1` and/or `--unpaired2` are not specified: `for PE input, if read1 passed QC but read2 not, it will be written to unpaired1. Default is to discard it. (string [=])`. 
 
-**Exercise:** Complete the script below called `03_trim_reads.sh` (replace everything in between brackets `[]`) to run `fastp` to trim the data.  The quality of our dataset is not great, so we will overwrite the defaults.  Use a a minimum qualified base quality of 10, set the maximum percentage of unqalified bases to 80% and a minimum read length of 25. Note that a new directory called `~/workdir/results/trimmed/` is created to write the trimmed reads.
+**Exercise:** Complete the script below called `03_trim_reads.sh` (replace everything in between brackets `[]`) to run `fastp` to trim the data.  The quality of our dataset is not great, so we will overwrite the defaults.  Use a a minimum qualified base quality of 10, set the maximum percentage of unqalified bases to 80% and a minimum read length of 25. Note that a new directory called `~/project/results/trimmed/` is created to write the trimmed reads.
 
 ```sh title="03_trim_reads.sh"
 #!/usr/bin/env bash
 
-TRIMMED_DIR=~/workdir/results/trimmed
-READS_DIR=~/workdir/reads
+TRIMMED_DIR=~/project/results/trimmed
+READS_DIR=~/project/reads
 
 mkdir -p $TRIMMED_DIR
 
@@ -156,13 +156,13 @@ fastp \
     Note that we have set the options `--cut_front` and `--cut_tail` that will ensure low quality bases are trimmed in a sliding window from both the 5' and 3' ends. Also `--detect_adapter_for_pe` is set, which ensures that adapters are detected automatically for both R1 and R2. 
 
 ??? done "Answer"
-    Your script (`~/workdir/scripts/03_trim_reads.sh`) should look like this:
+    Your script (`~/project/scripts/03_trim_reads.sh`) should look like this:
 
     ```sh title="03_trim_reads.sh"
     #!/usr/bin/env bash
 
-    TRIMMED_DIR=~/workdir/results/trimmed
-    READS_DIR=~/workdir/reads
+    TRIMMED_DIR=~/project/results/trimmed
+    READS_DIR=~/project/reads
 
     mkdir -p $TRIMMED_DIR
 

diff --git a/docs/day1/reproducibility.md b/docs/day1/reproducibility.md
@@ -21,7 +21,7 @@ During today and tomorrow we will work with a small *E. coli* dataset to practic
 
 By adhering to these simple principles it will be relatively straightforward to re-do your analysis steps only based on the scripts, and will get you started to adhere to the [Ten Simple Rules for Reproducible Computational Research](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285). 
 
-By the end of day 2 `~/workdir` should look (something) like this:
+By the end of day 2 `~/project` should look (something) like this:
 
 ```
 .

diff --git a/docs/day1/server_login.md b/docs/day1/server_login.md
@@ -80,14 +80,14 @@
     -p 8443:8443 \
     -e PUID=1000 \
     -e PGID=1000 \
-    -e DEFAULT_WORKSPACE=/config/workdir \
-    -v $PWD:/config/workdir \
+    -e DEFAULT_WORKSPACE=/config/project \
+    -v $PWD:/config/project \
     geertvangeest/ngs-introduction-vscode:latest
     ```
 
     If this command has run successfully, navigate in your browser to [http://localhost:8443](http://localhost:8443).
 
-    The option `-v` mounts a local directory in your computer to the directory `/config/workdir` in the docker container. In that way, you have files available both in the container and on your computer. Use this directory on your computer to e.g. visualise data with IGV. Change the first path to a path on your computer that you want to use as a working directory.
+    The option `-v` mounts a local directory in your computer to the directory `/config/project` in the docker container. In that way, you have files available both in the container and on your computer. Use this directory on your computer to e.g. visualise data with IGV. Change the first path to a path on your computer that you want to use as a working directory.
 
     !!! note "Don't mount directly in the home dir"
         Don't directly mount your local directory to the home directory (`/root`). This will lead to unexpected behaviour.
@@ -177,18 +177,18 @@ If you need some reminders of the commands, here's a link to a UNIX command line
 
 #### Make a new directory
 
-Make a directory `scripts` within `~/workdir` and make it your current directory.
+Make a directory `scripts` within `~/project` and make it your current directory.
 
 ??? done "Answer"
     ```sh
-    cd ~/workdir
+    cd ~/project
     mkdir scripts
     cd scripts
     ```
 
 #### File permissions
 
-Generate an empty script in your newly made directory `~/workdir/scripts` like this:
+Generate an empty script in your newly made directory `~/project/scripts` like this:
 
 ```sh
 touch new_script.sh
@@ -266,7 +266,7 @@ In the root directory (go there like this: `cd /`) there are a range of system d
 
 ??? done "Answer"
     ```sh
-    ls / > ~/workdir/system_dirs.txt
+    ls / > ~/project/system_dirs.txt
     ```
 
 The command `wc -l` counts the number of lines, and can read from stdin. Make a one-liner with a pipe `|` symbol to find out how many system directories and files there are.
@@ -282,7 +282,7 @@ Store `system_dirs.txt` as variable (like this: `VAR=variable`), and use `wc -l`
 
 ??? done "Answer"
     ```sh
-    FILE=~/workdir/system_dirs.txt
+    FILE=~/project/system_dirs.txt
     wc -l $FILE
     ```
 

diff --git a/docs/day2/read_alignment.md b/docs/day2/read_alignment.md
@@ -26,7 +26,7 @@ Make a script called `05_download_ecoli_reference.sh`, and paste in the code sni
 ```sh title="05_download_ecoli_reference.sh"
 #!/usr/bin/env bash
 
-REFERENCE_DIR=~/workdir/ref_genome/
+REFERENCE_DIR=~/project/ref_genome/
 
 mkdir $REFERENCE_DIR
 cd $REFERENCE_DIR
@@ -41,7 +41,7 @@ esearch -db nuccore -query 'U00096' \
     ```sh title="06_build_bowtie_index.sh"
     #!/usr/bin/env bash
 
-    cd ~/workdir/ref_genome
+    cd ~/project/ref_genome
 
     bowtie2-build ecoli-strK12-MG1655.fasta ecoli-strK12-MG1655.fasta
     ```
@@ -66,9 +66,9 @@ esearch -db nuccore -query 'U00096' \
 ```sh title="07_align_reads.sh"
 #!/usr/bin/env bash
 
-TRIMMED_DIR=~/workdir/results/trimmed
-REFERENCE_DIR=~/workdir/ref_genome/
-ALIGNED_DIR=~/workdir/results/alignments
+TRIMMED_DIR=~/project/results/trimmed
+REFERENCE_DIR=~/project/ref_genome/
+ALIGNED_DIR=~/project/results/alignments
 
 mkdir -p $ALIGNED_DIR
 

diff --git a/docs/day2/samtools.md b/docs/day2/samtools.md
@@ -27,7 +27,7 @@
 ??? done "Answer"
     Code:
     ```sh
-    cd ~/workdir/results/alignments/
+    cd ~/project/results/alignments/
     samtools flagstat SRR519926.sam > SRR519926.sam.stats
     ```
 
@@ -74,7 +74,7 @@ The command `samtools view` is very versatile. It takes an alignment file and wr
     ```sh title="08_compress_sort.sh"
     #!/usr/bin/env bash
 
-    cd ~/workdir/results/alignments
+    cd ~/project/results/alignments
 
     samtools view -bh SRR519926.sam > SRR519926.bam
     ```
@@ -96,7 +96,7 @@ samtools index SRR519926.sorted.bam
     ```sh title="08_compress_sort.sh"
     #!/usr/bin/env bash
 
-    cd ~/workdir/results/alignments
+    cd ~/project/results/alignments
 
     samtools view -bh SRR519926.sam > SRR519926.bam
     samtools sort SRR519926.bam > SRR519926.sorted.bam
@@ -108,7 +108,7 @@ samtools index SRR519926.sorted.bam
     ```
     @HD     VN:1.0  SO:unsorted
     @SQ     SN:U00096.3     LN:4641652
-    @PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/workdir/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/workdir/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/workdir/trimmed_data/trimmed_SRR519926_2.fastq"
+    @PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/project/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/project/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/project/trimmed_data/trimmed_SRR519926_2.fastq"
     @PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.12 CL:samtools view -bh SRR519926.sam
     @PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools view -H SRR519926.bam
     ```
@@ -118,7 +118,7 @@ samtools index SRR519926.sorted.bam
     ```
     @HD     VN:1.0  SO:coordinate
     @SQ     SN:U00096.3     LN:4641652
-    @PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/workdir/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/workdir/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/workdir/trimmed_data/trimmed_SRR519926_2.fastq"
+    @PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/opt/conda/envs/ngs-tools/bin/bowtie2-align-s --wrapper basic-0 -x /config/project/ref_genome//ecoli-strK12-MG1655.fasta -1 /config/project/trimmed_data/trimmed_SRR519926_1.fastq -2 /config/project/trimmed_data/trimmed_SRR519926_2.fastq"
     @PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.12 CL:samtools view -bh SRR519926.sam
     @PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools sort SRR519926.bam
     @PG     ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.12 CL:samtools view -H SRR519926.sorted.bam
@@ -163,7 +163,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
     ```sh title="09_extract_unmapped.sh"
     #!/usr/bin/env bash
 
-    cd ~/workdir/results/alignments
+    cd ~/project/results/alignments
 
     samtools view -bh -f 0x4 SRR519926.sorted.bam > SRR519926.sorted.unmapped.bam
     ```
@@ -183,7 +183,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
     Our E. coli genome has only one chromosome, because only one line starts with `>` in the fasta file
 
     ```sh
-    cd ~/workdir/ref_genome
+    cd ~/project/ref_genome
     grep ">" ecoli-strK12-MG1655.fasta
     ```
 
@@ -200,7 +200,7 @@ samtools view -bh -F 4 SRR519926.sorted.bam > SRR519926.sorted.mapped.bam
     ```sh title="10_extract_region.sh"
     #!/usr/bin/env bash
 
-    cd ~/workdir/results/alignments
+    cd ~/project/results/alignments
 
     samtools view -bh \
     SRR519926.sorted.bam \
@@ -229,9 +229,9 @@ my_alignment_command \
     ```sh title="11_align_sort.sh"
     #!/usr/bin/env bash
 
-    TRIMMED_DIR=~/workdir/results/trimmed
-    REFERENCE_DIR=~/workdir/ref_genome
-    ALIGNED_DIR=~/workdir/results/alignments
+    TRIMMED_DIR=~/project/results/trimmed
+    REFERENCE_DIR=~/project/ref_genome
+    ALIGNED_DIR=~/project/results/alignments
 
     bowtie2 \
     -x $REFERENCE_DIR/ecoli-strK12-MG1655.fasta \
@@ -250,4 +250,4 @@ my_alignment_command \
 
 The software [MultiQC](https://multiqc.info/) is great for creating summaries out of log files and reports from many different bioinformatic tools (including `fastqc`, `fastp`, `samtools` and `bowtie2`). You can specify a directory that contains any log files, and it will automatically search it for you. 
 
-**Exercise**: Run the command `multiqc .` in `~/workdir` and checkout the generated report. 
+**Exercise**: Run the command `multiqc .` in `~/project` and checkout the generated report. 
diff --git a/docs/day3/igv_visualisation.md b/docs/day3/igv_visualisation.md
@@ -20,13 +20,13 @@ The exercises below are partly based on [this tutorial](https://github.com/griff
 Index the alignment that was filtered for the region between 2000 and 2500 kb:
 
 ```sh
-cd ~/workdir/results/alignments
+cd ~/project/results/alignments
 samtools index SRR519926.sorted.region.bam
 ```
 Download it together with it's index file (`SRR519926.sorted.region.bam.bai`) and the reference genome (`ecoli-strK12-MG1655.fasta`) to your desktop.
 
 !!! note "If working with Docker"
-    If you are working with Docker, you can find the files in the working directory that you mounted to the docker container (with the `-v` option). So if you have used `-v C:\Users\myusername\ngs-course:/root/workdir`, your files will be in `C:\Users\myusername\ngs-course`.
+    If you are working with Docker, you can find the files in the working directory that you mounted to the docker container (with the `-v` option). So if you have used `-v C:\Users\myusername\ngs-course:/root/project`, your files will be in `C:\Users\myusername\ngs-course`.
 
 * Load the genome (`.fasta`) into IGV: **Genomes > Load Genome from File...**
 * Load the alignment file (`.bam`): **File > Load from File...**

diff --git a/docs/group_work.md b/docs/group_work.md
@@ -24,7 +24,7 @@ In the afternoon of day 1, you will start on the project. On day 3, you can work
 Each group has access to a shared working directory. It is mounted in the root directory (`/`). Make a soft link in your home directory:
 
 ```sh
-cd ~/workdir
+cd ~/project
 ln -s /group_work/GROUP_NAME/ ./
 # replace [GROUP_NAME] with your group directory
 ```

diff --git a/scripts/exercises/01_download_reads.sh b/scripts/exercises/01_download_reads.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-cd ~/workdir
+cd ~/project
 mkdir reads
 cd reads
 prefetch SRR519926

diff --git a/scripts/exercises/02_run_fastqc.sh b/scripts/exercises/02_run_fastqc.sh
@@ -1,4 +1,4 @@
 #!/usr/bin/env bash
-cd ~/workdir/reads
+cd ~/project/reads
 
 fastqc *.fastq
diff --git a/scripts/exercises/03_trim_reads.sh b/scripts/exercises/03_trim_reads.sh
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 
-TRIMMED_DIR=~/workdir/results/trimmed
-READS_DIR=~/workdir/reads
+TRIMMED_DIR=~/project/results/trimmed
+READS_DIR=~/project/reads
 
 mkdir -p $TRIMMED_DIR
 

diff --git a/scripts/exercises/04_run_fastqc_trimmed.sh b/scripts/exercises/04_run_fastqc_trimmed.sh
@@ -1,4 +1,4 @@
 #!/usr/bin/env bash
 
-cd ~/workdir/results/trimmed
+cd ~/project/results/trimmed
 fastqc trimmed*.fastq
diff --git a/scripts/exercises/05_download_ecoli_reference.sh b/scripts/exercises/05_download_ecoli_reference.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-REFERENCE_DIR=~/workdir/ref_genome/
+REFERENCE_DIR=~/project/ref_genome/
 
 mkdir $REFERENCE_DIR
 cd $REFERENCE_DIR

diff --git a/scripts/exercises/06_build_bowtie_index.sh b/scripts/exercises/06_build_bowtie_index.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
 
-cd ~/workdir/ref_genome
+cd ~/project/ref_genome
 
 bowtie2-build ecoli-strK12-MG1655.fasta ecoli-strK12-MG1655.fasta
diff --git a/scripts/exercises/07_align_reads.sh b/scripts/exercises/07_align_reads.sh
@@ -1,8 +1,8 @@
 #!/usr/bin/env bash
 
-TRIMMED_DIR=~/workdir/results/trimmed
-REFERENCE_DIR=~/workdir/ref_genome/
-ALIGNED_DIR=~/workdir/results/alignments
+TRIMMED_DIR=~/project/results/trimmed
+REFERENCE_DIR=~/project/ref_genome/
+ALIGNED_DIR=~/project/results/alignments
 
 mkdir -p $ALIGNED_DIR
 

diff --git a/scripts/exercises/08_compress_sort.sh b/scripts/exercises/08_compress_sort.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-cd ~/workdir/results/alignments
+cd ~/project/results/alignments
 
 samtools view -bh SRR519926.sam > SRR519926.bam
 samtools sort SRR519926.bam > SRR519926.sorted.bam

diff --git a/scripts/exercises/09_extract_unmapped.sh b/scripts/exercises/09_extract_unmapped.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
 
-cd ~/workdir/results/alignments
+cd ~/project/results/alignments
 
 samtools view -bh -f 0x4 SRR519926.sorted.bam > SRR519926.sorted.unmapped.bam
diff --git a/scripts/exercises/10_extract_region.sh b/scripts/exercises/10_extract_region.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-cd ~/workdir/results/alignments
+cd ~/project/results/alignments
 
 samtools view -bh \
 SRR519926.sorted.bam \

diff --git a/scripts/exercises/11_align_sort_filter.sh b/scripts/exercises/11_align_sort_filter.sh
@@ -1,8 +1,8 @@
 #!/usr/bin/env bash
 
-TRIMMED_DIR=~/workdir/results/trimmed
-REFERENCE_DIR=~/workdir/ref_genome
-ALIGNED_DIR=~/workdir/results/alignments
+TRIMMED_DIR=~/project/results/trimmed
+REFERENCE_DIR=~/project/ref_genome
+ALIGNED_DIR=~/project/results/alignments
 
 bowtie2 \
 -x $REFERENCE_DIR/ecoli-strK12-MG1655.fasta \

diff --git a/scripts/project1/01_download_reads.sh b/scripts/project1/01_download_reads.sh
@@ -1,8 +1,8 @@
 #!/usr/bin/env bash
 
-WORKDIR=/config/workdir/projects/project1
-mkdir -p "$WORKDIR"
-cd "$WORKDIR"
+PROJDIR=/config/project/projects/project1
+mkdir -p "$PROJDIR"
+cd "$PROJDIR"
 
 wget https://ngs-introduction-training.s3.eu-central-1.amazonaws.com/project1.tar.gz
 tar -xvf project1.tar.gz

diff --git a/scripts/project1/02_run_fastqc.sh b/scripts/project1/02_run_fastqc.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 
-WORKDIR=/config/workdir/projects/project1
-cd "$WORKDIR"/data/fastq
+PROJDIR=/config/project/projects/project1
+cd "$PROJDIR"/data/fastq
 
 fastqc *.fastq.gz