update docs for hs1, quality and controls
slsevilla committed Feb 10, 2023
1 parent aca4a38 commit ddbbf6d
Showing 4 changed files with 73 additions and 23 deletions.
5 changes: 3 additions & 2 deletions docs/user-guide/contributions.md
@@ -1,9 +1,10 @@
# Contributions
The following members contributed to the development of the iCLIP pipeline:
The following members contributed to the development of the CARLISLE pipeline:

- [Vishal Koparde](https://github.com/kopardev)
- [Samantha Sevilla](https://github.com/slsevilla)
- Sohyoung Kim
- Vassiliki Saloura
- [Hsien-chao Chou](https://github.com/hsienchao)

VK, SS, SK contributed to generating the source code and all members contributed to the main concepts and analysis.
VK, SS, SK, HC contributed to generating the source code and all members contributed to the main concepts and analysis.
24 changes: 23 additions & 1 deletion docs/user-guide/getting-started.md
@@ -4,7 +4,7 @@ The CARLISLE github repository is stored locally, and will be used for project d
## 1. Getting Started

## 1.1 Introduction
The CARLISLE pipeline begins with raw FASTQ files and performs trimming followed by alignment using [BOWTIE2](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Data is then normalized using either a user-specified spike-in control species (e.g., E. coli) or the determined library size. Peaks are then called using [MACS2](https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html), [SEACR](https://github.com/FredHutch/SEACR), and [GoPEAKS](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02707-w) with various options selected by the user. Peaks are then annotated and summarized into reports. If designated, differential analysis is performed using [DESEQ2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html). QC reports are also generated with each project.
The CARLISLE pipeline begins with raw FASTQ files and performs trimming followed by alignment using [BOWTIE2](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Data is then normalized using either a user-specified spike-in control species (e.g., E. coli) or the determined library size. Peaks are then called using [MACS2](https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html), [SEACR](https://github.com/FredHutch/SEACR), and [GoPEAKS](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02707-w) with various options selected by the user. Peaks are then annotated and summarized into reports. If designated, differential analysis is performed using [DESEQ2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html). QC reports are also generated with each project using [FASTQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and [MULTIQC](https://multiqc.info/). Annotations are added using [HOMER](http://homer.ucsd.edu/homer/ngs/annotation.html) and [ROSE](https://github.com/stjude/ROSE).

The following are sub-commands used within CARLISLE:

@@ -16,6 +16,7 @@ The following are sub-commands used within CARLISLE:
- unlock: unlock directory
- DAG: create DAG report
- report: create SNAKEMAKE report
- testrun: copies test manifests and files to WORKDIR

## 1.2 Setup Dependencies
CARLISLE has several dependencies listed below. These dependencies can be installed by a sysadmin. All dependencies will be automatically loaded if running from Biowulf.
@@ -36,6 +37,27 @@ CARLISLE has several dependencies listed below. These dependencies can be instal
- seacr: "seacr/1.4-beta.2"
- ucsc: "ucsc/407"

bedtools: "bedtools/2.30.0"
bedops: "bedops/2.4.40"
bowtie2: "bowtie/2-2.4.2"
cutadapt: "cutadapt/1.18"
fastqc: "fastqc/0.11.9"
fastq_screen: "fastq_screen/0.15.2"
fastq_val: "fastq_val/0.1.1"
fastxtoolkit: "fastxtoolkit/0.0.14"
gopeaks: "github clone https://github.com/maxsonBraunLab/gopeaks"
homer: "homer/4.11.1"
macs2: "macs/2.2.7.1"
multiqc: "multiqc/1.9"
perl: "perl/5.34.0"
picard: "picard/2.26.9"
python37: "python/3.7"
R: "R/4.2.2"
rose: "ROSE/1.3.1"
samtools: "samtools/1.15"
seacr: "seacr/1.4-beta.2"
ucsc: "ucsc/407"

## 1.3 Login to the cluster
CARLISLE has been exclusively tested on Biowulf HPC. Login to the cluster's head node and move into the pipeline location.
```
23 changes: 20 additions & 3 deletions docs/user-guide/preparing-files.md
@@ -80,13 +80,30 @@ peaktype: "norm.stringent.bed, norm.relaxed.bed"
peaktype: "narrowGo_peaks.bed, broadGo_peaks.bed"
```
A complete list of the available peak calling parameters and the recommended list of parameters is provided below:
```
# Complete list
peaktype: "narrowPeak, broadPeak, norm.stringent.bed, norm.relaxed.bed, non.stringent.bed, non.relaxed.bed, narrowGo_peaks.bed, broadGo_peaks.bed"

| Peak Caller | Narrow | Broad | Normalized, Stringent | Normalized, Relaxed | Non-Normalized, Stringent | Non-Normalized, Relaxed |
| --- | --- | --- | --- | --- | --- | --- |
| Macs2 | narrowPeak | broadPeak | NA | NA | NA | NA |
| SEACR | NA | NA | norm.stringent.bed | norm.relaxed.bed | non.stringent.bed | non.relaxed.bed |
| GoPeaks | narrowGo_peaks.bed | broadGo_peaks.bed | NA | NA | NA | NA |

```
# Recommended list
# peaktype: "narrowPeak, broadPeak, norm.stringent.bed, norm.relaxed.bed, narrowGo_peaks.bed, broadGo_peaks.bed"
```
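As a rough illustration of how the table above relates to the `peaktype` string, the following Python sketch groups a comma-separated `peaktype` value by peak caller and rejects unknown entries. The `PEAKTYPES_BY_CALLER` dictionary and the `group_peaktypes` helper are hypothetical names used for illustration only and are not part of CARLISLE; only the peak type strings are taken from the table.

```python
# Hypothetical helper, not part of CARLISLE: groups a `peaktype` config
# string by peak caller, using the peak type names from the table above.
PEAKTYPES_BY_CALLER = {
    "macs2": ["narrowPeak", "broadPeak"],
    "seacr": [
        "norm.stringent.bed",
        "norm.relaxed.bed",
        "non.stringent.bed",
        "non.relaxed.bed",
    ],
    "gopeaks": ["narrowGo_peaks.bed", "broadGo_peaks.bed"],
}


def group_peaktypes(peaktype: str) -> dict:
    """Split a comma-separated peaktype string and group entries by caller."""
    requested = [p.strip() for p in peaktype.split(",") if p.strip()]
    # Reverse lookup: peak type name -> peak caller
    caller_of = {
        name: caller
        for caller, names in PEAKTYPES_BY_CALLER.items()
        for name in names
    }
    unknown = [p for p in requested if p not in caller_of]
    if unknown:
        raise ValueError(f"Unknown peak type(s): {', '.join(unknown)}")
    grouped = {}
    for p in requested:
        grouped.setdefault(caller_of[p], []).append(p)
    return grouped


recommended = (
    "narrowPeak, broadPeak, norm.stringent.bed, norm.relaxed.bed, "
    "narrowGo_peaks.bed, broadGo_peaks.bed"
)
for caller, names in group_peaktypes(recommended).items():
    print(caller, names)
```

A check like this could be run before launching the pipeline to catch a misspelled peak type early; how CARLISLE itself validates the value is not described here.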
##### 2.1.3.1.3.1 Macs2 additional option
MACS2 can be run with or without a control. Adding a control increases peak specificity.
Selecting "Y" for `macs2_control` will run the paired control sample provided in the sample manifest.

##### 2.1.3.1.4 Quality Thresholds
Quality thresholds can be controlled through the `quality_thresholds` parameter. This must be a comma-separated list of values; at least one numeric value is required.
- The default MACS2 qvalue is 0.05: https://manpages.ubuntu.com/manpages/xenial/man1/macs2_callpeak.1.html
- The default GOPEAKS pvalue is 0.05: https://github.com/maxsonBraunLab/gopeaks/blob/main/README.md
- The default SEACR FDR threshold is 1: https://github.com/FredHutch/SEACR/blob/master/README.md
```
#default values
quality_thresholds: "0.1, 0.05, 0.01"
```
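To make the expected format of `quality_thresholds` concrete, here is a minimal Python sketch that parses the comma-separated string into floats and enforces the "at least one numeric value" rule. The function name `parse_quality_thresholds` is hypothetical, and the range check (greater than 0, at most 1) is an assumption based on the default values listed above, not a documented CARLISLE rule.

```python
# Hypothetical helper, not part of CARLISLE: parses the `quality_thresholds`
# config string into a list of floats.
def parse_quality_thresholds(raw: str) -> list:
    values = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas or whitespace
        value = float(item)  # raises ValueError for non-numeric entries
        # Assumption: thresholds are q-values, p-values, or FDR cutoffs in (0, 1]
        if not 0 < value <= 1:
            raise ValueError(f"Threshold out of range (0, 1]: {value}")
        values.append(value)
    if not values:
        raise ValueError("At least one numeric threshold is required")
    return values


print(parse_quality_thresholds("0.1, 0.05, 0.01"))  # [0.1, 0.05, 0.01]
```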

#### 2.1.3.2 References
Additional reference files may be added to the pipeline, if other species were to be used.
44 changes: 27 additions & 17 deletions docs/user-guide/test-info.md
@@ -22,29 +22,39 @@ bash ./path/to/dir/carlisle --runmode=testrun --workdir=/path/to/output/dir

- An expected output for the `testrun` is as follows:
```
Job stats:
job                              count    min threads    max threads
-----------------------------  -------  -------------  -------------
DESeq                                6              1              1
DESeq2                               6              1              1
align                                6             56             56
alignstats                           6              2              2
DESeq                               60              1              1
DESeq2                              60              1              1
align                                9             56             56
alignstats                           9              2              2
all                                  1              1              1
bam2bg                              12              2              2
bed2bb                               5              2              2
contrast_init                        1              1              1
bam2bg                              18              4              4
bed2bb_gopeaks                      48              2              2
bed2bb_seacr                        48              2              2
contrast_init                        3              1              1
create_reference                     1             32             32
create_replicate_sample_table        1              1              1
diffbb                               6              1              1
filter                              12              2              2
diffbb                              60              1              1
filter                              18              2              2
findMotif                          240              6              6
gather_alignstats                    1              1              1
macs2                               12              2              2
make_counts_matrix                   6              1              1
make_inputs                          6              1              1
peak2bb                             12              2              2
seacr                                5              2              2
trim                                 6             56             56
venn                                 6              1              1
total                              117              1             56
gopeaks                             48              8              8
macs2                               54              2              2
make_counts_matrix                  60              1              1
make_inputs                         60              1              1
multiqc                              1              1              1
peak2bb_macs2                       54              2              2
peakAnnotation_macs2                54              1              1
peakAnnotation_s_and_g             192              1              1
qc_fastq_screen_validator            9              4              4
qc_fastqc                            9              1              1
rose                               240              2              2
seacr                               48              2              2
trim                                 9             56             56
venn                                60              1              1
total                             1475              1             56
```
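One way to compare a local `testrun` against the expected output above is to parse the job stats text and confirm that the per-rule counts add up to the reported total. The sketch below is a hedged illustration: `parse_job_stats` is a hypothetical helper, and the embedded text contains only the rows from the listing above whose counts sum to the reported total of 1475.

```python
# Hypothetical helper, not part of CARLISLE: parses Snakemake-style
# "Job stats" text into a {rule: count} mapping.
def parse_job_stats(text: str) -> dict:
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly four fields: job, count, min threads, max threads.
        # The header and separator rows fail the isdigit() check and are skipped.
        if len(parts) == 4 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


expected = """\
Job stats:
job count min threads max threads
DESeq 60 1 1
DESeq2 60 1 1
align 9 56 56
alignstats 9 2 2
all 1 1 1
bam2bg 18 4 4
bed2bb_gopeaks 48 2 2
bed2bb_seacr 48 2 2
contrast_init 3 1 1
create_reference 1 32 32
create_replicate_sample_table 1 1 1
diffbb 60 1 1
filter 18 2 2
findMotif 240 6 6
gather_alignstats 1 1 1
gopeaks 48 8 8
macs2 54 2 2
make_counts_matrix 60 1 1
make_inputs 60 1 1
multiqc 1 1 1
peak2bb_macs2 54 2 2
peakAnnotation_macs2 54 1 1
peakAnnotation_s_and_g 192 1 1
qc_fastq_screen_validator 9 4 4
qc_fastqc 9 1 1
rose 240 2 2
seacr 48 2 2
trim 9 56 56
venn 60 1 1
total 1475 1 56
"""

counts = parse_job_stats(expected)
reported_total = counts.pop("total")
computed_total = sum(counts.values())
print(reported_total, computed_total, reported_total == computed_total)
```

The same function could be applied to the job stats printed by a local dry run, with the resulting dictionary compared against `counts` to spot missing or extra rules.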

## 5.3 Review outputs