From cd80d3d640f9137a12a783c2f3d307d6d14b700c Mon Sep 17 00:00:00 2001 From: Darryl Nousome Date: Thu, 21 Nov 2024 10:11:21 -0500 Subject: [PATCH] docs: update docs --- README.md | 22 +++--- docs/index.md | 4 +- docs/user-guide/pipeline.md | 130 ++++++++++++++++++++++++++++++------ 3 files changed, 121 insertions(+), 35 deletions(-) diff --git a/README.md b/README.md index 0f67e8e..bf33c7d 100644 --- a/README.md +++ b/README.md @@ -19,20 +19,21 @@ Original pipelining and code forked from the CCBR Exome-seek Pipeline [Exome-see [singularity](https://singularity.lbl.gov/all-releases) must be installed on the target system. Snakemake orchestrates the execution of each step in the pipeline. To guarantee the highest level of reproducibility, each step relies on versioned images from [DockerHub](https://hub.docker.com/orgs/nciccbr/repositories). Nextflow uses singularity to pull these images onto the local filesystem prior to job execution, and as so, nextflow and singularity are the only two dependencies. 
## Setup -LOGAN can be used with the Nextflow pipelining software +LOGAN can be used with the Nextflow pipelining software. Please clone this repository to your local filesystem using the following command on Biowulf: + ```bash # start an interactive node sinteractive --mem=2g --cpus-per-task=2 --gres=lscratch:200 + git clone https://github.com/CCBR/LOGAN module load nextflow ##Example run -nextflow run /data/LOGAN//main.nf +nextflow run LOGAN/main.nf -profile ci_stub -preview ``` ## Usage - ### Input Files LOGAN supports inputs of either 1) paired end fastq files @@ -99,11 +100,10 @@ Adding flags determines SNV (germline and/or somatic), SV, and/or CNV calling mo `--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, strelka (TN only), MUSE (TN only), and lofreq (TN only) -`--germline`- Enables germline using Deepvariant +`--germline`- Enables germline calling using Deepvariant `--sv`- Enables somatic SV calling using Manta, GRIDSS, and SVABA - `--cnv`- Enables somatic CNV calling using FREEC, Sequenza, ASCAT, CNVKit, and Purple (hg19/hg38 only) @@ -124,21 +124,21 @@ Example: `--svcallers gridss` Example of Tumor_Normal calling mode ```bash # preview the logan jobs that will run -nextflow run /data/LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv # run a stub/dryrun of the logan jobs -nextflow run /data/LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv # launch a logan run on slurm with 
the test dataset -nextflow run /data/LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv +nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv ``` Example of Tumor only calling mode ```bash # preview the logan jobs that will run -nextflow run /data/LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv # run a stub/dryrun of the logan jobs -nextflow run /data/LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv # launch a logan run on slurm with the test dataset -nextflow run /data/LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv +nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv ``` diff --git a/docs/index.md b/docs/index.md index 21f6deb..7bb3826 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,9 +4,9 @@ Guide for running wgs-seek for WGS data! 
-* `wgs-seek` - Builds a submission script for slurm +* `nextflow run main.nf` - Builds a submission script for slurm ### References -Forked from [Exome-seek](https://github.com/mtandon09/CCBR_GATK4_Exome_Seq_Pipeline) +Forked from [Xavier](https://github.com/CCBR/XAVIER) with inspiration from genome-seek [Genome-seek](https://github.com/OpenOmics/genome-seek) diff --git a/docs/user-guide/pipeline.md b/docs/user-guide/pipeline.md index c5a31bf..033d99e 100644 --- a/docs/user-guide/pipeline.md +++ b/docs/user-guide/pipeline.md @@ -1,37 +1,123 @@ -# How to run WGS-Seek +# How to run LOGAN ## Guide -* `./wgs-seek` - Starts a next nextflow run -Supports runs from Fastq and either Tumor-Normal or Tumor-only Sequencing +### Preview Run +```bash +git clone https://github.com/CCBR/LOGAN +module load nextflow +# Starts a nextflow preview run to see the processes that will run +nextflow run LOGAN/main.nf -profile ci_stub -preview +``` -## Running Nextflow -Multiple options required for running -## Code -`./wgs-seek --fastq "Samples/Sample_R{1,2}.fastq.gz" --output 'B2' --sample_sheet sample.tsv --paired T --profile biowulf` +##Example run +## Usage +### Input Files +LOGAN supports inputs of either +1) paired end fastq files -### Arguments -Input selection can either be -`--fastq` -1) A wildcard expansion of Fastq files - "Samples/Sample_*_R{1,2}.fastq.gz" which finds all Samples in the directory with the head Sample_ -OR -`--filelist` -2a) A tab separated file with 3 columns Sample Name, Fastq1 Full path, Fastq2 Full Path if using fastq files or -2b) A tab separated file with 2 columns Sample Name, BAM file path +`--fastq_input`- A glob can be used to include all FASTQ files, like `--fastq_input "*R{1,2}.fastq.gz"` in quotes. -`--output` - Output Directory +2) Pre aligned BAM files with BAI indices -`--sample_sheet`- Tab separated file for Normal and Tumor delination with a header for "Normal" and "Tumor" +`--bam_input`- A glob can be used to include all BAM files. 
Like `--bam_input *.bam` -`--profile` Biowulf or Local Run +3) A sheet that indicates the sample name and either FASTQs or BAM file locations -`--resume` Resume previous nextflow run +`--fastq_file_input`- A headerless tab delimited sheet that has the sample name, R1, and R2 file locations -`--submit`- Submit job to Biowulf? +Example +```bash +c130863309_TUMOR /data/nousomedr/c130863309_TUMOR.R1_001.fastq.gz /data/nousomedr/c130863309_TUMOR.R2_001.fastq.gz +c130889189_PBMC /data/nousomedr/c130889189_PBMC.R1_001.fastq.gz /data/nousomedr/c130889189_PBMC.R2_001.fastq.gz +``` -`--paired`- Are Samples paired Tumor-Normal +`--bam_file_input` - A headerless Tab delimited sheet that has the sample name, bam, and bam index (bai) file locations + +Example +```bash +c130863309_TUMOR /data/nousomedr/c130863309_TUMOR.bam /data/nousomedr/c130863309_TUMOR.bam.bai +c130889189_PBMC /data/nousomedr/c130889189_PBMC.bam /data/nousomedr/c130889189_PBMC.bam.bai +``` + +### Genome +`--genome` - A flag to indicate which genome to run for alignment/variant calling/etc. Like `--genome hg38` to run the hg38 genome + +`--genome hg19` and `--genome mm10` are also supported + +#### hg38 has options for either +`--genome hg38` - Based off the GRCh38.d1.vd1.fa which is consistent with TCGA and other GDC processing pipelines + +`--genome hg38_sf` - Based off the Homo_sapiens_assembly38.fasta which is derived from the Broad Institute/NCI Sequencing Facility. +The biggest difference between the two is that GRCh38.d1.vd1.fa has fewer contigs (especially related to HLA regions), so reads should map to chr6 vs the HLA contig directly + + +### Operating Modes + +#### 1. Paired Tumor/Normal Mode + +Required for Paired Tumor/Normal Mode + +`--sample_sheet` In Paired mode a sample sheet must be provided with the basename of the Tumor and Normal samples. This sheet must be Tab separated with a header for Tumor and Normal. 
+ +Example +```bash +Tumor Normal +c130863309_TUMOR c130863309_PBMC +c130889189_TUMOR c130889189_PBMC +``` + +#### 2. Tumor only mode + +No additional flags for sample sheet are required as all samples will be used to call variants + +#### Calling Mode + +Adding flags determines SNV (germline and/or somatic), SV, and/or CNV calling modes + +`--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, strelka (TN only), MUSE (TN only), and lofreq (TN only) + +`--germline`- Enables germline calling using Deepvariant + +`--sv`- Enables somatic SV calling using Manta, GRIDSS, and SVABA + +`--cnv`- Enables somatic CNV calling using FREEC, Sequenza, ASCAT, CNVKit, and Purple (hg19/hg38 only) + + + +#### Optional Arguments +`--indelrealign` - Enables indel realignment when running alignment steps. May be helpful for certain callers (VarScan, VarDict) + +`--callers`- Comma separated argument for selecting only specified callers, the default is to use all available. +Example: `--callers mutect2,octopus` + +`--cnvcallers`- Comma separated argument for selecting only specified CNV callers. Adding flag allows only certain callers to run. +Example: `--cnvcallers purple` + +`--svcallers`- Comma separated argument for selecting only specified SV callers. Adding flag allows only certain callers to run. 
+Example: `--svcallers gridss` + +## Running LOGAN +Example of Tumor_Normal calling mode +```bash +# preview the logan jobs that will run +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv +# run a stub/dryrun of the logan jobs +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv +# launch a logan run on slurm with the test dataset +nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv +``` + +Example of Tumor only calling mode +```bash +# preview the logan jobs that will run +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv +# run a stub/dryrun of the logan jobs +nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv +# launch a logan run on slurm with the test dataset +nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv +```
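Note on the `--fastq_file_input` sheet documented above: the headerless, tab-delimited sheet (sample name, R1, R2) can be generated rather than typed by hand. The following is a minimal sketch and not part of LOGAN. The helper name `make_fastq_sheet` is hypothetical, and it assumes files follow the `<sample>.R1_001.fastq.gz` / `<sample>.R2_001.fastq.gz` naming seen in the example rows. The demo runs on empty placeholder files in a temporary directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# make_fastq_sheet DIR (hypothetical helper, not shipped with LOGAN):
# print one headerless, tab-delimited row per sample
# (sample name, R1 path, R2 path), assuming files are named
# <sample>.R1_001.fastq.gz and <sample>.R2_001.fastq.gz.
make_fastq_sheet() {
    local dir="$1" r1 r2 sample
    for r1 in "$dir"/*.R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2="${r1%.R1_001.fastq.gz}.R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no R2 mate found" >&2
            continue
        fi
        sample="$(basename "$r1" .R1_001.fastq.gz)"
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}

# Demo on empty placeholder files in a temporary directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir"/c130863309_TUMOR.R{1,2}_001.fastq.gz
touch "$demo_dir"/c130889189_PBMC.R{1,2}_001.fastq.gz
make_fastq_sheet "$demo_dir" > "$demo_dir/fastq_sheet.tsv"
cat "$demo_dir/fastq_sheet.tsv"
```

The resulting file could then be passed as `--fastq_file_input fastq_sheet.tsv` in place of the `--fastq_input` glob. Samples without an R2 mate are skipped with a warning on stderr, so an incomplete pair never reaches the sheet.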