Feat sage #49

Merged
merged 17 commits into from
May 30, 2024
38 changes: 29 additions & 9 deletions README.md
@@ -35,17 +35,17 @@ LOGAN supports either
LOGAN supports inputs of either
1) paired end fastq files

`--fastq_input`- A glob can be used to include all FASTQ files. Like `--fastq_input "*R{1,2}.fastq.gz"`. Globbing requires quotes
`--fastq_input`- A glob can be used to include all FASTQ files. Like `--fastq_input "*R{1,2}.fastq.gz"`. Globbing requires quotes.

2) Pre aligned BAM files with BAI indices

`--bam_input`- A glob can be used to include all BAM files. Like `--bam_input "*.bam"`. Globbing requires quotes
`--bam_input`- A glob can be used to include all BAM files. Like `--bam_input "*.bam"`. Globbing requires quotes.

3) A sheet that indicates the sample name and either FASTQs or BAM file locations

`--fastq_file_input`- A headerless tab delimited sheet that has the sample name, R1, and R2 file locations

`--bam_file_input` - A headerless tab delimited sheet that has the sample name, bam and bai file locations
`--bam_file_input` - A headerless tab delimited sheet that has the sample name, bam, and bam index (bai) file locations
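
The sheet format described for `--fastq_file_input` can be illustrated with a minimal Python sketch. The sample names and file paths below are hypothetical placeholders, and `build_fastq_sheet` is an illustrative helper, not part of LOGAN.

```python
import csv
import io

# Hypothetical sample names and file paths, for illustration only.
samples = [
    ("tumor1", "/data/fastq/tumor1_R1.fastq.gz", "/data/fastq/tumor1_R2.fastq.gz"),
    ("normal1", "/data/fastq/normal1_R1.fastq.gz", "/data/fastq/normal1_R2.fastq.gz"),
]


def build_fastq_sheet(rows):
    """Return a headerless, tab-delimited sheet: sample name, R1, R2."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, r1, r2 in rows:
        writer.writerow([name, r1, r2])
    return buffer.getvalue()


sheet_text = build_fastq_sheet(samples)
print(sheet_text, end="")
```

Writing `sheet_text` to a file gives something that could be passed to `--fastq_file_input`; a sheet for `--bam_file_input` would follow the same pattern with the BAM and BAI paths in the second and third columns.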

### Operating Modes

@@ -64,30 +64,50 @@ No flags are required

Adding flags determines SNV (germline and/or somatic), SV, and/or CNV calling modes

`--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, MUSE (TN only), and lofreq (TN only)
`--vc`- Enables somatic SNV calling using mutect2, vardict, varscan, octopus, sage, MUSE (TN only), and lofreq (TN only)


`--germline`- Enables germline variant calling using DeepVariant (DV)

`--sv`- Enables somatic SV calling using Manta and SVABA

`--vc`- Enables somatic CNV calling using FREEC, Sequenza, and Purple (hg38 only)
`--cnv`- Enables somatic CNV calling using FREEC, Sequenza, and Purple (hg38 only)



#### Optional Arguments
`--indelrealign` - Enables indel realignment when running alignment steps. May be helpful for certain callers (VarScan, VarDict)

`--callers`- Comma separated argument for callers, the default is to use all available. Example: `--callers mutect2,octopus,vardict,varscan`
`--callers`- Comma separated argument for callers, the default is to use all available.
Example: `--callers mutect2,octopus`

`--cnvcallers`- Comma separated argument for CNV callers. Adding this flag restricts the run to the listed callers.
Example: `--cnvcallers purple`
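
How a comma separated value such as the one given to `--callers` might be split and validated can be sketched in Python. This is an illustration only: the diff does not show how LOGAN parses the flag, `parse_callers` is a hypothetical helper, and the lowercase spellings used for MUSE and lofreq are assumptions.

```python
# Caller names taken from the --vc description above; the lowercase
# spellings for MUSE and lofreq are assumptions.
KNOWN_CALLERS = ["mutect2", "vardict", "varscan", "octopus", "sage", "muse", "lofreq"]


def parse_callers(value=None):
    """Split a comma-separated --callers value, defaulting to all callers."""
    if not value:
        return list(KNOWN_CALLERS)
    requested = [name.strip().lower() for name in value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in KNOWN_CALLERS]
    if unknown:
        raise ValueError("Unknown caller(s): " + ", ".join(unknown))
    return requested


selected = parse_callers("mutect2,octopus")
print(selected)
```

Rejecting unknown names early means a typo in the flag fails before any jobs are launched, rather than silently dropping a caller.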


## Running LOGAN
Example of Tumor_Normal calling mode
```bash
# copy the logan config files to your current directory
logan init
# preview the logan jobs that will run
logan run --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv
# run a stub/dryrun of the logan jobs
logan run --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv
# launch a logan run on slurm with the test dataset
logan run --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv
```

Example of Tumor only calling mode
```bash
# copy the logan config files to your current directory
logan init
# preview the logan jobs that will run
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv
# run a stub/dryrun of the logan jobs
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv
logan run --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv
# launch a logan run on slurm with the test dataset
logan run --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv
logan run --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv
```

We currently support the hg38, hg19 (in progress), and mm10 genomes.
17 changes: 2 additions & 15 deletions bin/flowcell_lane.py
@@ -40,19 +40,6 @@ def usage(message = '', exitcode = 0):
sys.exit(exitcode)


def reader(fname):
"""Returns correct file object handler or reader for gzipped
or non-gzipped FastQ files based on the file extension. Assumes
gzipped files end with the '.gz' extension.
"""
if fname.endswith('.gz'):
# Opens up file with gzip handler
return gzip.open
else:
# Opens up file normal, uncompressed handler
return open


def get_flowcell_lane(sequence_identifer):
"""Returns flowcell and lane information for different fastq formats.
FastQ files generated with older versions of Casava or downloaded from
@@ -130,10 +117,10 @@ def md5sum(filename, blocksize = 65536):
md5 = md5sum(filename)

# Get Flowcell and Lane information
handle = reader(filename)
handle = gzip.open if filename.endswith('.gz') else open
meta = {'flowcell': [], 'lane': [], 'flowcell_lane': []}
i = 0 # keeps track of line number
with handle(filename, 'r') as file:
with handle(filename, 'rt') as file:
print('sample_name\ttotal_read_pairs\tflowcell_ids\tlanes\tflowcell_lanes\tmd5_checksum')
for line in file:
line = line.strip()
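
The switch from `'r'` to `'rt'` matters because `gzip.open` defaults to binary mode, while the built-in `open` defaults to text mode. A self-contained sketch of the difference follows; `open_fastq` is a hypothetical wrapper around the same one-line handler selection, and the FastQ record is made up for illustration.

```python
import gzip
import os
import tempfile


def open_fastq(filename):
    """Open a FastQ file in text mode, whether or not it is gzipped."""
    handle = gzip.open if filename.endswith('.gz') else open
    return handle(filename, 'rt')


# Write a tiny gzipped FastQ record to a temporary file (made-up read).
record = "@SRR000001.1 FC706VJ:2:2104:15343:197393\nACGT\n+\nIIII\n"
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.fastq.gz")
with gzip.open(path, 'wt') as out:
    out.write(record)

# Mode 'r' on gzip.open yields bytes; mode 'rt' yields str.
with gzip.open(path, 'r') as binary_file:
    first_binary = binary_file.readline()
with open_fastq(path) as text_file:
    first_text = text_file.readline()

print(type(first_binary).__name__, type(first_text).__name__)
```

With `'rt'`, string operations such as `line.strip()` and later string comparisons behave the same for gzipped and uncompressed inputs.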
Empty file modified bin/split_Bed_into_equal_regions.py
100644 → 100755
Empty file.
27 changes: 18 additions & 9 deletions conf/genomes.config
@@ -28,15 +28,18 @@ params {
octopus_gforest= "--forest /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/octopus/germline.v0.7.4.forest"
SEQUENZAGC = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/SEQUENZA/hg38_gc50Base.txt.gz"
chromosomes = ['chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY','chrM']
//HMFTOOLS
GENOMEVER = "38"
HOTSPOTS = "-hotspots /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/KnownHotspots.somatic.38.vcf.gz"
PANELBED = "-panel_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/ActionableCodingPanel.38.bed.gz"
HCBED = "-high_confidence_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
ENSEMBLCACHE = "-ensembl_data_dir /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/ensembl_data"
//PURPLE
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GermlineHetPon.38.vcf.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GC_profile.1000bp.38.cnp"
DIPLODREG = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DiploidRegions.38.bed.gz'
ENSEMBLCACHE = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/ensembl_data/'
DRIVERS = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DriverGenePanel.38.tsv'
HOTSPOTS = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/KnownHotspots.somatic.38.vcf.gz'

}
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/AmberGermlineSites.38.tsv.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/GC_profile.1000bp.38.cnp"
DIPLODREG = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/DiploidRegions.38.bed.gz"
DRIVERS = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/DriverGenePanel.38.tsv"
}
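
Several of the new HMFTOOLS values carry their command-line flag inside the string (for example `-hotspots <path>`), in the same way `octopus_gforest` does, so an empty string can stand for "omit this option". The Python sketch below illustrates that pattern under stated assumptions: the paths are shortened hypothetical placeholders, `optional_args` is an illustrative helper, and how LOGAN's Nextflow processes actually interpolate these values is not shown in this diff.

```python
# Values mirror the shape of the hg38 entries above, with paths shortened
# to hypothetical placeholders; an empty string stands for "not configured".
params = {
    "HOTSPOTS": "-hotspots /ref/KnownHotspots.somatic.38.vcf.gz",
    "PANELBED": "-panel_bed /ref/ActionableCodingPanel.38.bed.gz",
    "HCBED": "",
    "ENSEMBLCACHE": "-ensembl_data_dir /ref/ensembl_data",
}


def optional_args(config, keys):
    """Join flag-prefixed values, skipping any that are empty."""
    return " ".join(config[key] for key in keys if config.get(key))


args = optional_args(params, ["HOTSPOTS", "PANELBED", "HCBED", "ENSEMBLCACHE"])
print(args)
```

Keeping the flag and its value together means a genome that lacks a resource needs no conditional logic at the call site, only an empty string in the config.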

'hg19' {
genome = "/data/CCBR_Pipeliner/db/PipeDB/lib/hg19.with_extra.fa"
@@ -65,8 +68,14 @@ params {
octopus_gforest= "" //"--forest /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/octopus/germline.v0.7.4.forest"
SEQUENZAGC = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/SEQUENZA/hg38_gc50Base.txt.gz"
chromosomes = ['chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY','chrM']
//HMFTOOLS
GENOMEVER = "37"
HOTSPOTS = "-hotspots /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/KnownHotspots.38.vcf.gz"
PANELBED = "-panel_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/ActionableCodingPanel.38.bed.gz"
HCBED = "-high_confidence_bed /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/variants/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed.gz"
ENSEMBLCACHE = "-ensembl_data_dir /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/common/ensembl_data"
//PURPLE
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GermlineHetPon.38.vcf.gz"
GERMLINEHET = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/hmftools/v5_34/ref/38/copy_number/AmberGermlineSites.38.tsv.gz"
GCPROFILE = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/GC_profile.1000bp.38.cnp"
DIPLODREG = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/DiploidRegions.38.bed.gz'
ENSEMBLCACHE = '/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PURPLE/ensembl_data/'
59 changes: 0 additions & 59 deletions docker/lofreq/Dockerfile

This file was deleted.

11 changes: 0 additions & 11 deletions docker/lofreq/build.sh

This file was deleted.

82 changes: 58 additions & 24 deletions docker/logan_base/Dockerfile
@@ -20,16 +20,18 @@ WORKDIR /opt2
RUN apt-get update \
&& apt-get -y upgrade \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
bc
bc \
openjdk-17-jdk

# Common bioinformatics tools
# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
# bedtools/2.27.1 bedops/2.4.37 samtools/1.10
# bcftools/1.10.2 vcftools/0.1.16
# Previous tools already installed trimmomatic/0.39 tabix/1.10.2
# Previous tools already installed tabix/1.10.2 trimmomatic/0.39
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
tabix \
trimmomatic
tabix \
libhts-dev


# Install BWA-MEM2 v2.2.1
RUN wget https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
@@ -44,13 +46,17 @@ RUN wget https://github.com/biod/sambamba/releases/download/v0.8.1/sambamba-0.8.
&& mv /opt2/sambamba-0.8.1-linux-amd64-static /opt2/sambamba \
&& chmod a+rx /opt2/sambamba

# Install GATK4 (GATK/4.3.0.0)
# Requires Java8 or 1.8
RUN wget https://github.com/broadinstitute/gatk/releases/download/4.3.0.0/gatk-4.3.0.0.zip \
&& unzip /opt2/gatk-4.3.0.0.zip \
&& rm /opt2/gatk-4.3.0.0.zip \
&& /opt2/gatk-4.3.0.0/gatk --list
ENV PATH="/opt2/gatk-4.3.0.0:$PATH"
# Install GATK4 (GATK/4.4.0.0)
# Requires Java17
RUN wget https://github.com/broadinstitute/gatk/releases/download/4.4.0.0/gatk-4.4.0.0.zip \
&& unzip /opt2/gatk-4.4.0.0.zip \
&& rm /opt2/gatk-4.4.0.0.zip \
&& /opt2/gatk-4.4.0.0/gatk --list
ENV PATH="/opt2/gatk-4.4.0.0:$PATH"

# Use DISCVRSeq For CombineVariants Replacement
RUN wget https://github.com/BimberLab/DISCVRSeq/releases/download/1.3.62/DISCVRSeq-1.3.62.jar
ENV DISCVRSeq_JAR="/opt2/DISCVRSeq-1.3.62.jar"

# Install last release of GATK3 (GATK/3.8-1)
# Only being used for the CombineVariants
@@ -168,29 +174,57 @@ RUN wget https://github.com/AstraZeneca-NGS/VarDictJava/releases/download/v1.8.3
ENV PATH="/opt2/VarDict-1.8.3/bin:$PATH"

# Fastp From Opengene github
RUN wget http://opengene.org/fastp/fastp.0.23.2 \
RUN wget http://opengene.org/fastp/fastp.0.23.4 \
&& mkdir fastp \
&& mv fastp.0.23.2 fastp/fastp \
&& mv fastp.0.23.4 fastp/fastp \
&& chmod a+x fastp/fastp
ENV PATH="/opt2/fastp:$PATH"

# HMFtools for PURPLE/COBALT/AMBER
RUN wget https://github.com/hartwigmedical/hmftools/releases/download/amber-v3.9/amber-3.9.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/cobalt-v1.15.1/cobalt_v1.15.1.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/purple-v3.9/purple_v3.9.jar \
&& mkdir hmftools \
&& mv amber-3.9.jar hmftools/amber.jar \
&& mv cobalt_v1.15.1.jar hmftools/cobalt.jar \
&& mv purple_v3.9.jar hmftools/purple.jar \
&& chmod a+x hmftools/amber.jar
ENV PATH="/opt2/hmftools:$PATH"
# ASCAT
RUN Rscript -e 'devtools::install_github("VanLoo-lab/ascat/ASCAT")'

# SvABA
RUN wget -O svaba_1.2.0 https://github.com/walaj/svaba/releases/download/v1.2.0/svaba \
&& mkdir svaba \
&& mv svaba_1.2.0 svaba/svaba
&& mv svaba_1.2.0 svaba/svaba \
&& chmod a+x svaba/svaba

ENV PATH="/opt2/svaba:$PATH"

# LOFREQ
RUN git clone https://github.com/CSB5/lofreq \
&& cd /opt2/lofreq \
&& ./bootstrap \
&& ./configure --prefix=/opt2/lofreq/ \
&& make \
&& make install

ENV PATH="/opt2/lofreq/bin:$PATH"

# MUSE
RUN wget -O muse_2.0.4.tar.gz https://github.com/wwylab/MuSE/archive/refs/tags/v2.0.4.tar.gz \
&& tar -xzf muse_2.0.4.tar.gz \
&& cd MuSE-2.0.4 \
&& ./install_muse.sh \
&& mv MuSE /opt2/ \
&& chmod a+x /opt2/MuSE \
&& rm -R /opt2/MuSE-2.0.4 \
&& rm /opt2/muse_2.0.4.tar.gz

ENV PATH="/opt2/MuSE:$PATH"

# HMFtools for PURPLE/COBALT/AMBER
RUN wget https://github.com/hartwigmedical/hmftools/releases/download/amber-v4.0/amber-4.0.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/cobalt-v1.16/cobalt_v1.16.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/purple-v4.0/purple_v4.0.jar \
&& wget https://github.com/hartwigmedical/hmftools/releases/download/sage-v3.4/sage_v3.4.jar \
&& mkdir hmftools \
&& mv amber-4.0.jar hmftools/amber.jar \
&& mv cobalt_v1.16.jar hmftools/cobalt.jar \
&& mv purple_v4.0.jar hmftools/purple.jar \
&& mv sage_v3.4.jar hmftools/sage.jar \
&& chmod a+x hmftools/amber.jar
ENV PATH="/opt2/hmftools:$PATH"

# Add Dockerfile and argparse.bash script
# and export environment variables
12 changes: 7 additions & 5 deletions docker/logan_base/build.sh
@@ -1,14 +1,17 @@

# Build image
#docker buildx create --platform linux/amd64 --use
#docker buildx use upbeat_ganguly
#docker buildx inspect upbeat_ganguly
#docker buildx build --platform linux/amd64 -f Dockerfile -t dnousome/ccbr_logan_base:v0.3.0 -t dnousome/ccbr_logan_base:latest --push .

docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.0 -f Dockerfile .
docker tag ccbr_logan_base:v0.3.0 dnousome/ccbr_logan_base:v0.3.0
docker tag ccbr_logan_base:v0.3.0 dnousome/ccbr_logan_base
docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.5 -f Dockerfile .

docker tag ccbr_logan_base:v0.3.5 dnousome/ccbr_logan_base:v0.3.5
docker tag ccbr_logan_base:v0.3.5 dnousome/ccbr_logan_base

docker push dnousome/ccbr_logan_base:v0.3.0

docker push dnousome/ccbr_logan_base:v0.3.5
docker push dnousome/ccbr_logan_base:latest


@@ -21,4 +24,3 @@ docker push dnousome/ccbr_logan_base:latest
# Push image to DockerHub
#docker push nciccbr/ccbr_wgs_base:v0.1.0
#docker push nciccbr/ccbr_wgs_base:latest

2 changes: 1 addition & 1 deletion docker/logan_base/meta.yml
@@ -1,4 +1,4 @@
dockerhub_namespace: dnousome
image_name: ccbr_logan_base
version: v0.3.4
version: v0.3.5
container: "$(dockerhub_namespace)/$(image_name):$(version)"
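
The `container` field is a template built from the other three fields. Which tool consumes `meta.yml` is not shown in this diff, so the following Python sketch only illustrates how `$(name)` placeholders of this shape could be resolved; `expand` is a hypothetical helper.

```python
import re

# Fields copied from meta.yml after this change.
meta = {
    "dockerhub_namespace": "dnousome",
    "image_name": "ccbr_logan_base",
    "version": "v0.3.5",
    "container": "$(dockerhub_namespace)/$(image_name):$(version)",
}


def expand(template, values):
    """Replace each $(name) placeholder with the matching value."""
    return re.sub(r"\$\((\w+)\)", lambda match: values[match.group(1)], template)


container = expand(meta["container"], meta)
print(container)
```

Under that assumption the resolved name matches the `v0.3.5` tag pushed by `build.sh`.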