updates somatic indel detection
hxrts committed Apr 21, 2015
1 parent 05efa39 commit 86553dd
Showing 6 changed files with 54 additions and 26 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,6 +1,7 @@
 *.bam
 *.bai
-*.DS_Store
+.DS_Store
+scripts/.DS_Store
 docs/
 refs/
 tools/
18 changes: 15 additions & 3 deletions README.md
@@ -5,9 +5,21 @@ When cloning this repo for the first time you must install the necessary tools.
 2. Place a copy of annovar, samtools, etc.

 ### Executing the script
-`python master.py -s sample_info.txt -p path_file.txt -d test`
+production run: `python call-mutations-indels.py -s sample_info.txt -p path_file.txt -d test -b test`

-This pipeline is currently installed on the hopp-cli server and must be executed from /home/sam/HOPP-Informatics/projects/sequencing_pipeline.
+debug run: `python -i call-mutations-indels.py -s sample_info.txt -p path_file.txt -d test -b test`
+
+This pipeline is currently installed on the hopp-cli server and must be executed from /home/sam/HOPP-Informatics/projects/sequencingPipeline.
+
+### Build reference index (if adding new assembly)
+Generate the BWA index
+`bwa index -a bwtsw reference.fa`
+
+Generate the fasta file index
+`samtools faidx reference.fa`
+
+Generate the sequence dictionary
+`java -jar picard.jar /home/sam/tools/picard-tools-1.130/CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict`

 ### Warning
-This pipeline is still in development and produces known errors. This page will be updated upon reaching a functional alpha.
+This pipeline is still in development and produces known errors use at your own risk.
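
The three reference-index steps added to the README can also be scripted. The sketch below is illustrative and not part of this commit: `index_commands` and `build_index` are hypothetical helpers, and the Picard jar path is a placeholder. It places the tool name `CreateSequenceDictionary` directly after `picard.jar`, which is how Picard's single-jar releases are invoked; the README line as committed puts a directory path in that position, so it may need the same adjustment.

```python
import os
import subprocess


def index_commands(reference, picard_jar):
    """Return the three indexing commands as argument lists, in README order."""
    # reference.fa -> reference.dict, next to the FASTA file
    dict_path = os.path.splitext(reference)[0] + ".dict"
    return [
        ["bwa", "index", "-a", "bwtsw", reference],
        ["samtools", "faidx", reference],
        ["java", "-jar", picard_jar, "CreateSequenceDictionary",
         "REFERENCE=" + reference, "OUTPUT=" + dict_path],
    ]


def build_index(reference, picard_jar, runner=subprocess.check_call):
    """Run each command in turn; check_call raises if a tool exits non-zero."""
    for cmd in index_commands(reference, picard_jar):
        runner(cmd)


# Print the commands for inspection (nothing is executed here).
# "/path/to/picard.jar" is a placeholder, not a path taken from the commit.
for cmd in index_commands("reference.fa", "/path/to/picard.jar"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to review the exact command lines before running them on a new assembly.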
6 changes: 2 additions & 4 deletions call-mutations-indels.py
@@ -1,5 +1,5 @@
 # --------------------------------------------------------------------------------------------------------#
-# This master script that parses the sample information provided by the user and calls mutations + indels
+# This primary script that parses the sample information provided by the user and calls mutations + indels
 # If there is a need to realign and recalibrate, it runs GATK realignment/recalibration scripts
 # --------------------------------------------------------------------------------------------------------#

@@ -17,7 +17,6 @@
 import csv
 import os
 import argparse
-import recurse

 #------------------------------------------------------------------#
 # Parser Info
@@ -116,7 +115,7 @@

 GATK_PATH = exdir + '/scripts/pipelineGATK.sh' # GATK pipeline path
 GATK_INTERVAL_PATH = exdir + '/scripts/pipelineIntervalGATK.sh' # GATK interval script path
-SAM_INDEX_PATH = exdir + '/home/sam/HOPP-Informatics/projects/sequencing_pipeline/tools/samtools-0.1.19/samtools index' # samtools path
+SAM_INDEX_PATH = exdir + '/home/sam/tools/samtools-1.1/samtools index' # samtools path
 SOMATIC_SNIPER_PATH = exdir + '/scripts/call-somatic-sniper.sh' # somatic sniper path
 SOMATIC_INDEL_PATH = exdir + '/scripts/call-indels.sh' # somatic indel_detector path
 SOMATIC_INDEL_INTERVAL_PATH = exdir + '/scripts/call-indels-with-intervals.sh' # somatic indel path with intervals
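
One thing to check in the changed `SAM_INDEX_PATH` line: it still prepends `exdir` to what is an absolute path. If `exdir` holds the pipeline's directory (its definition lies outside this hunk, so that is an assumption), plain concatenation yields a path underneath `exdir` rather than `/home/sam/tools/...`. A minimal sketch of one way to handle both cases, using a hypothetical `resolve_tool` helper that is not part of the commit:

```python
import posixpath  # the pipeline runs on a Linux server, so POSIX path rules apply


def resolve_tool(exdir, path):
    """Join repo-relative paths onto exdir; leave absolute paths untouched."""
    # posixpath.join discards earlier components when a later one is absolute.
    return posixpath.join(exdir, path)


exdir = "/repo"  # placeholder for the pipeline directory (assumed meaning of exdir)

# Plain concatenation, as in the diff, nests the absolute path under exdir.
print(exdir + "/home/sam/tools/samtools-1.1/samtools")

# With the helper, relative and absolute inputs both resolve sensibly.
print(resolve_tool(exdir, "scripts/call-indels.sh"))
print(resolve_tool(exdir, "/home/sam/tools/samtools-1.1/samtools"))
```

With this approach the repo-relative entries would be written without a leading slash (`scripts/pipelineGATK.sh` rather than `/scripts/pipelineGATK.sh`), since a leading slash marks a path as absolute.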
@@ -345,7 +344,6 @@
 print "Calling somatic mutations using Somatic Sniper"
 sys.stdout.flush()
 os_call = SOMATIC_SNIPER_PATH+" "+tumor_recalibrated_directory+" "+normal_recalibrated_directory+" "+somatic_sniper_directory+" "+sample_name[i]
-print os_call+"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
 os.system(os_call)
 if (somatic_indel_flag == 1):
 if (bedfile_flag == 1):
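
The Somatic Sniper call in this hunk is assembled by string concatenation and run through `os.system`, so a directory or sample name containing a space would be split into several arguments by the shell. A hedged alternative, not part of this commit, is to pass an argument list to `subprocess`. The function name and example values below are hypothetical; the argument order mirrors the `os_call` string in the hunk.

```python
import subprocess
import sys


def sniper_args(script, tumor_dir, normal_dir, out_dir, sample):
    """Argument list in the same order as the concatenated os_call string."""
    return [script, tumor_dir, normal_dir, out_dir, sample]


# Example values are made up; note the space in the tumor directory name.
args = sniper_args("scripts/call-somatic-sniper.sh",
                   "out/tumor recal", "out/normal_recal", "out/sniper", "AK-3451")
print(args)

# subprocess.call runs a list without a shell and returns the exit status.
# A harmless stand-in command is used here so the sketch runs anywhere;
# in the pipeline the call would be subprocess.call(args).
status = subprocess.call([sys.executable, "-c", "pass"])
print("exit status: %d" % status)
```

Checking the returned status also gives the script a way to stop when a caller fails, which `os.system(os_call)` with an ignored return value does not.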
16 changes: 11 additions & 5 deletions path_file.sh
@@ -2,7 +2,8 @@
 # GLOBAL
 #--------#

-REF=/home/sam/HOPP-Informatics/projects/sequencingPipeline/refs/Homo_sapiens_assembly19.fasta
+#REF=/home/sam/HOPP-Informatics/projects/sequencingPipeline/refs/Homo_sapiens_assembly19.fasta
+REF=/home/sam/HOPP-Informatics/projects/sequencingPipeline/refs/GRCh37-lite.fa

 #---------#
 # ANNOVAR
@@ -18,9 +19,14 @@ ANNOVAR_DB=/hopp-storage/HOPP-TOOLS/ANNOTATIONS/annovar-may-2013/annovar/humandb
 # GATK
 #------#

-GATK=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar
-GAT1=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/Somatic-Indel-Detector/GATK/dist/GenomeAnalysisTK.jar
-GAT0=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/Sting/dist/GenomeAnalysisTK.jar # Use old GATK for identifying target intervals
+GATK=/home/sam/tools/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
+#GATK_IndelGeno=/home/sam/tools/GenomeAnalysisTK-3.3-0/IndelGenotyper.36.3336-GenomeAnalysisTK.jar
+
+#GAT1=/home/sam/tools/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
+#GAT0=/home/sam/tools/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
+#GATK=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar
+#GAT1=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/Somatic-Indel-Detector/GATK/dist/GenomeAnalysisTK.jar
+#GAT0=/hopp-storage/HOPP-TOOLS/PIPELINES/GATKBundle/Sting/dist/GenomeAnalysisTK.jar # Use old GATK for identifying target intervals

 #--------#
 # MUTECT
@@ -61,6 +67,6 @@ SNPEFF=/hopp-storage/HOPP-TOOLS/ANNOTATIONS/snpEff/SnpSift.jar
 # 1000g_ANNO=/hopp-storage/HOPP-TOOLS/PIPELINES/MutPipelines/scripts/annotate-100g-calls.sh <------ missing
 # ESP_ANNO=/hopp-storage/HOPP-TOOLS/PIPELINES/MutPipelines/scripts/annotate-ESP-calls.sh <------ missing
 # dbSNP_ANNO=/hopp-storage/HOPP-TOOLS/PIPELINES/MutPipelines/scripts/annotate-dbSNP-calls.sh <------ missing
-# COSMIC_ANNO=/HOPP-Informatics/projects/sequencing_pipeline/scripts/annotate-cosmic-calls.sh
+# COSMIC_ANNO=/HOPP-Informatics/projects/sequencingPipeline/scripts/annotate-cosmic-calls.sh


6 changes: 3 additions & 3 deletions sample_info.txt
@@ -1,3 +1,3 @@
-BAMPath /home/sam/HOPP-Informatics/projects/sequencing_pipeline/bams
-LOCPath /home/sam/HOPP-Informatics/projects/sequencing_pipeline/loc
-s_GA_1 s_GA_NL1__1.bam s_GA_T1__1.bam
+BAMPath /home/sam/HOPP-Informatics/projects/sequencingPipeline/bams
+LOCPath /home/sam/HOPP-Informatics/projects/sequencingPipeline/loc
+AK-3451 TCGA-AK-3451-10A-01D-1251-10_Illumina.bam TCGA-AK-3451-01A-02D-1251-10_Illumina.bam
31 changes: 21 additions & 10 deletions scripts/call-indels-with-intervals.sh
@@ -3,13 +3,24 @@
 echo "*** calling indels with intervals ***"
 source path_file.sh # path to the GATK (somatic indel detector) and reference genome

-java -Xmx8g -jar "$GAT1" -R "$REF" -T SomaticIndelDetector -mnr 50000 \
--minConsensusFraction 0.7 \
--minCoverage 6 \
--minNormalCoverage 4 \
--o $3/SI-$4.vcf \
--verbose $3/SI-$4.txt \
--I:normal $1/out.recal.quality.bam \
--I:tumor $2/out.recal.quality.bam \
--L $5 \
---validation_strictness SILENT -U
+java -Xmx8g -jar "$GATK" \
+-R "$REF" \
+-T UnifiedGenotyper \
+-I:normal $1/out.recal.quality.bam \
+-I:tumor $2/out.recal.quality.bam \
+-o $3/SI-$4.vcf \
+-stand_call_conf 50.0 \
+-stand_emit_conf 10.0 \
+-dcov 200
+
+
+# java -Xmx8g -jar "$GATK_IndelGeno" -R "$REF" -T SomaticIndelDetector -mnr 50000 \
+# -minConsensusFraction 0.7 \
+# -minCoverage 6 \
+# -minNormalCoverage 4 \
+# -o $3/SI-$4.vcf \
+# -verbose $3/SI-$4.txt \
+# -I:normal $1/out.recal.quality.bam \
+# -I:tumor $2/out.recal.quality.bam \
+# -L $5 \
+# --validation_strictness SILENT -U
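
Two details of the replacement command may be worth checking against the GATK 3.x documentation. `UnifiedGenotyper` chooses what to call through `-glm` (`--genotype_likelihoods_model`), whose default is `SNP`, so the command as committed would likely not emit indels unless `-glm INDEL` or `-glm BOTH` is added. The new command also drops `-L $5`, although this script is the interval variant. The sketch below is not part of the commit: it assembles the same arguments with those two options included, and `unified_genotyper_args` and the example file names are hypothetical.

```python
def unified_genotyper_args(gatk_jar, ref, normal_dir, tumor_dir, out_dir,
                           sample, intervals=None, model="INDEL"):
    """Build the java command line as a list, mirroring the shell script."""
    args = [
        "java", "-Xmx8g", "-jar", gatk_jar,
        "-R", ref,
        "-T", "UnifiedGenotyper",
        "-glm", model,  # assumption: indel calls are wanted (tool default is SNP)
        "-I:normal", normal_dir + "/out.recal.quality.bam",
        "-I:tumor", tumor_dir + "/out.recal.quality.bam",
        "-o", "%s/SI-%s.vcf" % (out_dir, sample),
        "-stand_call_conf", "50.0",
        "-stand_emit_conf", "10.0",
        "-dcov", "200",
    ]
    if intervals:
        # Restrict calling to the interval file, as the old command did with -L $5.
        args += ["-L", intervals]
    return args


# Example values are placeholders, not paths from the commit.
print(" ".join(unified_genotyper_args(
    "GenomeAnalysisTK.jar", "GRCh37-lite.fa", "normal", "tumor",
    "out", "AK-3451", intervals="targets.bed")))
```

Note that `UnifiedGenotyper` genotypes each sample rather than contrasting tumor against normal, so somatic status would presumably have to be derived in a later step.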
