
feat: set tempdir depending on HPC
resolves #106
kelly-sovacool committed Jun 3, 2024
1 parent 5cef3a7 commit 9e1608c
Showing 10 changed files with 167 additions and 292 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
- Major updates to convert CHARLIE from a biowulf-specific to a platform-agnostic pipeline (#102, @kelly-sovacool):
- All rules now use containers instead of envmodules.
- Default config and cluster config files are provided for use on biowulf and FRCE.
- New config entry `tempdir` sets the temporary directory location for rules that require transient storage.

# CHARLIE 0.10.1

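For context on the rule changes below: workflow/rules/align.smk now builds each rule's scratch path from a global TEMPDIR. A minimal sketch of how that value is presumably derived from the new config key (the exact file it lives in, e.g. the main Snakefile or a shared .smk, is an assumption; only the TEMPDIR name and the tempdir key appear in this commit):

    import uuid

    # Hypothetical sketch, not part of this commit: assumes the main Snakefile
    # reads the new YAML key and exposes it to the rule files as TEMPDIR.
    TEMPDIR = config["tempdir"]  # e.g. '/lscratch/$SLURM_JOB_ID' on biowulf

    # Each rule then derives a unique scratch path from it, as in align.smk:
    #     tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}"
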
16 changes: 9 additions & 7 deletions config/biowulf/config.yaml
@@ -2,30 +2,32 @@
#
# The working dir... output will be in the results subfolder of the workdir
workdir: "WORKDIR"
#

# temporary directory for intermediate files that are not saved
tempdir: '/lscratch/$SLURM_JOB_ID'

# tab-delimited samples file ... should have the following 3 columns
# sampleName path_to_R1_fastq path_to_R2_fastq
#
samples: "WORKDIR/samples.tsv"
#

# Should the CLEAR pipeline be run? True or False WITHOUT quotes
run_clear: True
#

# Should the DCC pipeline be run? True or False WITHOUT quotes
run_dcc: True
#

# Should the MapSplice pipeline be run? True or False WITHOUT quotes
run_mapsplice: False
mapsplice_min_map_len: 50
mapsplice_filtering: 2 # 1=less stringent 2=default
#

# Should the circRNA_finder be run? True or False WITHOUT quotes
run_circRNAFinder: True
# Should the NCLscan pipeline be run? True or False WITHOUT quotes
# This can only be run for PE data
run_nclscan: False
nclscan_config: "WORKDIR/nclscan.config"
#

# Should we also run find_circ? True or False WITHOUT quotes
run_findcirc: False
# findcirc_params: "--noncanonical --allhits" # this gives way too many circRNAs
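
A note on the biowulf value above: the string '$SLURM_JOB_ID' is stored verbatim in the YAML and in the Python f-string that builds params.tmpdir; bash only expands it when the rule's shell block runs inside the SLURM job. A toy rule illustrating that behavior (the rule name and output file are made up for this sketch):

    import uuid

    TEMPDIR = "/lscratch/$SLURM_JOB_ID"  # read verbatim from config.yaml

    rule demo:
        output: "demo.done"
        params:
            tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}",  # still contains the literal $SLURM_JOB_ID
        shell:
            """
            mkdir -p {params.tmpdir}  # bash expands $SLURM_JOB_ID here, at job runtime
            touch {output}
            """
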
16 changes: 9 additions & 7 deletions config/fnlcr/config.yaml
@@ -2,30 +2,32 @@
#
# The working dir... output will be in the results subfolder of the workdir
workdir: "WORKDIR"
#

# temporary directory for intermediate files that are not saved
tempdir: '/scratch/local'

# tab-delimited samples file ... should have the following 3 columns
# sampleName path_to_R1_fastq path_to_R2_fastq
#
samples: "WORKDIR/samples.tsv"
#

# Should the CLEAR pipeline be run? True or False WITHOUT quotes
run_clear: True
#

# Should the DCC pipeline be run? True or False WITHOUT quotes
run_dcc: True
#

# Should the MapSplice pipeline be run? True or False WITHOUT quotes
run_mapsplice: False
mapsplice_min_map_len: 50
mapsplice_filtering: 2 # 1=less stringent 2=default
#

# Should the circRNA_finder be run? True or False WITHOUT quotes
run_circRNAFinder: True
# Should the NCLscan pipeline be run? True or False WITHOUT quotes
# This can only be run for PE data
run_nclscan: False
nclscan_config: "WORKDIR/nclscan.config"
#

# Should we also run find_circ? True or False WITHOUT quotes
run_findcirc: False
# findcirc_params: "--noncanonical --allhits" # this gives way too many circRNAs
1 change: 1 addition & 0 deletions docs/tutorial.md
@@ -174,6 +174,7 @@ The above command creates `<path to output dir>` folder and creates 2 subfolders
This file is used to fine-tune the execution of the pipeline by setting:
* sample sheet location ... aka `samples.tsv`
* the temporary directory (`tempdir`) -- make sure this is correct for your computing environment.
* which circRNA finding tools to use by editing these:
* run_clear: True
* run_dcc: True
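
The tutorial's note above about the temporary directory is worth acting on before a long run. One way to surface a bad setting early would be a small check at workflow parse time; this is a hypothetical addition (nothing like it exists in this commit), and the message and placement are illustrative only:

    import os
    import sys

    # Hypothetical startup check: warn if the parent of the configured tempdir
    # does not exist on this machine. expandvars resolves values such as
    # '/lscratch/$SLURM_JOB_ID' when SLURM_JOB_ID is set in the environment.
    tempdir = os.path.expandvars(config["tempdir"])
    if not os.path.isdir(os.path.dirname(tempdir)):
        print(f"WARNING: parent of tempdir '{tempdir}' not found; "
              "check the 'tempdir' entry in your config file", file=sys.stderr)
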
95 changes: 36 additions & 59 deletions workflow/rules/align.smk
@@ -53,19 +53,12 @@ rule star1p:
outdir=join(WORKDIR, "results", "{sample}", "STAR1p"),
starindexdir=STAR_INDEX_DIR,
alignTranscriptsPerReadNmax=config["alignTranscriptsPerReadNmax"],
randomstr=str(uuid.uuid4()),
tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}",
container: config['containers']["star"]
threads: getthreads("star1p")
shell:
"""
set -exo pipefail
if [ -d /lscratch/${{SLURM_JOB_ID}} ];then
TMPDIR="/lscratch/${{SLURM_JOB_ID}}/{params.randomstr}"
else
TMPDIR="/dev/shm/{params.randomstr}"
fi
if [ ! -d {params.outdir} ];then mkdir {params.outdir};fi
if [ "{params.peorse}" == "PE" ];then
# paired-end
overhang=$(zcat {input.R1} {input.R2} | awk -v maxlen=100 'NR%4==2 {{if (length($1) > maxlen+0) maxlen=length($1)}}; END {{print maxlen-1}}')
@@ -99,7 +92,7 @@ if [ "{params.peorse}" == "PE" ];then
--alignEndsProtrude 10 ConcordantPair \\
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang
rm -rf {params.sample}_p1._STARgenome
@@ -135,7 +128,7 @@ if [ "{params.peorse}" == "PE" ];then
--alignEndsProtrude 10 ConcordantPair \\
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang
rm -rf {params.sample}_mate1._STARgenome
@@ -171,7 +164,7 @@ if [ "{params.peorse}" == "PE" ];then
--alignEndsProtrude 10 ConcordantPair \\
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang
rm -rf {params.sample}_mate2._STARgenome
@@ -211,7 +204,7 @@ else
--alignEndsProtrude 10 ConcordantPair \\
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang
mkdir -p $(dirname {output.mate1_chimeric_junctions})
touch {output.mate1_chimeric_junctions}
@@ -304,19 +297,13 @@ rule star2p:
outdir=join(WORKDIR, "results", "{sample}", "STAR2p"),
starindexdir=STAR_INDEX_DIR,
alignTranscriptsPerReadNmax=config["alignTranscriptsPerReadNmax"],
randomstr=str(uuid.uuid4()),
tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}",
container: config['containers']['star_ucsc_cufflinks']
threads: getthreads("star2p")
shell:
"""
set -exo pipefail
if [ -d /lscratch/${{SLURM_JOB_ID}} ];then
TMPDIR="/lscratch/${{SLURM_JOB_ID}}/{params.randomstr}"
else
TMPDIR="/dev/shm/{params.randomstr}"
fi
if [ ! -d {params.outdir} ];then mkdir {params.outdir};fi
limitSjdbInsertNsj=$(wc -l {input.pass1sjtab}|awk '{{print $1+1}}')
if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi
@@ -359,7 +346,7 @@ if [ "{params.peorse}" == "PE" ];then
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--quantMode GeneCounts \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang \\
--outBAMcompression 0 \\
--outSAMattributes All
@@ -404,43 +391,43 @@ else
--outFilterIntronMotifs None \\
--sjdbGTFfile {input.gtf} \\
--quantMode GeneCounts \\
--outTmpDir ${{TMPDIR}} \\
--outTmpDir {params.tmpdir} \\
--sjdbOverhang $overhang \\
--outBAMcompression 0 \\
--outSAMattributes All
rm -rf ${{output_prefix}}_STARgenome
fi
sleep 120
if [ ! -d $TMPDIR ];then mkdir -p $TMPDIR;fi
samtools view -H {output.unsortedbam} > ${{TMPDIR}}/{params.sample}_p2.non_chimeric.sam
cp ${{TMPDIR}}/{params.sample}_p2.non_chimeric.sam ${{TMPDIR}}/{params.sample}_p2.chimeric.sam
mkdir -p {params.tmpdir}
samtools view -H {output.unsortedbam} > {params.tmpdir}/{params.sample}_p2.non_chimeric.sam
cp {params.tmpdir}/{params.sample}_p2.non_chimeric.sam {params.tmpdir}/{params.sample}_p2.chimeric.sam
# ref https://github.com/alexdobin/STAR/issues/678
samtools view -@ {threads} {output.unsortedbam} | grep "ch:A:1" >> ${{TMPDIR}}/{params.sample}_p2.chimeric.sam
samtools view -@ {threads} {output.unsortedbam} | grep -v "ch:A:1" >> ${{TMPDIR}}/{params.sample}_p2.non_chimeric.sam
samtools view -@ {threads} {output.unsortedbam} | grep "ch:A:1" >> {params.tmpdir}/{params.sample}_p2.chimeric.sam
samtools view -@ {threads} {output.unsortedbam} | grep -v "ch:A:1" >> {params.tmpdir}/{params.sample}_p2.non_chimeric.sam
ls -alrth
for i in 1 2 3;do
if [ ! -d ${{TMPDIR}}/{params.randomstr}_${{i}} ];then mkdir -p ${{TMPDIR}}/{params.randomstr}_${{i}};fi
mkdir -p {params.tmpdir}_${{i}}
done
samtools view -@ {threads} -b -S ${{TMPDIR}}/{params.sample}_p2.chimeric.sam | \\
samtools view -@ {threads} -b -S {params.tmpdir}/{params.sample}_p2.chimeric.sam | \\
samtools sort \\
-l 9 \\
-T ${{TMPDIR}}/{params.randomstr}_1 \\
-T {params.tmpdir}_1 \\
--write-index \\
-@ {threads} \\
--output-fmt BAM \\
-o {output.chimeric_bam} -
samtools view -@ {threads} -b -S ${{TMPDIR}}/{params.sample}_p2.non_chimeric.sam | \\
samtools view -@ {threads} -b -S {params.tmpdir}/{params.sample}_p2.non_chimeric.sam | \\
samtools sort \\
-l 9 \\
-T ${{TMPDIR}}/{params.randomstr}_2 \\
-T {params.tmpdir}_2 \\
--write-index \\
-@ {threads} \\
--output-fmt BAM \\
-o {output.non_chimeric_bam} -
samtools sort \\
-l 9 \\
-T ${{TMPDIR}}/{params.randomstr}_3 \\
-T {params.tmpdir}_3 \\
--write-index \\
-@ {threads} \\
--output-fmt BAM \\
@@ -479,17 +466,12 @@ rule star_circrnafinder:
flanksize=FLANKSIZE,
starindexdir=STAR_INDEX_DIR,
alignTranscriptsPerReadNmax=config["alignTranscriptsPerReadNmax"],
randomstr=str(uuid.uuid4()),
tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}",
container: config['containers']['star_ucsc_cufflinks']
threads: getthreads("star_circrnafinder")
shell:
"""
set -exo pipefail
if [ -d /lscratch/${{SLURM_JOB_ID}} ];then
TMPDIR="/lscratch/${{SLURM_JOB_ID}}/{params.randomstr}"
else
TMPDIR="/dev/shm/{params.randomstr}"
fi
outdir=$(dirname {output.chimericsam})
if [ ! -d $outdir ];then mkdir -p $outdir;fi
@@ -514,7 +496,7 @@ if [ "{params.peorse}" == "PE" ];then
--outFilterMultimapNmax 2 \\
--outFileNamePrefix {params.sample}. \\
--outBAMcompression 0 \\
--outTmpDir $TMPDIR \\
--outTmpDir {params.tmpdir} \\
--sjdbGTFfile {input.gtf}
else
@@ -536,7 +518,7 @@ else
--outFilterMultimapNmax 2 \\
--outFileNamePrefix {params.sample}. \\
--outBAMcompression 0 \\
--outTmpDir $TMPDIR \\
--outTmpDir {params.tmpdir} \\
--sjdbGTFfile {input.gtf}
fi
@@ -571,18 +553,13 @@ rule find_circ_align:
sample="{sample}",
reffa=REF_FA,
peorse=get_peorse,
randomstr=str(uuid.uuid4()),
tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}",
container: config['containers']['star_ucsc_cufflinks']
threads: getthreads("find_circ_align")
shell:
"""
set -exo pipefail
if [ -d /lscratch/${{SLURM_JOB_ID}} ];then
TMPDIR="/lscratch/${{SLURM_JOB_ID}}/{params.randomstr}"
else
TMPDIR="/dev/shm/{params.randomstr}"
fi
if [ ! -d $TMPDIR ];then mkdir -p $TMPDIR;fi
mkdir -p {params.tmpdir}
refdir=$(dirname {input.bt2})
outdir=$(dirname {output.anchorsfq})
@@ -598,7 +575,7 @@ bowtie2 \\
-q \\
-1 {input.R1} \\
-2 {input.R2} \\
> ${{TMPDIR}}/{params.sample}.sam
> {params.tmpdir}/{params.sample}.sam
else
bowtie2 \\
-p {threads} \\
@@ -608,35 +585,35 @@ bowtie2 \\
-x ${{refdir}}/ref \\
-q \\
-U {input.R1} \\
> ${{TMPDIR}}/{params.sample}.sam
> {params.tmpdir}/{params.sample}.sam
fi
samtools view -@{threads} -hbuS -o ${{TMPDIR}}/{params.sample}.unsorted.bam ${{TMPDIR}}/{params.sample}.sam
samtools view -@{threads} -hbuS -o {params.tmpdir}/{params.sample}.unsorted.bam {params.tmpdir}/{params.sample}.sam
samtools sort -@{threads} \\
-u \\
--write-index \\
--output-fmt BAM \\
-T ${{TMPDIR}}/{params.sample}.samtoolssort \\
-o ${{TMPDIR}}/{params.sample}.sorted.bam ${{TMPDIR}}/{params.sample}.unsorted.bam
-T {params.tmpdir}/{params.sample}.samtoolssort \\
-o {params.tmpdir}/{params.sample}.sorted.bam {params.tmpdir}/{params.sample}.unsorted.bam
samtools view -@{threads} \\
--output-fmt BAM \\
--write-index \\
-o ${{TMPDIR}}/{params.sample}.unmapped.bam \\
-o {params.tmpdir}/{params.sample}.unmapped.bam \\
-f4 \\
${{TMPDIR}}/{params.sample}.sorted.bam
{params.tmpdir}/{params.sample}.sorted.bam
unmapped2anchors.py \\
${{TMPDIR}}/{params.sample}.unmapped.bam | \\
gzip -c - > ${{TMPDIR}}/{params.sample}.anchors.fastq.gz
{params.tmpdir}/{params.sample}.unmapped.bam | \\
gzip -c - > {params.tmpdir}/{params.sample}.anchors.fastq.gz
mv ${{TMPDIR}}/{params.sample}.anchors.fastq.gz {output.anchorsfq}
mv ${{TMPDIR}}/{params.sample}.unmapped.b* ${{outdir}}/
mv {params.tmpdir}/{params.sample}.anchors.fastq.gz {output.anchorsfq}
mv {params.tmpdir}/{params.sample}.unmapped.b* ${{outdir}}/
sleep 300
rm -rf $TMPDIR
rm -rf {params.tmpdir}
"""


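One detail of the tmpdir=f"{TEMPDIR}/{str(uuid.uuid4())}" pattern used throughout this file: a plain string param is evaluated once when the rule is parsed, so all jobs spawned from a rule share the same random suffix. That is harmless when TEMPDIR is already job-scoped (e.g. /lscratch/$SLURM_JOB_ID on biowulf), but if per-job uniqueness were wanted, a callable param is re-evaluated for every job. A sketch of that variant, not part of this commit:

    import uuid

    # Hypothetical per-job variant: the lambda runs once per job, so each sample
    # gets its own scratch subdirectory even when TEMPDIR is a node-shared
    # location such as /scratch/local.
    rule demo_scratch:
        output: "results/{sample}/demo.done"
        params:
            tmpdir=lambda wildcards: f"{TEMPDIR}/{wildcards.sample}_{uuid.uuid4()}",
        shell:
            """
            mkdir -p {params.tmpdir}
            touch {output}
            rm -rf {params.tmpdir}
            """
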
1 change: 0 additions & 1 deletion workflow/rules/create_index.smk
@@ -20,7 +20,6 @@ rule create_index:
script1=join(SCRIPTS_DIR, "_add_geneid2genepred.py"),
script2=join(SCRIPTS_DIR, "_multifasta2separatefastas.sh"),
script3=join(SCRIPTS_DIR, "fix_gtfs.py"),
randomstr=str(uuid.uuid4()),
nclscan_config=config["nclscan_config"],
container: config['containers']['star_ucsc_cufflinks']
threads: getthreads("create_index")