RNA-Seq germline variant calling pipeline #123

Merged: 57 commits, May 3, 2024
Changes from 47 commits
Commits
559d961
feat: RNA-Seq variant calling pipeline
adthrasher Nov 28, 2023
75a8a4c
fix: update docker
adthrasher Nov 28, 2023
bdf078a
fix: split code to bash
adthrasher Nov 29, 2023
a77e7d7
fix: updates from test runs
adthrasher Dec 1, 2023
78c38a1
feat: GATK reference generation pipeline
adthrasher Dec 1, 2023
b5c2386
feat: Picard CreateSequenceDictionary
adthrasher Dec 1, 2023
693d646
feat: samtools faidx
adthrasher Dec 1, 2023
6f91ebf
feat: sambamba WDL + tests
adthrasher Dec 1, 2023
cc14f5c
fix: update mem. require output name
adthrasher Dec 7, 2023
34746da
fix: sambamba
adthrasher Dec 7, 2023
641147a
Adding documentation
adthrasher Dec 8, 2023
6b2c97d
Add tool documentation
adthrasher Apr 9, 2024
2459cf7
Add Picard tests
adthrasher Apr 9, 2024
851834d
dbsnp is not supported yet by HaplotypeCallerSpark
adthrasher Apr 15, 2024
663a9cf
Add additional tests
adthrasher Apr 15, 2024
c5b58d2
Update test bam to ensure pairings
adthrasher Apr 15, 2024
8bb2de8
Remove --copy-input-files
adthrasher Apr 19, 2024
2a48a1d
GATK4 tests
adthrasher Apr 19, 2024
9503995
Revert to non-Spark HaplotypeCaller
adthrasher Apr 23, 2024
c394d6b
Metadata updates
adthrasher Apr 23, 2024
f4cc9d4
Merge branch 'main' into rnaseq_variant
adthrasher Apr 23, 2024
d5dbcee
Correct parameter
adthrasher Apr 23, 2024
1a20230
Remove unneeded param
adthrasher Apr 23, 2024
2d4ea80
Remove extraneous file
adthrasher Apr 23, 2024
423a0a4
Add --versbose
adthrasher Apr 23, 2024
1d6fe78
Add file outputs
adthrasher Apr 23, 2024
c1ffb5d
Apply review feedback
adthrasher Apr 23, 2024
80dddbd
Apply suggestions from review
adthrasher Apr 23, 2024
cff1ed6
Update tools/gatk4.wdl
adthrasher Apr 23, 2024
396de4c
Apply feedback from PR
adthrasher Apr 23, 2024
30da3fc
Remove extra line
adthrasher Apr 23, 2024
88dfea8
Rename output based on PR feedback
adthrasher Apr 23, 2024
1e2617d
Changes based on PR feedback
adthrasher Apr 24, 2024
76b8a4d
Apply feedback from PR review
adthrasher Apr 24, 2024
fb8023a
Apply suggestions from PR
adthrasher Apr 24, 2024
3da3850
Update tests to match
adthrasher Apr 24, 2024
f933e96
Remove unfiltered VCF output
adthrasher Apr 24, 2024
767223e
Update metadata
adthrasher Apr 24, 2024
d460131
Make duplicate marking optional
adthrasher Apr 24, 2024
949f200
style: fix input: location
adthrasher Apr 24, 2024
c124039
Apply suggestions from PR
adthrasher Apr 24, 2024
7493096
Apply suggestions from code review
adthrasher Apr 24, 2024
409dc45
Expose Java heap settings
adthrasher Apr 24, 2024
1b759bb
Fix whitespace
adthrasher Apr 25, 2024
a0ce5a5
Merge branch 'main' into rnaseq_variant
adthrasher May 3, 2024
d2ce372
docs: Add commas to objects to satisfy lint check
adthrasher May 3, 2024
017fee3
doc: fix CI issues
adthrasher May 3, 2024
bb29101
docs: fix metadata entries
adthrasher May 3, 2024
9885e19
docs: remove unused meta
adthrasher May 3, 2024
def6885
docs: add commas to objects
adthrasher May 3, 2024
8d1ec35
docs: fix metadata lint issues
adthrasher May 3, 2024
666daa5
chore: apply feedback from PR
adthrasher May 3, 2024
f2ec69a
Update tools/picard.wdl
adthrasher May 3, 2024
f9eafe2
chore: memory consistency
adthrasher May 3, 2024
8724c1b
chore: apply PR suggestions
adthrasher May 3, 2024
6065bc9
docs: remove unnecessary meta
adthrasher May 3, 2024
f7be874
docs: fixing CI warnings
adthrasher May 3, 2024
2 changes: 2 additions & 0 deletions .gitattributes
@@ -21,3 +21,5 @@ bin/check-job-alive !text !filter !merge !diff
*.py !text !filter !merge !diff
*.pl !text !filter !merge !diff
*.conf !text !filter !merge !diff
.fa filter=lfs diff=lfs merge=lfs -text
*.fa filter=lfs diff=lfs merge=lfs -text
3,397 changes: 3,397 additions & 0 deletions tests/tools/input/1scattered.interval_list

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions tests/tools/input/GRCh38.chr1_chr19.dict
@@ -0,0 +1,3 @@
@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:248956422 M5:6aef897c3d6ff0c78aff06ac189178dd
@SQ SN:chr19 LN:58617616 M5:85f9f4fc152c58cb7913c06d6b98573a
3 changes: 3 additions & 0 deletions tests/tools/input/GRCh38.chr1_chr19.fa
Git LFS file not shown
2 changes: 2 additions & 0 deletions tests/tools/input/GRCh38.chr1_chr19.fa.fai
@@ -0,0 +1,2 @@
chr1 248956422 112 70 71
chr19 58617616 252513180 70 71
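The two index lines above follow the standard five-column `.fai` layout (name, length, byte offset of the first base, bases per line, bytes per line). As a minimal sketch of how those columns are used, the Python below parses the two records and computes the byte offset of a 0-based base position. The helper names `FaiRecord`, `parse_fai_line`, and `byte_offset` are hypothetical and are not part of this PR.

```python
from typing import NamedTuple

# The two index lines shown in the diff above (whitespace-separated here;
# a real .fai file is tab-separated, which str.split() also handles).
FAI_TEXT = """chr1 248956422 112 70 71
chr19 58617616 252513180 70 71"""


class FaiRecord(NamedTuple):
    name: str
    length: int     # number of bases in the sequence
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per full line
    linewidth: int  # bytes per full line, including the line terminator


def parse_fai_line(line: str) -> FaiRecord:
    """Parse one five-column .fai line into a record (hypothetical helper)."""
    name, length, offset, linebases, linewidth = line.split()
    return FaiRecord(name, int(length), int(offset), int(linebases), int(linewidth))


def byte_offset(rec: FaiRecord, pos0: int) -> int:
    """Return the byte offset in the FASTA file of a 0-based base position."""
    if not 0 <= pos0 < rec.length:
        raise ValueError(f"position {pos0} is outside {rec.name}")
    full_lines, column = divmod(pos0, rec.linebases)
    return rec.offset + full_lines * rec.linewidth + column


records = [parse_fai_line(line) for line in FAI_TEXT.splitlines()]
```

A quick consistency check on an index like this one is that the last base of `chr1` must sit at a byte offset smaller than the `chr19` offset.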
5,000 changes: 5,000 additions & 0 deletions tests/tools/input/Homo_sapiens_assembly38.dbsnp138.top5000.vcf

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions tests/tools/input/chr1.interval_list
@@ -0,0 +1,19 @@
@HD VN:1.5 SO:coordinate
@SQ SN:chr1 LN:248956422 M5:6aef897c3d6ff0c78aff06ac189178dd AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr19 LN:58617616 M5:85f9f4fc152c58cb7913c06d6b98573a AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
chr1 10001 207666 + . intersection ACGTmer
chr1 257667 297968 + . intersection ACGTmer
chr1 347969 535988 + . intersection ACGTmer
chr1 585989 2702781 + . intersection ACGTmer
chr1 2746291 12954384 + . intersection ACGTmer
chr1 13004385 16799163 + . intersection ACGTmer
chr1 16849164 29552233 + . intersection ACGTmer
chr1 29553836 121976459 + . intersection ACGTmer
chr1 122026460 124977944 + . intersection ACGTmer
chr1 124978327 125130246 + . intersection ACGTmer
chr1 125131848 125171347 + . intersection ACGTmer
chr1 125173584 125184587 + . intersection ACGTmer
chr1 143184588 223558935 + . intersection ACGTmer
chr1 223608936 228558364 + . intersection ACGTmer
chr1 228608365 248946422 + . intersection ACGTmer
chr19 20002 208000 + . intersection ACGTmer
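For readers unfamiliar with the interval list layout above, here is a minimal Python sketch that skips the `@` header lines and reads each record as contig, start, end, strand, and name. It assumes 1-based inclusive coordinates, as Picard interval lists use, and treats everything after the strand column as the name. The function names are hypothetical and do not come from this PR.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, str, str]


def parse_interval_list(text: str) -> List[Interval]:
    """Parse interval-list text, ignoring '@' header lines (hypothetical helper)."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM-style header lines
        # Assumption: everything after the strand column is the interval name.
        contig, start, end, strand, name = line.split(None, 4)
        intervals.append((contig, int(start), int(end), strand, name))
    return intervals


def total_bases(intervals: List[Interval]) -> int:
    """Sum interval sizes, treating start and end as 1-based inclusive."""
    return sum(end - start + 1 for _, start, end, _, _ in intervals)


# First and last records from chr1.interval_list, plus one header line.
SAMPLE = """@HD VN:1.5 SO:coordinate
chr1 10001 207666 + . intersection ACGTmer
chr19 20002 208000 + . intersection ACGTmer"""

sample_intervals = parse_interval_list(SAMPLE)
```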
3 changes: 3 additions & 0 deletions tests/tools/input/test.bam.bai
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/tools/input/test.fa
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/tools/input/test1.vcf.gz
Git LFS file not shown
Binary file added tests/tools/input/test1.vcf.gz.tbi
Binary file not shown.
3 changes: 3 additions & 0 deletions tests/tools/input/test2.vcf.gz
Git LFS file not shown
Binary file added tests/tools/input/test2.vcf.gz.tbi
Binary file not shown.
3 changes: 3 additions & 0 deletions tests/tools/input/test_rnaseq_variant.bam
Git LFS file not shown
3 changes: 3 additions & 0 deletions tests/tools/input/test_rnaseq_variant.bam.bai
Git LFS file not shown
638 changes: 638 additions & 0 deletions tests/tools/input/test_rnaseq_variant.recal.txt

Large diffs are not rendered by default.

3,724 changes: 3,724 additions & 0 deletions tests/tools/input/wgs_calling_regions.hg38.interval_list

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions tests/tools/input_json/gatk4_apply_bqsr.json
@@ -0,0 +1,5 @@
{
"bam": "tests/tools/input/test_rnaseq_variant.bam",
"bam_index": "tests/tools/input/test_rnaseq_variant.bam.bai",
"recalibration_report": "tests/tools/input/test_rnaseq_variant.recal.txt"
}
11 changes: 11 additions & 0 deletions tests/tools/input_json/gatk4_base_recalibrator.json
@@ -0,0 +1,11 @@
{
"bam": "tests/tools/input/test_rnaseq_variant.bam",
"bam_index": "tests/tools/input/test_rnaseq_variant.bam.bai",
"fasta": "tests/tools/input/GRCh38.chr1_chr19.fa",
"fasta_index": "tests/tools/input/GRCh38.chr1_chr19.fa.fai",
"dict": "tests/tools/input/GRCh38.chr1_chr19.dict",
"dbSNP_vcf": "tests/tools/input/Homo_sapiens_assembly38.dbsnp138.top5000.vcf",
"dbSNP_vcf_index": "tests/tools/input/Homo_sapiens_assembly38.dbsnp138.top5000.vcf.idx",
"known_indels_sites_VCFs": ["tests/tools/input/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"],
"known_indels_sites_indices": ["tests/tools/input/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi"]
}
10 changes: 10 additions & 0 deletions tests/tools/input_json/gatk4_haplotype_caller.json
@@ -0,0 +1,10 @@
{
"bam": "tests/tools/input/test_rnaseq_variant.bam",
"bam_index": "tests/tools/input/test_rnaseq_variant.bam.bai",
"fasta": "tests/tools/input/GRCh38.chr1_chr19.fa",
"fasta_index": "tests/tools/input/GRCh38.chr1_chr19.fa.fai",
"dict": "tests/tools/input/GRCh38.chr1_chr19.dict",
"dbSNP_vcf": "tests/tools/input/Homo_sapiens_assembly38.dbsnp138.top5000.vcf",
"dbSNP_vcf_index": "tests/tools/input/Homo_sapiens_assembly38.dbsnp138.top5000.vcf.idx",
"interval_list": "tests/tools/input/chr1.interval_list"
}
9 changes: 9 additions & 0 deletions tests/tools/input_json/gatk4_split_n_cigar_reads.json
@@ -0,0 +1,9 @@
{
"bam": "tests/tools/input/test.bam",
"bam_index": "tests/tools/input/test.bam.bai",
"fasta": "tests/tools/input/GRCh38.chr1_chr19.fa",
"fasta_index": "tests/tools/input/GRCh38.chr1_chr19.fa.fai",
"dict": "tests/tools/input/GRCh38.chr1_chr19.dict",
"prefix": "split",
"interval_list": "tests/tools/input/wgs_calling_regions.hg38.interval_list"
}
7 changes: 7 additions & 0 deletions tests/tools/input_json/gatk4_variant_filtration.json
@@ -0,0 +1,7 @@
{
"vcf": "tests/tools/input/test1.vcf.gz",
"vcf_index": "tests/tools/input/test1.vcf.gz.tbi",
"fasta": "tests/tools/input/GRCh38.chr1_chr19.fa",
"fasta_index": "tests/tools/input/GRCh38.chr1_chr19.fa.fai",
"dict": "tests/tools/input/GRCh38.chr1_chr19.dict"
}
5 changes: 5 additions & 0 deletions tests/tools/input_json/picard_merge_vcfs.json
@@ -0,0 +1,5 @@
{
"vcfs": ["tests/tools/input/test1.vcf.gz", "tests/tools/input/test2.vcf.gz"],
"vcfs_indexes": ["tests/tools/input/test1.vcf.gz.tbi", "tests/tools/input/test2.vcf.gz.tbi"],
"output_vcf_name": "test.vcf.gz"
}
4 changes: 4 additions & 0 deletions tests/tools/input_json/sambamba_merge.json
@@ -0,0 +1,4 @@
{
"bams": ["tests/tools/input/possorted_genome_bam.bam", "tests/tools/input/test.bwa_aln_pe.bam"],
"prefix": "test"
}
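The input JSON files above reference fixtures by relative path, and several of those fixtures are Git LFS or binary files that this view does not render. A minimal Python sketch for checking that every referenced path exists before running the tests might look like the following. The function `missing_input_files` is hypothetical, and the rule that any string value containing `/` is a file path is an assumption made for illustration (it leaves values such as `"prefix": "split"` and `"output_vcf_name": "test.vcf.gz"` alone).

```python
import json
import os
from typing import List, Tuple


def missing_input_files(json_path: str, root: str = ".") -> List[Tuple[str, str]]:
    """Return (key, path) pairs for referenced files that do not exist under root.

    Assumption: a string value containing '/' is a file path; other strings
    (prefixes, output names) are skipped.
    """
    with open(json_path) as handle:
        inputs = json.load(handle)

    missing = []
    for key, value in inputs.items():
        # Normalize scalars and arrays to a list so both are checked the same way.
        candidates = value if isinstance(value, list) else [value]
        for candidate in candidates:
            if isinstance(candidate, str) and "/" in candidate:
                if not os.path.exists(os.path.join(root, candidate)):
                    missing.append((key, candidate))
    return missing
```

Called from the repository root, for example as `missing_input_files("tests/tools/input_json/picard_merge_vcfs.json")`, an empty list would indicate that every referenced fixture is present.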
49 changes: 49 additions & 0 deletions tests/tools/test_gatk4.yaml
@@ -0,0 +1,49 @@
- name: gatk4_split_n_cigar_reads
  tags:
    - miniwdl
    - gatk4
  command: >-
    miniwdl run --verbose -d test-output/. --task split_n_cigar_reads -i tests/tools/input_json/gatk4_split_n_cigar_reads.json tools/gatk4.wdl
  files:
    - path: test-output/out/split_n_reads_bam/split.bam
    - path: test-output/out/split_n_reads_bam_index/split.bam.bai
    - path: test-output/out/split_n_reads_bam_md5/split.bam.md5

- name: gatk4_base_recalibrator
  tags:
    - miniwdl
    - gatk4
  command: >-
    miniwdl run --verbose -d test-output/. --task base_recalibrator -i tests/tools/input_json/gatk4_base_recalibrator.json tools/gatk4.wdl
  files:
    - path: test-output/out/recalibration_report/test_rnaseq_variant.recal.txt

- name: gatk4_apply_bqsr
  tags:
    - miniwdl
    - gatk4
  command: >-
    miniwdl run --verbose -d test-output/. --task apply_bqsr -i tests/tools/input_json/gatk4_apply_bqsr.json tools/gatk4.wdl
  files:
    - path: test-output/out/recalibrated_bam/test_rnaseq_variant.bqsr.bam
    - path: test-output/out/recalibrated_bam_index/test_rnaseq_variant.bqsr.bam.bai


- name: gatk4_haplotype_caller
  tags:
    - miniwdl
    - gatk4
  command: >-
    miniwdl run --verbose -d test-output/. --task haplotype_caller -i tests/tools/input_json/gatk4_haplotype_caller.json tools/gatk4.wdl
  files:
    - path: test-output/out/vcf/test_rnaseq_variant.vcf.gz

- name: gatk4_variant_filtration
  tags:
    - miniwdl
    - gatk4
  command: >-
    miniwdl run --verbose -d test-output/. --task variant_filtration -i tests/tools/input_json/gatk4_variant_filtration.json tools/gatk4.wdl
  files:
    - path: test-output/out/vcf_filtered/test1.filtered.vcf.gz
    - path: test-output/out/vcf_filtered_index/test1.filtered.vcf.gz.tbi
29 changes: 29 additions & 0 deletions tests/tools/test_picard.yaml
@@ -105,3 +105,32 @@
  files:
    - path: test-output/out/quality_score_distribution_txt/test.bwa_aln_pe.QualityScoreDistribution.txt
    - path: test-output/out/quality_score_distribution_pdf/test.bwa_aln_pe.QualityScoreDistribution.pdf

- name: picard_merge_vcfs
  tags:
    - miniwdl
    - picard
  command: >-
    miniwdl run -d test-output/. --task merge_vcfs -i tests/tools/input_json/picard_merge_vcfs.json tools/picard.wdl
  files:
    - path: test-output/out/output_vcf/test.vcf.gz

- name: picard_scatter_interval_list
  tags:
    - miniwdl
    - picard
  command: >-
    miniwdl run -d test-output/. --task scatter_interval_list tools/picard.wdl interval_list="tests/tools/input/wgs_calling_regions.hg38.interval_list" scatter_count=3
  files:
    - path: test-output/out/interval_lists_scatter/0/1scattered.interval_list
    - path: test-output/out/interval_lists_scatter/1/2scattered.interval_list
    - path: test-output/out/interval_lists_scatter/2/3scattered.interval_list

- name: picard_create_sequence_dictionary
  tags:
    - miniwdl
    - picard
  command: >-
    miniwdl run -d test-output/. --task create_sequence_dictionary tools/picard.wdl fasta="tests/tools/input/GRCh38.chrY_chrM.fa.gz" outfile_name="GRCh38.chrY_chrM.dict"
  files:
    - path: test-output/out/dictionary/GRCh38.chrY_chrM.dict
46 changes: 46 additions & 0 deletions tests/tools/test_sambamba.yaml
@@ -0,0 +1,46 @@
- name: sambamba_index
  tags:
    - miniwdl
    - sambamba
  command: >-
    miniwdl run -d test-output/. --task index tools/sambamba.wdl bam="tests/tools/input/test.bwa_aln_pe.bam"
  files:
    - path: test-output/out/bam_index/test.bwa_aln_pe.bam.bai

- name: sambamba_merge
  tags:
    - miniwdl
    - sambamba
  command: >-
    miniwdl run -d test-output/. --task merge -i tests/tools/input_json/sambamba_merge.json tools/sambamba.wdl
  files:
    - path: test-output/out/merged_bam/test.bam

- name: sambamba_sort
  tags:
    - miniwdl
    - sambamba
  command: >-
    miniwdl run -d test-output/. --task sort tools/sambamba.wdl bam="tests/tools/input/test.bwa_aln_pe.bam"
  files:
    - path: test-output/out/sorted_bam/test.bwa_aln_pe.sorted.bam

- name: sambamba_flagstat
  tags:
    - miniwdl
    - sambamba
  command: >-
    miniwdl run -d test-output/. --task flagstat tools/sambamba.wdl bam="tests/tools/input/test.bwa_aln_pe.bam"
  files:
    - path: test-output/out/flagstat_report/test.bwa_aln_pe.flagstat.txt

- name: sambamba_markdup
  tags:
    - miniwdl
    - sambamba
  command: >-
    miniwdl run -d test-output/. --task markdup tools/sambamba.wdl bam="tests/tools/input/test.bwa_aln_pe.bam"
  files:
    - path: test-output/out/duplicate_marked_bam/test.bwa_aln_pe.markdup.bam
    - path: test-output/out/duplicate_marked_bam_index/test.bwa_aln_pe.markdup.bam.bai
    - path: test-output/out/markdup_log/test.bwa_aln_pe.markdup_log.txt
9 changes: 9 additions & 0 deletions tests/tools/test_samtools.yaml
@@ -82,3 +82,12 @@
    - path: test-output/out/collated_bam/test.bwa_aln_pe.collated.bam
    - path: test-output/out/read_one_fastq_gz/test.bwa_aln_pe.R1.fastq.gz
    - path: test-output/out/read_two_fastq_gz/test.bwa_aln_pe.R2.fastq.gz

- name: samtools_faidx
  tags:
    - miniwdl
    - samtools
  command: >-
    miniwdl run -d test-output/. --task faidx tools/samtools.wdl fasta="tests/tools/input/test.fa"
  files:
    - path: test-output/out/fasta_index/test.fa.fai
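
The test entries in these YAML files share one command shape: `miniwdl run`, an optional `--verbose`, `-d test-output/.`, `--task <name>`, an optional `-i <json>`, the WDL file, and then any `key=value` inputs. Expected files are listed under `test-output/out/<output name>/<file name>`. The Python sketch below rebuilds both pieces from their parts, which may help when adding a new test entry. The helpers `miniwdl_task_command` and `expected_output_path` are hypothetical, use only the flags that appear in the commands above, and only assemble strings; they do not run miniwdl.

```python
import shlex
from typing import List, Optional


def miniwdl_task_command(
    task: str,
    wdl: str,
    input_json: Optional[str] = None,
    out_dir: str = "test-output/.",
    verbose: bool = False,
    **inputs: str,
) -> List[str]:
    """Assemble an argument list in the same order as the test commands above."""
    cmd = ["miniwdl", "run"]
    if verbose:
        cmd.append("--verbose")
    cmd += ["-d", out_dir, "--task", task]
    if input_json is not None:
        cmd += ["-i", input_json]
    cmd.append(wdl)
    # Inline inputs are passed as key=value arguments after the WDL file.
    cmd += [f"{key}={value}" for key, value in inputs.items()]
    return cmd


def expected_output_path(output_name: str, filename: str, out_dir: str = "test-output") -> str:
    """Build the path pattern used in the 'files:' sections above."""
    return f"{out_dir}/out/{output_name}/{filename}"


haplotype_caller_cmd = miniwdl_task_command(
    "haplotype_caller",
    "tools/gatk4.wdl",
    input_json="tests/tools/input_json/gatk4_haplotype_caller.json",
    verbose=True,
)
faidx_cmd = miniwdl_task_command(
    "faidx", "tools/samtools.wdl", fasta="tests/tools/input/test.fa"
)
print(shlex.join(haplotype_caller_cmd))
print(shlex.join(faidx_cmd))
```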