feat: fusion calling #222

Merged Feb 27, 2024 · 68 commits (the diff below shows changes from 67 commits)
210603c
Initial commit
FelixMoelder Dec 15, 2022
2659312
concat arriba calls
FelixMoelder Dec 19, 2022
d7b814f
generalize workflow
FelixMoelder Jan 13, 2023
f6a1345
Merge branch 'master' into fusion_calling
FelixMoelder Mar 16, 2023
b4b1538
update arriba
FelixMoelder Mar 16, 2023
f499e26
Update tests
FelixMoelder Mar 16, 2023
6bcaa1d
typo
FelixMoelder Mar 16, 2023
788080d
Merge branch 'master' into fusion_calling
FelixMoelder May 10, 2023
273fbdc
intermediate changes
FelixMoelder May 12, 2023
5a24102
Merge branch 'fusion_calling' of github.com:snakemake-workflows/dna-s…
FelixMoelder May 12, 2023
0cb7f25
remove repetetive code
FelixMoelder May 12, 2023
08ae08c
fix typo
FelixMoelder May 12, 2023
1cea901
indexing
FelixMoelder Aug 31, 2023
b5f29fe
Merge branch 'master' into fusion_calling
FelixMoelder Aug 31, 2023
910803f
snakefmt
FelixMoelder Aug 31, 2023
9b141b3
refactoring
FelixMoelder Sep 6, 2023
b27f1f3
fmt
FelixMoelder Sep 6, 2023
0043a9f
fmt
FelixMoelder Sep 6, 2023
6e21bac
fixed incompatibilities
FelixMoelder Sep 7, 2023
92f4434
skip vep
FelixMoelder Sep 11, 2023
e1e9b62
improve report
FelixMoelder Sep 12, 2023
ab94e13
Fix final output
FelixMoelder Sep 15, 2023
d8e27f9
fix output
FelixMoelder Sep 15, 2023
1357b44
handle mutational burden
FelixMoelder Sep 15, 2023
e42caa2
remove unused rules, separate groups
FelixMoelder Sep 15, 2023
b0d24ec
fix minor issues
FelixMoelder Sep 20, 2023
176f681
cleanup
FelixMoelder Sep 20, 2023
b3d574e
Add missing script
FelixMoelder Sep 20, 2023
0201f7f
update readme
FelixMoelder Sep 21, 2023
8f90131
update freebayes
FelixMoelder Sep 22, 2023
333ae5e
reset download revel
FelixMoelder Oct 17, 2023
289315a
Merge branch 'master' into fusion_calling
FelixMoelder Oct 26, 2023
91815b6
renaming
FelixMoelder Oct 27, 2023
cdc8d2d
fix datatype handling
FelixMoelder Oct 27, 2023
3e5a3ba
fmt
FelixMoelder Oct 27, 2023
805df4d
fmt
FelixMoelder Oct 27, 2023
1d8a179
merge master
FelixMoelder Nov 13, 2023
b968321
Update wrapper
FelixMoelder Nov 13, 2023
aa8d20a
Merge branch 'master' into fusion_calling
FelixMoelder Nov 29, 2023
251597d
Merge branch 'master' into fusion_calling
FelixMoelder Nov 30, 2023
b88a90c
Merge branch 'master' into fusion_calling
FelixMoelder Dec 1, 2023
dad77e0
breaking up workflow (not yet working)
FelixMoelder Dec 12, 2023
721d5c9
Merge branch 'fusion_calling' of github.com:snakemake-workflows/dna-s…
FelixMoelder Dec 12, 2023
612c041
fmt
FelixMoelder Dec 12, 2023
aa9e2fa
fmt
FelixMoelder Dec 12, 2023
e536e5e
fmt
FelixMoelder Dec 12, 2023
2b48e51
Merge branch 'master' into fusion_calling
FelixMoelder Dec 12, 2023
d1e2383
invoked datatype and candidate-calling
FelixMoelder Jan 16, 2024
ada0627
fix formatting
FelixMoelder Jan 16, 2024
c71aef4
unified workflow
FelixMoelder Jan 18, 2024
8a15e4e
formatting
FelixMoelder Jan 18, 2024
2bbbb37
update samplesheet
FelixMoelder Jan 18, 2024
479c144
Add read group for star
FelixMoelder Jan 18, 2024
1bd1720
fix param
FelixMoelder Jan 18, 2024
0e4f5c1
support fusions and variants in rna
FelixMoelder Jan 19, 2024
d6f4a9c
feat: render canonical transcript source
FelixMoelder Jan 24, 2024
eb8268f
clean report
FelixMoelder Jan 26, 2024
26c92dd
Merge remote-tracking branch 'origin/feat/render_canonical_source' in…
FelixMoelder Jan 29, 2024
6cbea56
cleanup template
FelixMoelder Feb 13, 2024
bae0416
github action workaround
FelixMoelder Feb 13, 2024
8e8e581
update readme
FelixMoelder Feb 15, 2024
c738abb
cleanup
FelixMoelder Feb 26, 2024
486722a
fmt
FelixMoelder Feb 26, 2024
02a693d
introduce pattern delegatoin
FelixMoelder Feb 26, 2024
2976f30
fmt
FelixMoelder Feb 26, 2024
fca151c
add comment to script
FelixMoelder Feb 26, 2024
6ec5818
typo
FelixMoelder Feb 26, 2024
a02b21f
Update convert_fusions_to_vcf.sh
FelixMoelder Feb 26, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ jobs:
with:
directory: .test
snakefile: workflow/Snakefile
args: "--configfile .test/config-simple/config.yaml --report report.zip"
args: "--configfile .test/config-simple/config.yaml --cores 1 --report report.zip"
show-disk-usage-on-error: true

- name: Test workflow (local FASTQs, target regions)
Expand Down
4 changes: 2 additions & 2 deletions .test/config-chm-eval/samples.tsv
@@ -1,2 +1,2 @@
sample_name group alias platform
chm chm ILLUMINA
sample_name group alias platform datatype calling
chm chm ILLUMINA dna variants
4 changes: 2 additions & 2 deletions .test/config-giab/samples.tsv
@@ -1,2 +1,2 @@
sample_name alias group platform purity
NA12878 NA12878 NA12878 ILLUMINA
sample_name alias group platform purity datatype calling
NA12878 NA12878 NA12878 ILLUMINA dna variants
6 changes: 3 additions & 3 deletions .test/config-no-candidate-filtering/samples.tsv
@@ -1,3 +1,3 @@
sample_name group alias platform
a a ILLUMINA
b b ILLUMINA
sample_name group alias platform datatype calling
a a ILLUMINA dna variants
b b ILLUMINA dna variants
10 changes: 5 additions & 5 deletions .test/config-simple/samples.tsv
@@ -1,5 +1,5 @@
sample_name group alias platform
a one x ILLUMINA
b one y ILLUMINA
b two x ILLUMINA
a two y ILLUMINA
sample_name group alias platform datatype calling
a one x ILLUMINA dna variants
b one y ILLUMINA dna variants
b two x ILLUMINA dna variants
a two y ILLUMINA dna variants
10 changes: 5 additions & 5 deletions .test/config-sra/samples.tsv
@@ -1,5 +1,5 @@
sample_name group alias platform
PD12A medium_L base ILLUMINA
PD13B medium_L changed ILLUMINA
PD09A soil changed ILLUMINA
PD12A soil base ILLUMINA
sample_name group alias platform datatype calling
PD12A medium_L base ILLUMINA dna variants
PD13B medium_L changed ILLUMINA dna variants
PD09A soil changed ILLUMINA dna variants
PD12A soil base ILLUMINA dna variants
6 changes: 3 additions & 3 deletions .test/config-target-regions/samples.tsv
@@ -1,3 +1,3 @@
sample_name group alias platform
a a ILLUMINA
b b ILLUMINA
sample_name group alias platform datatype calling
a a ILLUMINA dna variants
b b ILLUMINA dna variants
6 changes: 3 additions & 3 deletions .test/config_primers/samples.tsv
@@ -1,3 +1,3 @@
sample_name group alias platform
a a ILLUMINA
b b ILLUMINA
sample_name group alias platform datatype calling
a a ILLUMINA dna variants
b b ILLUMINA dna variants
4 changes: 3 additions & 1 deletion config/README.md
Expand Up @@ -4,11 +4,13 @@ To configure this workflow, modify ``config/config.yaml`` according to your need

# Sample sheet

Add samples to `config/samples.tsv`. For each sample, the columns `sample_name`, `alias`, `platform`, and `group` have to be defined.
Add samples to `config/samples.tsv`. For each sample, the columns `sample_name`, `alias`, `platform`, `datatype`, `calling` and `group` have to be defined.
* Samples within the same `group` can be referenced in a joint [Calling scenario](#calling-scenario) via their `alias`es.
* `alias`es represent the name of the sample within its group. They are meant to be some abstract description of the sample type to be used in the [Calling scenario](#calling-scenario), and should thus be used consistently across groups. A classic example would be a combination of the `tumor` and `normal` aliases.
* The `platform` column needs to contain the used sequencing platform (one of 'CAPILLARY', 'LS454', 'ILLUMINA', 'SOLID', 'HELICOS', 'IONTORRENT', 'ONT', 'PACBIO').
* The same `sample_name` entry can be used multiple times within a `samples.tsv` sample sheet, with only the value in the `group` column differing between repeated rows. This way, you can use the same sample for variant calling in different groups, for example if you use a panel of normal samples when you don't have matched normal samples for tumor variant calling.
* The `datatype` column specifies what kind of data each sample corresponds to. This can either be `rna` or `dna`.
* The `calling` column sets the kind of analysis to be performed. This can be either `fusions`, `variants`, or both (comma-separated). Fusion calling is still under development and should be considered experimental.

If mutational burdens are to be estimated for a sample, the ``events`` to use from the calling scenario (see below) have to be specified in an additional column ``mutational_burden_events``. Multiple events have to be separated by commas within that column.
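
The column rules described in this README change can be checked before running the workflow. The following is a minimal sketch, not part of the PR: the function name `check_sample_sheet` and the error messages are hypothetical, and it only assumes what the README states (`datatype` is `rna` or `dna`; `calling` is `fusions`, `variants`, or both, comma-separated).

```python
import csv
import io

# Columns the README lists as required for every sample.
REQUIRED_COLUMNS = ["sample_name", "alias", "group", "platform", "datatype", "calling"]
ALLOWED_DATATYPES = {"dna", "rna"}
ALLOWED_CALLING = {"variants", "fusions"}


def check_sample_sheet(text: str) -> list[str]:
    """Return a list of problems found in a tab-separated sample sheet (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for lineno, row in enumerate(reader, start=2):
        datatype = (row["datatype"] or "").strip()
        if datatype not in ALLOWED_DATATYPES:
            problems.append(f"line {lineno}: unknown datatype {datatype!r}")
        # `calling` may hold one value or several separated by commas.
        calling = {value.strip() for value in (row["calling"] or "").split(",")}
        unknown = sorted(calling - ALLOWED_CALLING)
        if unknown:
            problems.append(f"line {lineno}: unknown calling value(s) {unknown}")
    return problems


# Illustrative sheet; the sample names are made up.
SHEET = (
    "sample_name\talias\tgroup\tplatform\tdatatype\tcalling\n"
    "a\tx\tone\tILLUMINA\tdna\tvariants\n"
    "b\ty\tone\tILLUMINA\trna\tfusions,variants\n"
)
print(check_sample_sheet(SHEET))
```

An empty list means the sheet satisfies the two new column rules; the workflow's own schema validation may enforce more than this sketch does.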

Expand Down
5 changes: 3 additions & 2 deletions config/config.yaml
Expand Up @@ -22,7 +22,7 @@ ref:
# Ensembl species name
species: homo_sapiens
# Ensembl release
release: 110
release: 111
# Genome build
build: GRCh38
# Optionally, instead of downloading the whole reference from Ensembl via the
Expand Down Expand Up @@ -121,8 +121,9 @@ calling:
# Add any number of events here to filter for.
# The id of each event can be chosen freely, but needs to contain
# only alphanumerics and underscores
# ("somatic" below is just an example and can be modified as needed).
# ("some_id" below is just an example and can be modified as needed).
some_id:
types: ["variants", "fusions"]
# labels for the callset, displayed in the report. Will fall back to id if no labels specified
labels:
some-label: label text
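
The comment in this hunk constrains event ids to alphanumerics and underscores, and the new `types` key lists what an event applies to. A minimal sketch of checking both follows. The helper `check_events` is hypothetical, and treating `variants` and `fusions` as the only valid `types` is an assumption drawn from the example value in the diff.

```python
import re

# Event ids may contain only alphanumerics and underscores (per the config comment).
EVENT_ID = re.compile(r"[A-Za-z0-9_]+")
# Assumption: these are the only two values `types` accepts.
KNOWN_TYPES = {"variants", "fusions"}


def check_events(events: dict) -> list[str]:
    """Return a list of problems with event ids and their `types` (hypothetical helper)."""
    problems = []
    for event_id, event in events.items():
        if not EVENT_ID.fullmatch(event_id):
            problems.append(f"{event_id!r}: id may only contain alphanumerics and underscores")
        unknown = sorted(set(event.get("types", [])) - KNOWN_TYPES)
        if unknown:
            problems.append(f"{event_id!r}: unknown types {unknown}")
    return problems


# Mirrors the example entry shown in the diff.
events = {
    "some_id": {
        "types": ["variants", "fusions"],
        "labels": {"some-label": "label text"},
    }
}
print(check_events(events))
```

Note that the hyphen in `some-label` is fine: the restriction in the comment applies to the event id, not to label keys.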
Expand Down
2 changes: 1 addition & 1 deletion config/samples.tsv
@@ -1 +1 @@
sample_name alias group platform purity panel umi_read umi_read_structure
sample_name alias group platform purity panel umi_read umi_read_structure datatype calling
1 change: 1 addition & 0 deletions workflow/Snakefile
Expand Up @@ -35,6 +35,7 @@ include: "rules/table.smk"
include: "rules/regions.smk"
include: "rules/plugins.smk"
include: "rules/datavzrd.smk"
include: "rules/fusion_calling.smk"
include: "rules/testcase.smk"


Expand Down
5 changes: 5 additions & 0 deletions workflow/envs/arriba.yaml
@@ -0,0 +1,5 @@
channels:
- conda-forge
- bioconda
dependencies:
- arriba =2.4
2 changes: 1 addition & 1 deletion workflow/envs/bcftools.yaml
Expand Up @@ -2,4 +2,4 @@ channels:
- conda-forge
- bioconda
dependencies:
- bcftools =1.14
- bcftools =1.16
2 changes: 1 addition & 1 deletion workflow/envs/pandas.yaml
Expand Up @@ -2,5 +2,5 @@ channels:
- conda-forge
- bioconda
dependencies:
- pandas =1.4
- pandas =2.1
- python =3.10
2 changes: 1 addition & 1 deletion workflow/report/workflow.rst
@@ -1,7 +1,7 @@
This workflow generates annotated variant calls that can be viewed in interactive reports, showing all evidence levels provided by Varlociraptor_.
Adapters were removed with Cutadapt_. Reads were mapped with `BWA MEM`_, PCR and optical duplicates were removed with Picard_.
Candidate variant discovery was performed with Freebayes_ and Delly_. Statistical assessment of variants was conducted with Varlociraptor_.
Variant calling results, sorted by type, event, and impact can be found under `Variant calls`_.
Fusion and variant calling results, sorted by type, event, and impact, can be found under Fusion/Variant calls.
The corresponding Varlociraptor_ scenarios, containing the detailed definition of events, can be found under `Variant calling scenarios`_.

.. _Varlociraptor: https://varlociraptor.github.io
Expand Down
57 changes: 57 additions & 0 deletions workflow/resources/datavzrd/fusion-calls-template.datavzrd.yaml
@@ -0,0 +1,57 @@
name: ?f"Fusion calls {wildcards.event}"

default-view: ?f"{params.groups[0]}-fusions"

__definitions__:
- import os
- |
def read_file(path):
return open(path, 'r').read()

datasets:
?for group, path in zip(params.groups, params.fusion_calls):
?f"{group}-fusions":
path: ?path
separator: "\t"

views:
?for group in params.groups:
?f"{group}-fusions":
desc: ?f"Fusion calls.\n{config['calling']['fdr-control']['events'][wildcards.event]['desc']}"
dataset: ?f"{group}-fusions"
render-table:
columns:
"regex('.+: allele frequency')":
plot:
ticks:
scale: "linear"
domain: [0.0, 1.0]
aux-domain-columns:
- "regex('.+: allele frequency')"
"regex('.+: read depth')":
plot:
ticks:
scale: "linear"
aux-domain-columns:
- "regex('.+: read depth')"
"regex('prob: .+')":
plot:
heatmap:
scale: linear
domain: [0.0, 1.0]
range:
- white
- "#1f77b4"
?for alias in params.samples.loc[params.samples["group"] == group, "alias"]:
'?f"{alias}: short observations"':
optional: true
custom-plot:
data: ?read_file(params.data_short_observations)
spec: ?read_file(params.spec_short_observations)
display-mode: detail
'?f"{alias}: observations"':
optional: true
custom-plot:
data: ?read_file(params.data_observations)
spec-path: ?params.spec_observations
display-mode: detail
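
The `?for group, path in zip(params.groups, params.fusion_calls)` loop in this template pairs each group with its fusion-call table. As a rough illustration of the structure it should yield, here is the same pairing in plain Python. This does not use the templating engine itself, and the group names and paths are made up.

```python
def build_datasets(groups: list[str], fusion_calls: list[str]) -> dict:
    """Mimic the template's `datasets` loop: one tab-separated dataset per group."""
    return {
        f"{group}-fusions": {"path": path, "separator": "\t"}
        for group, path in zip(groups, fusion_calls)
    }


def default_view(groups: list[str]) -> str:
    """Mimic `default-view`: the first group's fusion view is shown first."""
    return f"{groups[0]}-fusions"


# Hypothetical inputs standing in for params.groups and params.fusion_calls.
groups = ["one", "two"]
paths = ["one.fusions.tsv", "two.fusions.tsv"]
print(build_datasets(groups, paths))
print(default_view(groups))
```

Because `zip` pairs items by position, the two parameter lists need to be in the same group order; a mismatch would silently attach a table to the wrong group.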
61 changes: 61 additions & 0 deletions workflow/resources/datavzrd/variant-calls-template.datavzrd.yaml
Expand Up @@ -58,6 +58,19 @@ __definitions__:
return `https://${{build}}${{url_suffix}}${{hgvsg}}`
}}
"""
- |
protein_id = f"""
function(row) {{
let protein_id = row.hgvsp.split(':')[0]
return protein_id
}}
"""
- |
empty_content = f"""
function(row) {{
return ""
}}
"""

datasets:
?if input.variant_oncoprints:
Expand Down Expand Up @@ -290,6 +303,28 @@ views:
display-mode: detail
protein alteration (short):
display-mode: detail
canonical:
optional: true
display-mode: detail
plot:
heatmap:
scale: "ordinal"
domain: ["", "True"]
range:
- white
- black
custom-content: ?empty_content
mane_plus_clinical:
optional: true
display-mode: detail
plot:
heatmap:
scale: "ordinal"
domain: ["", "True"]
range:
- white
- black
custom-content: ?empty_content
?for alias in params.samples.loc[params.samples["group"] == group, "alias"]:
'?f"{alias}: short observations"':
optional: true
Expand All @@ -310,6 +345,10 @@ views:
query_genomenexus:
value: ?genomenexus_link
display-mode: hidden
ensembl_protein_id:
value: ?protein_id
display-mode: detail
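
The `protein_id` JavaScript snippet added above keeps everything before the first colon of the `hgvsp` column. The Python equivalent below is only meant to show what that extraction produces; the HGVSp string is an illustrative value, not taken from the PR.

```python
def protein_id(hgvsp: str) -> str:
    """Python mirror of the template's JS: keep the part before the first ':'."""
    return hgvsp.split(":")[0]


# Illustrative HGVSp value: "<protein id>:<protein change>".
print(protein_id("ENSP00000269305.4:p.Arg175His"))
# A value without a colon is returned unchanged, an empty one stays empty.
print(repr(protein_id("")))
```

As in the JavaScript version, rows with an empty `hgvsp` yield an empty id rather than an error.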




Expand Down Expand Up @@ -404,6 +443,28 @@ views:
display-mode: detail
alternative allele:
display-mode: detail
canonical:
optional: true
display-mode: detail
plot:
heatmap:
scale: "ordinal"
domain: ["", "True"]
range:
- white
- black
custom-content: ?empty_content
mane_plus_clinical:
optional: true
display-mode: detail
plot:
heatmap:
scale: "ordinal"
domain: ["", "True"]
range:
- white
- black
custom-content: ?empty_content
?for alias in params.samples.loc[params.samples["group"] == group, "alias"]:
'?f"{alias}: short observations"':
optional: true
Expand Down
16 changes: 8 additions & 8 deletions workflow/rules/annotation.smk
Expand Up @@ -19,21 +19,21 @@ rule annotate_candidate_variants:
"benchmarks/vep/{group}.{caller}.{scatteritem}.annotate_candidates.tsv"
threads: get_vep_threads()
wrapper:
"v2.5.0/bio/vep/annotate"
"v3.3.5/bio/vep/annotate"


rule annotate_variants:
input:
calls="results/calls/{group}.{scatteritem}.bcf",
calls="results/calls/{group}.{calling_type}.{scatteritem}.bcf",
cache="resources/vep/cache",
plugins="resources/vep/plugins",
revel=lambda wc: get_plugin_aux("REVEL"),
revel_tbi=lambda wc: get_plugin_aux("REVEL", True),
fasta=genome,
fai=genome_fai,
output:
calls="results/calls/{group}.{scatteritem}.annotated.bcf",
stats="results/calls/{group}.{scatteritem}.stats.html",
calls="results/calls/{group}.{calling_type}.{scatteritem}.annotated.bcf",
stats="results/calls/{group}.{calling_type}.{scatteritem}.stats.html",
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
Expand All @@ -42,10 +42,10 @@ rule annotate_variants:
config["annotations"]["vep"]["final_calls"]["params"]
),
log:
"logs/vep/{group}.{scatteritem}.annotate.log",
"logs/vep/{group}.{calling_type}.{scatteritem}.annotate.log",
threads: get_vep_threads()
wrapper:
"v2.5.0/bio/vep/annotate"
"v3.3.5/bio/vep/annotate"


# TODO What about multiple ID Fields?
Expand Down Expand Up @@ -89,9 +89,9 @@ rule gather_annotated_calls:
calls=get_gather_annotated_calls_input(),
idx=get_gather_annotated_calls_input(ext="bcf.csi"),
output:
"results/final-calls/{group}.annotated.bcf",
"results/final-calls/{group}.{calling_type}.annotated.bcf",
log:
"logs/gather-annotated-calls/{group}.log",
"logs/gather-annotated-calls/{group}.{calling_type}.log",
params:
extra="-a",
wrapper:
Expand Down
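
The hunks above add a `{calling_type}` wildcard to the annotation and gather paths, so each group gets one output per calling type. A minimal sketch of how such a pattern resolves is shown below. The helper `final_call_paths` is hypothetical, and deriving the wildcard values from the sample sheet's comma-separated `calling` column is an assumption, not something this diff shows.

```python
# Output pattern of `gather_annotated_calls` after this change.
FINAL_CALLS = "results/final-calls/{group}.{calling_type}.annotated.bcf"


def final_call_paths(group: str, calling: str) -> list[str]:
    """Expand the pattern once per calling type listed in `calling` (hypothetical helper)."""
    return [
        FINAL_CALLS.format(group=group, calling_type=calling_type.strip())
        for calling_type in calling.split(",")
    ]


print(final_call_paths("one", "fusions,variants"))
```

Within Snakemake itself this kind of expansion would normally go through its own wildcard machinery; plain `str.format` is used here only to keep the sketch dependency-free.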