Merge branch 'dev' of https://github.com/PapenfussLab/gridss into dev
Daniel Cameron committed Oct 14, 2020
2 parents dc9a25e + 1aafc70 commit 3106f15
Showing 2 changed files with 54 additions and 48 deletions.
46 changes: 42 additions & 4 deletions Readme.md
@@ -35,14 +35,48 @@ To run GRIDSS the following must be installed:

* java 1.8 or later
* R 3.6 or later
* `gridss_somatic_filter.R` requires the following R libraries:
  * argparser
  * tidyverse
  * stringdist
  * testthat
  * stringr
  * StructuralVariantAnnotation
  * rtracklayer
  * BSgenome package for your reference genome (optional)
* samtools
* bwa

The driver script requires:

* bash
* getopt(1) (part of [util-linux](https://en.wikipedia.org/wiki/Util-linux))

To run VIRUSBreakend, kraken2, or repeatmasker annotations, the following additional software must be installed:
* kraken2
  * Note that `virusbreakend-build` requires all `kraken2-build` dependencies
* RepeatMasker
* bcftools
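
A quick way to confirm the prerequisites above is to check that each tool resolves on `PATH`. The following is a minimal sketch rather than part of GRIDSS: the function name `missing_tools` is hypothetical, and the executable names (for example `Rscript` for R) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Sketch: report which of the listed prerequisites cannot be found on PATH.
# Executable names are assumptions (e.g. Rscript for R); adjust as needed.

# Print every command name given as an argument that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

core_tools="java Rscript samtools bwa bash getopt"
# Only needed for VIRUSBreakend, kraken2, or RepeatMasker annotations.
annotation_tools="kraken2 RepeatMasker bcftools"

# Word splitting of the two lists is intentional here.
missing=$(missing_tools $core_tools $annotation_tools)
if [ -n "$missing" ]; then
  echo "Missing from PATH:" $missing
else
  echo "All listed tools found on PATH"
fi
```

This only checks that the executables exist; it does not verify versions (java 1.8 or later, R 3.6 or later) or that the required R libraries are installed.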

# Building gridsstools

Some performance-critical steps are implemented in C using htslib.
A precompiled version of `gridsstools` is included as part of GRIDSS releases.
If this precompiled version does not run on your system you will need to build it from source.

To build `gridsstools` from source run the following:
```
git clone http://github.com/PapenfussLab/gridss/
cd gridss
git submodule init
git submodule update
cd src/main/c/gridsstools/htslib/
autoheader
autoconf
./configure && make
cd ..
autoheader
autoconf
./configure && make all
```

# Running

Scripts and pre-compiled binaries are available at https://github.com/PapenfussLab/GRIDSS/releases. GRIDSS invokes external tools at multiple points during processing. By default the aligner is bwa mem, but GRIDSS can be configured to use bowtie2 or another aligner.
@@ -53,6 +87,11 @@ The following scripts are included in GRIDSS releases:
|---|---|
gridss.sh|Driver script for running GRIDSS. Use this to run GRIDSS
gridss_somatic_filter.R|Somatic filtering script. Identifies somatic events for tumour samples with a matched normal. Multiple tumour biopsies are supported
gridss_extract_overlapping_fragments.sh|Extracts all alignments for read pairs with at least one alignment overlapping a set of regions of interest. Correctly handles supplementary alignments. Use this script to extract reads of interest for targeted GRIDSS variant calling.
gridss_annotate_vcf_repeatmasker.sh|Annotates breakpoint and single breakend inserted sequences with the RepeatMasker classification of the sequence.
gridss_annotate_vcf_kraken2.sh|Annotates breakpoint and single breakend inserted sequences with the Kraken2 classification of the sequence.
virusbreakend.sh|[See VIRUSBreakend README](https://github.com/PapenfussLab/gridss/blob/master/VIRUSBreakend_Readme.md)
virusbreakend-build.sh|[See VIRUSBreakend README](https://github.com/PapenfussLab/gridss/blob/master/VIRUSBreakend_Readme.md)

## gridss.sh command-line arguments

@@ -72,7 +111,6 @@ argument|description
--maxcoverage|maximum coverage. Regions with coverage in excess of this are ignored. (Default: 50000)
--labels|comma separated labels to use in the output VCF for the input files. Must have same number of entries as there are input files. Input files with the same label are aggregated (useful for multiple sequencing runs of the same sample). Labels default to input filenames, unless a single read group with a non-empty sample name exists in which case the read group sample name is used (which can be disabled by \"useReadGroupSampleNameCategoryLabel=false\" in the configuration file). If labels are specified, they must be specified for all input files.
--steps|processing steps to run. Defaults to all steps. Multiple steps are specified using comma separators. Available steps are preprocess,assemble,call. Useful to improve parallelisation on a cluster as preprocess of each input file is independent, and can be performed in parallel, and has lower memory requirements than the assembly step.
--repeatmaskerbed|bedops rmsk2bed BED file for reference genome. Optional parameter for annotating inserted sequences with RepeatMasker repeat type/class (Optional)
--jobindex|zero-based index of this assembly job node. Used to spread GRIDSS assembly across multiple compute nodes. Use only with `-s assemble`. Once all jobs have completed, a `-s assemble` or `-s all` job should be run to gather the results together.
--jobnodes|total number of assembly jobs scheduled.
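
The `--jobindex` and `--jobnodes` arguments can be illustrated with a small dry-run sketch that prints the commands for each compute node followed by the final gather step. The function name `print_assembly_jobs` and the `$GRIDSS_ARGS` placeholder are hypothetical; `$GRIDSS_ARGS` stands in for whatever reference, output, and input arguments you normally pass to `gridss.sh`.

```shell
#!/bin/sh
# Sketch: print one assembly command per compute node, then the gather command.
# GRIDSS_ARGS is a hypothetical placeholder for your usual gridss.sh arguments;
# it is not a GRIDSS option. Nothing is executed, commands are only printed.

print_assembly_jobs() {
  jobnodes=$1
  jobindex=0
  while [ "$jobindex" -lt "$jobnodes" ]; do
    # --jobindex is zero-based, so valid values are 0 .. jobnodes-1.
    echo "gridss.sh --steps assemble --jobnodes $jobnodes --jobindex $jobindex \$GRIDSS_ARGS"
    jobindex=$((jobindex + 1))
  done
  # Once all jobs above have completed, a final assemble run gathers the results.
  echo "gridss.sh --steps assemble \$GRIDSS_ARGS"
}

print_assembly_jobs 3
```

Each printed per-node command would be submitted as a separate cluster job, with the last command run only after all of them have finished.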

56 changes: 12 additions & 44 deletions VIRUSBreakend_Readme.md
@@ -23,49 +23,18 @@ https://www.biorxiv.org/content/10.1101/2020.07.09.196527v1

# Pre-requisites

To run VIRUSBreakend the following must be installed:
VIRUSBreakend is part of the GRIDSS software suite.

* java 1.8 or later
* R 3.6 or later
* samtools
* bwa
All tools used by VIRUSBreakend must be on `PATH` including:
* java
* GRIDSS
* Kraken2
* RepeatMasker
* htslib 1.10
* GRIDSS

The driver script requires:

* bash
* getopt(1) (part of [util-linux](https://en.wikipedia.org/wiki/Util-linux))

Once
* Ensure GRIDSS, Kraken2, RepeatMasker, samtools and bwa are on `PATH`
* Set the `GRIDSS_JAR` environment variable to the location of the GRIDSS jar file


## gridsstools

Performance-critical steps in VIRUSBreakend are implemented in C using htslib.
A precompiled version of `gridsstools` is included as part of GRIDSS releases.
If this precompiled version does not run on your system you will need to build it from source.

To build `gridsstools` from source run the following:
* samtools
* bcftools
* bwa

```
git clone http://github.com/PapenfussLab/gridss/
cd gridss
git submodule init
git submodule update
cd src/main/c/gridsstools/htslib/
autoheader
autoconf
./configure && make
cd ..
autoheader
autoconf
./configure && make all
```
Set the `GRIDSS_JAR` environment variable to the location of the GRIDSS jar file
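
As a minimal sketch of the `GRIDSS_JAR` setup, the following checks that the variable is set and points at an existing file before VIRUSBreakend is run. The function name `check_gridss_jar` and the example path are hypothetical and not part of GRIDSS.

```shell
#!/bin/sh
# Sketch: fail early if GRIDSS_JAR is unset or does not point at a file.
check_gridss_jar() {
  if [ -z "${GRIDSS_JAR:-}" ]; then
    echo "GRIDSS_JAR is not set" >&2
    return 1
  fi
  if [ ! -f "$GRIDSS_JAR" ]; then
    echo "GRIDSS_JAR does not point to a file: $GRIDSS_JAR" >&2
    return 1
  fi
  echo "Using GRIDSS jar: $GRIDSS_JAR"
}

# Hypothetical location; replace with the jar from your GRIDSS release.
# export GRIDSS_JAR=/path/to/gridss.jar
check_gridss_jar || echo "Set GRIDSS_JAR before running virusbreakend.sh" >&2
```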

## Reference data setup

@@ -101,11 +70,10 @@ virusbreakend.sh \

# Output

The output format is a VCF file containing the location of single breakend from the viral sequence.
The integration location in the host is encoded in the `BEALN` field.
Note that depending on the host alignment and single breakend orientations, the integration position will be at either the start or end of the `BEALN` alignment position.

In future versions, this is likely to be replaced by a more readable breakpoint `BND` notation.
VIRUSBreakend outputs:
* A VCF containing the integration breakpoints
* The kraken2 report of the virus(es) on which viral integration was run
* Coverage statistics of the virus(es) on which viral integration was run

## Ambiguous insertions
