Merge branch 'release/0.4.0-alpha'
Daniel Cooke committed Jul 4, 2018
2 parents d24e72a + 7ca479a commit 18750d3
Showing 474 changed files with 25,410 additions and 5,097 deletions.
22 changes: 22 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,22 @@
---
name: Bug report
about: Create a report to help us improve

---

**Describe the bug**
A clear and concise description of what the bug is.

**Command**
Command line to run octopus:
```shell
$ octopus
```

**Desktop (please complete the following information):**
- OS: [e.g. OSX High Sierra]
- Version [e.g. v0.3.3-alpha]
- Reference [e.g. hg19]

**Additional context**
Add any other context about the problem here.
1 change: 1 addition & 0 deletions .gitignore
@@ -40,6 +40,7 @@ CTestTestfile.cmake

build/src
build/test
resources/forests

## Core latex/pdflatex auxiliary files:
*.aux
6 changes: 3 additions & 3 deletions .travis.yml
@@ -177,7 +177,7 @@ install:
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
sudo apt-get install python3 -qy;
else
brew install python3;
brew upgrade python;
fi

############################################################################
@@ -187,7 +187,7 @@ install:
git clone https://github.com/samtools/htslib.git;
cd htslib && autoheader && autoconf && ./configure && make && sudo make install;
else
brew tap homebrew/science && brew install htslib;
brew install htslib;
fi

before_script:
@@ -197,7 +197,7 @@ before_script:
- echo "BOOST_ROOT = " ${BOOST_ROOT};

script:
- ./install.py --cxx_compiler=${COMPILER}
- ./scripts/install.py --cxx_compiler=${COMPILER}

notifications:
email: false
13 changes: 13 additions & 0 deletions CMakeLists.txt
@@ -1,5 +1,7 @@
cmake_minimum_required(VERSION 3.9)

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/build/cmake/modules/")

include(CheckIPOSupported)

project(octopus)
@@ -23,6 +25,17 @@ else()
message(WARNING "You are using an unsupported compiler! Compilation has only been tested with Clang and GCC.")
endif()

set(default_build_type "Release")

if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
message(STATUS "Setting build type to '${default_build_type}' as none was specified.")
set(CMAKE_BUILD_TYPE "${default_build_type}" CACHE
STRING "Choose the type of build." FORCE)
# Set the possible values of build type for cmake-gui
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
"Debug" "Release" "MinSizeRel" "RelWithDebInfo")
endif()

message("-- Build type is " ${CMAKE_BUILD_TYPE})

# for the main octopus executable
1 change: 1 addition & 0 deletions Dockerfile
@@ -61,5 +61,6 @@ WORKDIR /tmp
RUN git clone -b master https://github.com/luntergroup/octopus.git
WORKDIR /tmp/octopus
RUN ./install.py --root --threads=2
RUN ldconfig

WORKDIR /home
126 changes: 84 additions & 42 deletions README.md
@@ -3,14 +3,25 @@
[![Build Status](https://travis-ci.org/luntergroup/octopus.svg?branch=master)](https://travis-ci.org/luntergroup/octopus)
[![MIT license](http://img.shields.io/badge/license-MIT-brightgreen.svg)](http://opensource.org/licenses/MIT)
[![Gitter](https://badges.gitter.im/octopus-caller/Lobby.svg)](https://gitter.im/octopus-caller/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/octopus/badges/installer/conda.svg)](https://conda.anaconda.org/bioconda)

---

**Warning: this project is incomplete - it may be unstable and contain bugs.**

---

Octopus is a mapping-based variant caller that implements several calling models within a unified haplotype-aware framework. Octopus explicitly stores allele phasing infomation which allows haplotypes to be dynamically excluded and extended. Primarily this means octopus can jointly consider allele sets far exceeding the cardinality of other approaches, but perhaps more importantly, it allows *marginalisation* over posterior distributions in haplotype space at specific loci. In practise this means octopus can achieve far greater allelic genotyping accuracy than other methods, but can also infer conditional or unconditional phase probabilities directly from genotype probability distributions. This allows octopus to report consistent allele event level variant calls *and* independent phase information.
Octopus is a mapping-based variant caller that implements several calling models within a unified haplotype-aware framework. Octopus takes inspiration from particle filtering by constructing a tree of haplotypes and dynamically pruning and extending the tree based on haplotype posterior probabilities in a sequential manner. This allows octopus to implicitly consider all possible haplotypes at a given locus in reasonable time.

There are currently five calling models implemented:

- **individual**: call germline variants in a single healthy individual.
- **population**: jointly call germline variants in small cohorts.
- **cancer**: call germline and somatic mutations in tumour samples.
- **trio**: call germline and _de novo_ mutations in a parent-offspring trio.
- **polyclone**: call variants in samples with an unknown mixture of haploid clones, such as bacterial or viral samples.

Octopus is currently able to call SNVs, small-medium sized indels, small complex rearrangements, and micro-inversions.

## Requirements
* A C++14 compiler with SSE2 support
@@ -22,7 +33,7 @@ Octopus is a mapping-based variant caller that implements several calling models
* Optional:
* Python3 or greater

#### *Obtaining requirements on OS X*
#### Obtaining requirements on OS X

On OS X, Clang is recommended. All requirements can be installed using the package manager [Homebrew](http://brew.sh/index.html):

@@ -39,7 +50,7 @@ $ brew install python3

Note if you already have any of these packages installed via Homebrew on your system the command will fail, but you can update to the latest version using `brew upgrade`.

#### *Obtaining requirements on Ubuntu*
#### Obtaining requirements on Ubuntu

Depending on your Ubuntu distribution, some requirements can be installed with `apt-get`. It may be preferable to use GCC as this will simplify installing Boost:

@@ -61,67 +72,70 @@ These instructions are replicated in the [user documentation](https://github.com

## Installation

Octopus can be built and installed on a wide range of operating systems including most Unix based systems (Linux, OS X) and Windows (once MSVC is C++14 feature complete).
Octopus can be built and installed on most Unix based systems (Linux, OS X). Windows has not been tested, but should be compatible.

#### *Quick installation with Python3*
#### Conda package

Installing octopus first requires obtaining a copy the source code. In the command line, move to an appropriate install directory and execute:
Octopus is available [pre-built for Linux](https://anaconda.org/bioconda/octopus) as part of [Bioconda](https://bioconda.github.io/). To [install in an isolated environment](https://bioconda.github.io/#using-bioconda):

```shell
$ git clone https://github.com/luntergroup/octopus.git && cd octopus
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p venv
venv/bin/conda install -c conda-forge -c bioconda octopus
venv/bin/octopus -h

A package will also be available for OSX once conda-forge and bioconda move to newer versions of gcc and boost.

The default branch is develop, which is not always stable. You may prefer to switch to the master branch which always has the latest release:
#### Quick installation with Python3

First clone the git repository in your preferred directory:

```shell
$ git checkout master
$ git clone -b master https://github.com/luntergroup/octopus.git && cd octopus
```

Installation is easy using the Python3 install script. If your default compiler satisfies the minimum requirements just execute:
The easiest way to install octopus from source is with the Python3 install script. If your default compiler satisfies the minimum requirements just execute:

```shell
$ ./install.py
$ ./scripts/install.py
```

otherwise explicitly specify the compiler to use:

```shell
$ ./install.py --cxx_compiler /path/to/cpp/compiler # or just the compiler name if on your PATH
$ ./scripts/install.py --cxx_compiler /path/to/cpp/compiler # or just the compiler name if on your PATH
```

For example, if the requirement instructions above were used:

```shell
$ ./install.py --cxx_compiler clang++-4.0
$ ./scripts/install.py --cxx_compiler clang++-4.0
```

On some systems, you may also need to specify a C compiler which is the same version as your C++ compiler, otherwise you'll get lots of link errors. This can be done with the `--c_compiler` option:

```shell
$ ./install.py --cxx_compiler g++-7 --c_compiler gcc-7
$ ./scripts/install.py -cxx g++-7 -c gcc-7
```

By default this installs to `/bin` relative to where you installed octopus. To install to a root directory (e.g. `/usr/local/bin`) use:

```shell
$ ./install.py --root
$ ./scripts/install.py --root
```

this may prompt you to enter a `sudo` password.

If anything goes wrong with the build process and you need to start again, be sure to add the command `--clean`!
If anything goes wrong with the build process and you need to start again, be sure to add the command `--clean`.

#### *Installing with CMake*
#### Installing with CMake

If Python3 isn't available, the binaries can be installed manually with [CMake](https://cmake.org):

```shell
$ git clone https://github.com/luntergroup/octopus.git
$ git clone -b master https://github.com/luntergroup/octopus.git
$ cd octopus/build
$ cmake .. && make install
```

By default this installs to the `/bin` directory where octopus was installed. To install to root (e.g. `/usr/local/bin`) use the `-D` option:
To install to root (e.g. `/usr/local/bin`) use the `-D` option:

```shell
$ cmake -DINSTALL_ROOT=ON ..
@@ -136,7 +150,7 @@ $ cmake -D CMAKE_C_COMPILER=clang-4.0 -D CMAKE_CXX_COMPILER=clang++-4.0 ..
You can check installation was successful by executing the command:

```shell
$ octopus --help
$ octopus -h
```

## Running Tests
@@ -149,11 +163,11 @@ $ test/install.py

## Examples

Here are some common use-cases to get started. These examples are by no means exhaustive, please consult the documentation for explanations of all options, algorithms, and further examples. For a more in depth example, refer to the [whole genome germline calling case study](https://github.com/luntergroup/octopus/blob/master/doc/octopus_wgs_case_study.md).
Here are some common use-cases to get started. These examples are by no means exhaustive, please consult the documentation for explanations of all options, algorithms, and further examples. For more in depth examples, refer to the [case studies](https://github.com/luntergroup/octopus/wiki/Case-studies).

Note by default octopus will output all calls in VCF format to standard output, in order to write calls to a file (`.vcf`, `.vcf.gz`, and `.bcf` are supported), use the command line option `--output` (`-o`).
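
As a small illustration of the note above, the following sketch checks an output path against the three extensions listed before building a command line. The helper name `is_supported_output` and the file name `calls.vcf.gz` are assumptions made for this example and are not part of octopus; the command is only echoed, not run.

```shell
# Hypothetical helper (not part of octopus): succeed only if the path ends
# in one of the extensions the README lists for --output.
is_supported_output() {
    case "$1" in
        *.vcf|*.vcf.gz|*.bcf) return 0 ;;
        *) return 1 ;;
    esac
}

out="calls.vcf.gz"   # assumed output file name
if is_supported_output "$out"; then
    # Print the command instead of running it, so the sketch works without octopus.
    echo "octopus -R hs37d5.fa -I NA12878.bam -o $out"
else
    echo "unsupported output extension: $out" >&2
fi
```

Checking the extension up front is just a convenience for wrapper scripts; octopus itself decides how to handle the path it is given.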

#### *Calling germline variants in an individual*
#### Calling germline variants in an individual

This is the simplest case, if the file `NA12878.bam` contains a single sample, octopus will default to its individual calling model:

@@ -173,35 +187,37 @@ By default, octopus automatically detects and calls all samples contained in the
$ octopus -R hs37d5.fa -I multi-sample.bam -S NA12878
```

#### *Targeted calling*
#### Targeted calling

By default, octopus will call all possible regions (as specified in the reference FASTA). In order to select a set of target regions, use the `--regions` (`-T`) option:
By default, octopus will call all regions specified in the reference index. In order to restrict calling to a subset of regions, either provide a list of zero-indexed regions in the format `chr:start-end` (`--regions`; `-T`), or a file containing a list of regions in either standard format or BED format (`--regions-file`; `-t`):

```shell
$ octopus -R hs37d5.fa -I NA12878.bam -T 1 2:30,000,000- 3:10,000,000-20,000,000
$ octopus -R hs37d5.fa -I NA12878.bam -t regions.bed
```
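
To move between the two forms, a fully specified command-line region can be rewritten as a BED line. The sketch below is a hypothetical helper, not something shipped with octopus. It assumes that the start and end of a `chr:start-end` region can be copied unchanged into the BED start and end columns (both are described as zero-indexed, but check the user documentation for the exact interval convention), and it rejects whole-contig and open-ended regions because they carry no explicit end.

```shell
# Hypothetical helper: convert a region such as 3:10,000,000-20,000,000
# into a tab-separated BED line (contig, start, end).
region_to_bed() {
    local region chrom range start end
    region=$(printf '%s' "$1" | tr -d ',')   # drop thousands separators
    chrom="${region%%:*}"
    range="${region#*:}"
    start="${range%%-*}"
    end="${range#*-}"
    # "1" (whole contig) and "2:30000000-" (open-ended) have no explicit end,
    # so they cannot be written as a BED interval without contig lengths.
    if [ "$region" = "$chrom" ] || [ -z "$start" ] || [ -z "$end" ]; then
        echo "cannot convert '$1' without an explicit start and end" >&2
        return 1
    fi
    printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
}

region_to_bed 3:10,000,000-20,000,000
```

Redirecting the output to a file (for example `regions.bed`) gives something that can be passed to `-t`.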

Or conversely a set of regions to *exclude* can be given with `--skip-regions` (`-K`):
Conversely, a set of regions to *exclude* can be given explicitly (`--skip-regions`; `-K`), or with a file (`--skip-regions-file`; `-k`):

```shell
$ octopus -R hs37d5.fa -I NA12878.bam -K 1 2:30,000,000- 3:10,000,000-20,000,000
$ octopus -R hs37d5.fa -I NA12878.bam -k skip-regions.bed
```

#### *Calling de novo mutations in a trio*
#### Calling de novo mutations in a trio

To call germline and de novo mutations in a trio, either specify both maternal (`--maternal-sample`; `-M`) and paternal (`--paternal-sample`; `-F`) samples:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam -M NA12892 -F NA12891
```

The trio can also be specified with a PED file:
or provide a PED file which defines the trio:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam --pedigree ceu_trio.ped
```

#### *Calling somatic mutations in tumours*
#### Calling somatic mutations in tumours

To call germline and somatic mutations in a paired tumour-normal sample, just specify which sample is the normal (`--normal-sample`; `-N`):

@@ -212,7 +228,7 @@ $ octopus -R hs37d5.fa -I normal.bam tumour.bam --normal-sample NORMAL
It is also possible to genotype multiple tumours from the same individual jointly:

```shell
$ octopus -R hs37d5.fa -I normal.bam tumourA.bam tumourB --normal-sample NORMAL
$ octopus -R hs37d5.fa -I normal.bam tumourA.bam tumourB.bam --normal-sample NORMAL
```

If a normal sample is not present the cancer calling model must be invoked explicitly:
@@ -221,9 +237,9 @@ If a normal sample is not present the cancer calling model must be invoked expli
$ octopus -R hs37d5.fa -I tumour1.bam tumour2.bam -C cancer
```

Note however, that without a normal sample, somatic mutation classification power is significantly reduced.
Be aware that without a normal sample, somatic mutation classification power is significantly reduced.

#### *Joint variant calling (in development)*
#### Joint variant calling (experimental)

Multiple samples from the same population, without pedigree information, can be called jointly:

@@ -233,17 +249,27 @@ $ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam

Joint calling samples may increase calling power, especially for low coverage sequencing.

#### *HLA genotyping*
#### Calling variants in mixed haploid samples (experimental)

If your sample contains an unknown mix of haploid clones (e.g. some bacterial or viral samples), use the `polyclone` calling model:

```shell
$ octopus -R H37Rv.fa -I mycobacterium_tuberculosis.bam -C polyclone
```

This model will automatically detect the number of subclones in your sample (up to the maximum given by `--max-clones`).

#### HLA genotyping

To call phased HLA genotypes, increase the default phase level:

```shell
$ octopus -R human.fa -I NA12878.bam -t hla-regions.txt -l aggressive
$ octopus -R hs37d5.fa -I NA12878.bam -t hla-regions.bed -l aggressive
```

#### *Multithreaded calling*
#### Multithreaded calling

Octopus has built in multithreading capacbilities, just add the `--threads` command:
Octopus has built-in multithreading capabilities; just add the `--threads` option:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam --threads
@@ -252,19 +278,35 @@ $ octopus -R hs37d5.fa -I NA12878.bam --threads
This will let octopus automatically decide how many threads to use, and is the recommended approach as octopus can dynamically juggle thread usage at an algorithm level. However, a strict upper limit on the number of threads can also be used:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam --threads=4
$ octopus -R hs37d5.fa -I NA12878.bam --threads 4
```
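
When scripting runs across machines of different sizes, one possible way to choose the upper limit is to cap a requested value at the number of online cores. This is only a sketch: `capped_threads` is a hypothetical helper, and it relies on `getconf _NPROCESSORS_ONLN`, which is available on Linux and OS X but is not something octopus requires.

```shell
# Hypothetical helper: cap a requested thread count at the number of
# online cores reported by getconf (falling back to 1 if unavailable).
capped_threads() {
    local requested="$1"
    local cores
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}

# Print the resulting command line rather than running octopus.
echo "octopus -R hs37d5.fa -I NA12878.bam --threads $(capped_threads 4)"
```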

#### *Fast calling*
#### Fast calling

By default, octopus is geared towards more accurate variant calling which requires the use of complex (slow) algorithms. However, to acheive faster runtimes (at the cost of decreased calling accuray) many of these features can be disabled. There are two helper commands that setup octopus for faster variant calling, `--fast` and `--very-fast`, e.g.:
By default, octopus is geared towards more accurate variant calling which requires the use of complex (slow) algorithms. However, to achieve faster runtimes (at the cost of decreased calling accuracy) many of these features can be disabled. There are two helper options that set up octopus for faster variant calling, `--fast` and `--very-fast`, e.g.:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam --fast
```

Note this does not turn on multithreading or increase buffer sizes.

#### Making evidence BAMs

Octopus can generate 'evidence' BAMs for single sample calling. To generate a single BAM file containing realigned reads supporting called variants use the `--bamout` option:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam -o octopus.vcf --bamout octopus.bam
```

To generate split BAM files (one for each called haplotype) use the `--bamout` option, but specify only the file prefix:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam -o octopus.vcf --bamout octopus
```

Octopus will generate BAM files (`octopus1.bam`, `octopus2.bam`, ...) for the number of haplotypes in the sample. Note that although each split BAM is haploid, the variants in each are only phased according to the phase sets called in the output VCF.
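
For downstream scripts it can help to know the split file names in advance. The sketch below follows the naming pattern described above (`octopus1.bam`, `octopus2.bam`, ...); `split_bam_names` is a hypothetical helper, and the haplotype count of 2 is an assumption for a diploid sample.

```shell
# Hypothetical helper: print the split BAM names expected for a given
# --bamout prefix and number of haplotypes.
split_bam_names() {
    local prefix="$1" count="$2" i=1
    while [ "$i" -le "$count" ]; do
        echo "${prefix}${i}.bam"
        i=$((i + 1))
    done
}

split_bam_names octopus 2
```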

## Output format

Octopus outputs variants using a simple but rich VCF format (see [user documentation](https://github.com/luntergroup/octopus/blob/develop/doc/manuals/user/octopus-user-manual.pdf) for full details). For example, two overlapping deletions are represented like:
Binary file modified doc/manuals/user/octopus-user-manual.pdf
Binary file not shown.