Commit

Merge branch 'release/0.3-alpha'
Daniel Cooke committed Nov 8, 2017
2 parents e7e779d + a5e70df commit 98ea10a
Showing 395 changed files with 5,594 additions and 1,965 deletions.
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2016 Daniel Cooke
Copyright (c) 2017 Daniel Cooke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
35 changes: 24 additions & 11 deletions README.md
@@ -21,8 +21,8 @@ Octopus is a mapping-based variant caller that implements several calling models
* CMake 3.5 or greater
* Optional:
* Python3 or greater
Warning: GCC 6.1.1 and below have bugs which affect octopus, the code may compile, but do not trust the results. GCC 6.2 should be safe. Clang 3.8 has been tested. Visual Studio likely won't compile as it is not C++14 feature complete.

**Warning**: GCC 6.2.1 and below have bugs which affect octopus, the code may compile, but do not trust the results. GCC 6.3 and above should be safe. Clang 3.8 has been tested. Visual Studio likely won't compile as it is not C++14 feature complete.

#### *Obtaining requirements on OS X*

@@ -42,7 +42,7 @@ Note if you already have any of these packages installed via Homebrew on your sy

#### *Obtaining requirements on Ubuntu Xenial*

On Ubuntu, Clang 3.8 is recommended as GCC 6.2 has a [bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77550) which slows down octopus. To install the requirements use:
To install the requirements (using Clang) enter:

```shell
$ sudo apt-get update && sudo apt-get upgrade
@@ -112,8 +112,7 @@ If Python3 isn't available, the binaries can be installed manually with [CMake](
```shell
$ git clone https://github.com/luntergroup/octopus.git
$ cd octopus/build
$ cmake ..
$ make install
$ cmake .. && make install
```

By default this installs to the `/bin` directory where octopus was installed. To install to root (e.g. `/usr/local/bin`) use the `-D` option:
@@ -193,14 +192,19 @@ $ octopus -R hs37d5.fa -I NA12878.bam -K 1 2:30,000,000- 3:10,000,000-20,000,000

#### *Calling de novo mutations in a trio*

To call germline and de novo mutations in a trio, either specify both maternal (`--maternal-sample`; `-M`) and paternal (`--paternal-sample`; `-F`) samples, or supply a pedigree file which contains the trio (`--pedigree`):
To call germline and de novo mutations in a trio, either specify both maternal (`--maternal-sample`; `-M`) and paternal (`--paternal-sample`; `-F`) samples:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam -M NA12892 -F NA12891
```

The trio can also be specified with a PED file:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam --pedigree ceu_trio.ped
```

#### *Calling somatic mutations in tumours (WIP)*
#### *Calling somatic mutations in tumours*

To call germline and somatic mutations in a paired tumour-normal sample, just specify which sample is the normal (`--normal-sample`; `-N`):

@@ -222,14 +226,23 @@ $ octopus -R hs37d5.fa -I tumour1.bam tumour2.bam -C cancer

Note however, that without a normal sample, somatic mutation classification power is significantly reduced.

#### *Joint variant calling (WIP)*
#### *Joint variant calling (in development)*

Multiple samples from the same population, without pedigree information, can be called jointly:

```shell
$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam
```
Joint calling samples can increases calling power, especially for low coverage sequencing.

Joint calling samples may increase calling power, especially for low coverage sequencing.

#### *HLA genotyping*

To call phased HLA genotypes, increase the default phase level:

```shell
$ octopus -R human.fa -I NA12878.bam -t hla-regions.txt -l aggressive
```

#### *Multithreaded calling*

@@ -257,11 +270,11 @@ Note this does not turn on multithreading or increase buffer sizes.

## Documentation

Complete user and developer documentation is available in the doc directory.
Complete [user](https://github.com/luntergroup/octopus/blob/develop/doc/manuals/user/octopus-user-manual.pdf) and [developer](https://github.com/luntergroup/octopus/blob/develop/doc/manuals/dev/octopus-dev-manual.pdf) documentation is available in the doc directory.

## Support

Please report any bugs or feature requests to the [octopus issue tracker](https://github.com/dancooke/octopus/issues).
Please report any bugs or feature requests to the [octopus issue tracker](https://github.com/luntergroup/octopus/issues).

## Contributing

10 changes: 5 additions & 5 deletions TODO.md
@@ -1,10 +1,10 @@
# TODO

Here is a non-exhaustive list of things that need doing. In general, the individual caller is now pretty solid (other than some runtime performance issues). Everything else needs work, in particular the cancer caller which suffers from significant runtime issues.
This is a non-exhaustive list of things that need doing.

## Features

- Variant filtering.
- Population joint calling.
- Reference callings.

## Calling accuracy improvements
@@ -22,7 +22,7 @@ Here is a non-exhaustive list of things that need doing. In general, the individ
- Variational Bayes model needs rewriting as current implementation is just a prototype. In general the CancerCaller is very slow and needs improving.
- In multithreaded mode, if we have too many variants buffered, we should write them to another temporary file.

## Cosmetic
## Refactoring

- VcfRecordFactory is horrible and needs refactoring. The entire design are the Call family needs looking at.

@@ -35,5 +35,5 @@ Here is a non-exhaustive list of things that need doing. In general, the individ

## Testing

- In dire need of proper unit testing!
- Add regression testing
- More unit testing!
- Add regression testing,
Binary file modified doc/manuals/user/octopus-user-manual.pdf
Binary file not shown.
Binary file removed doc/paper/octopus-paper.pdf
Binary file not shown.
50 changes: 0 additions & 50 deletions doc/paper/octopus-paper.tex

This file was deleted.

Empty file.
2 changes: 1 addition & 1 deletion lib/bioio.hpp
@@ -1,6 +1,6 @@
/* MIT License
Copyright (c) 2016 Daniel Cooke
Copyright (c) 2017 Daniel Cooke
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
2 changes: 1 addition & 1 deletion lib/tandem/tandem.cpp
@@ -1,6 +1,6 @@
/* MIT License
Copyright (c) 2016 Daniel Cooke
Copyright (c) 2017 Daniel Cooke
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
2 changes: 1 addition & 1 deletion lib/tandem/tandem.hpp
@@ -1,6 +1,6 @@
/* MIT License
Copyright (c) 2016 Daniel Cooke
Copyright (c) 2017 Daniel Cooke
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
Binary file modified logo.png
54 changes: 45 additions & 9 deletions src/CMakeLists.txt
@@ -120,6 +120,8 @@ set(READPIPE_SOURCES
readpipe/read_pipe_fwd.hpp
readpipe/read_pipe.hpp
readpipe/read_pipe.cpp
readpipe/buffered_read_pipe.hpp
readpipe/buffered_read_pipe.cpp

readpipe/downsampling/downsampler.hpp
readpipe/downsampling/downsampler.cpp
@@ -155,7 +157,6 @@ set(UTILS_SOURCES
utils/timing.hpp
utils/type_tricks.hpp
utils/coverage_tracker.hpp
utils/coverage_tracker.cpp
utils/read_size_estimator.hpp
utils/read_size_estimator.cpp
utils/kmer_mapper.hpp
@@ -165,6 +166,9 @@ set(UTILS_SOURCES
utils/emplace_iterator.hpp
utils/repeat_finder.hpp
utils/repeat_finder.cpp
utils/genotype_reader.hpp
utils/genotype_reader.cpp
utils/beta_distribution.hpp
)

set(CORE_SOURCES
@@ -204,25 +208,55 @@ set(CORE_SOURCES
core/tools/vcf_header_factory.cpp
core/tools/vcf_record_factory.hpp
core/tools/vcf_record_factory.cpp

core/csr/facets/read_assignment.hpp
core/csr/facets/read_assignment.cpp
core/tools/read_assigner.hpp
core/tools/read_assigner.cpp

core/csr/facets/facet.hpp
core/csr/facets/facet.cpp
core/csr/facets/overlapping_reads.hpp
core/csr/facets/overlapping_reads.cpp
core/csr/facets/read_assignments.hpp
core/csr/facets/read_assignments.cpp
core/csr/facets/facet_factory.hpp
core/csr/facets/facet_factory.cpp

core/csr/filters/supervised_variant_call_filter.hpp
core/csr/filters/supervised_variant_call_filter.cpp
core/csr/filters/threshold_filter.hpp
core/csr/filters/threshold_filter.cpp
core/csr/filters/variant_call_filter.hpp
core/csr/filters/variant_call_filter.cpp
core/csr/filters/variant_call_filter_factory.hpp
core/csr/filters/variant_call_filter_factory.cpp
core/csr/filters/threshold_filter_factory.hpp
core/csr/filters/threshold_filter_factory.cpp

core/csr/measures/measure.hpp
core/csr/measures/qual.hpp
core/csr/measures/qual.cpp
core/csr/measures/depth.hpp
core/csr/measures/depth.cpp
core/csr/measures/quality_by_depth.hpp
core/csr/measures/quality_by_depth.cpp

core/csr/utils/genotype_reader.hpp
core/csr/utils/genotype_reader.cpp
core/csr/utils/variant_call_filter_factory.hpp
core/csr/utils/variant_call_filter_factory.cpp
core/csr/measures/mapping_quality_zero_count.hpp
core/csr/measures/mapping_quality_zero_count.cpp
core/csr/measures/mean_mapping_quality.hpp
core/csr/measures/mean_mapping_quality.cpp
core/csr/measures/model_posterior.hpp
core/csr/measures/model_posterior.cpp
core/csr/measures/allele_frequency.hpp
core/csr/measures/allele_frequency.cpp
core/csr/measures/strand_bias.hpp
core/csr/measures/strand_bias.cpp
core/csr/measures/mapping_quality_divergence.hpp
core/csr/measures/mapping_quality_divergence.cpp
core/csr/measures/is_denovo.hpp
core/csr/measures/is_denovo.cpp
core/csr/measures/is_somatic.hpp
core/csr/measures/is_somatic.cpp
core/csr/measures/measures_fwd.hpp
core/csr/measures/measure_factory.hpp
core/csr/measures/measure_factory.cpp

core/models/haplotype_likelihood_cache.hpp
core/models/haplotype_likelihood_cache.cpp
@@ -236,6 +270,8 @@ set(CORE_SOURCES
core/models/genotype/germline_likelihood_model.cpp
core/models/genotype/individual_model.hpp
core/models/genotype/individual_model.cpp
core/models/genotype/independent_population_model.hpp
core/models/genotype/independent_population_model.cpp
core/models/genotype/population_model.hpp
core/models/genotype/population_model.cpp
core/models/genotype/tumour_model.hpp
38 changes: 26 additions & 12 deletions src/basics/aligned_read.cpp
@@ -1,4 +1,4 @@
// Copyright (c) 2016 Daniel Cooke
// Copyright (c) 2017 Daniel Cooke
// Use of this source code is governed by the MIT license that can be found in the LICENSE file.

#include "aligned_read.hpp"
@@ -85,6 +85,15 @@ const CigarString& AlignedRead::cigar() const noexcept
return cigar_;
}

AlignedRead::Direction AlignedRead::direction() const noexcept
{
if (is_marked_reverse_mapped()) {
return Direction::reverse;
} else {
return Direction::forward;
}
}

bool AlignedRead::has_other_segment() const noexcept
{
return static_cast<bool>(next_segment_);
@@ -315,16 +324,16 @@ AlignedRead copy(const AlignedRead& read, const GenomicRegion& region)
};
}

bool operator==(const AlignedRead& lhs, const AlignedRead& rhs)
bool operator==(const AlignedRead& lhs, const AlignedRead& rhs) noexcept
{
return lhs.mapping_quality() == rhs.mapping_quality()
&& lhs.mapped_region() == rhs.mapped_region()
&& lhs.cigar() == rhs.cigar()
&& lhs.sequence() == rhs.sequence()
&& lhs.base_qualities() == rhs.base_qualities();
&& lhs.base_qualities() == rhs.base_qualities();
}

bool operator<(const AlignedRead& lhs, const AlignedRead& rhs)
bool operator<(const AlignedRead& lhs, const AlignedRead& rhs) noexcept
{
if (lhs.mapped_region() == rhs.mapped_region()) {
if (lhs.mapping_quality() == rhs.mapping_quality()) {
@@ -345,23 +354,28 @@ bool operator<(const AlignedRead& lhs, const AlignedRead& rhs)
}
}

bool are_other_segments_duplicates(const AlignedRead &lhs, const AlignedRead &rhs)
bool next_segments_are_duplicates(const AlignedRead& lhs, const AlignedRead& rhs) noexcept
{
if (lhs.has_other_segment() && rhs.has_other_segment()) {
return lhs.next_segment() == rhs.next_segment();
if (lhs.has_other_segment()) {
if (rhs.has_other_segment()) {
return lhs.next_segment() == rhs.next_segment();
} else {
return false;
}
} else {
return !rhs.has_other_segment();
}
return false;
}

bool IsDuplicate::operator()(const AlignedRead &lhs, const AlignedRead &rhs) const
bool IsDuplicate::operator()(const AlignedRead& lhs, const AlignedRead& rhs) const noexcept
{
return lhs.mapped_region() == rhs.mapped_region()
&& lhs.cigar() == rhs.cigar()
&& lhs.flags().reverse_mapped == rhs.flags().reverse_mapped
&& are_other_segments_duplicates(lhs, rhs);
&& lhs.is_marked_reverse_mapped() == rhs.is_marked_reverse_mapped()
&& next_segments_are_duplicates(lhs, rhs);
}

bool operator==(const AlignedRead::Segment& lhs, const AlignedRead::Segment& rhs)
bool operator==(const AlignedRead::Segment& lhs, const AlignedRead::Segment& rhs) noexcept
{
return lhs.contig_name() == rhs.contig_name()
&& lhs.begin() == rhs.begin()
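The rewrite of the next-segment comparison in `src/basics/aligned_read.cpp` appears to change behaviour as well as naming. Reading the last hunk, the removed `are_other_segments_duplicates` returned `false` unless both reads had a next segment, whereas the added `next_segments_are_duplicates` also returns `true` when neither read has one. The sketch below contrasts the two rules. It is illustrative only: `Segment`, `MaybeSegment`, `old_rule` and `new_rule` are hypothetical stand-ins, `std::optional` (C++17) replaces whatever member the real `AlignedRead` uses, and none of it is taken from the octopus sources beyond the control flow visible in the diff.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for AlignedRead::Segment; not the octopus type.
struct Segment
{
    std::string contig_name;
    unsigned begin;
};

bool operator==(const Segment& lhs, const Segment& rhs)
{
    return lhs.contig_name == rhs.contig_name && lhs.begin == rhs.begin;
}

// An absent value models a read with no other segment.
using MaybeSegment = std::optional<Segment>;

// Rule as it reads before the commit: only two present, equal segments match.
bool old_rule(const MaybeSegment& lhs, const MaybeSegment& rhs)
{
    if (lhs && rhs) {
        return *lhs == *rhs;
    }
    return false;
}

// Rule as it reads after the commit: two absent segments also match.
bool new_rule(const MaybeSegment& lhs, const MaybeSegment& rhs)
{
    if (lhs) {
        if (rhs) {
            return *lhs == *rhs;
        } else {
            return false;
        }
    } else {
        return !rhs;
    }
}
```

Under these assumptions, `old_rule(std::nullopt, std::nullopt)` is `false` while `new_rule(std::nullopt, std::nullopt)` is `true`. If the hunk is read correctly, reads without a next segment that share region, CIGAR and strand could then be treated as duplicates by `IsDuplicate`, which the earlier rule would never allow.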

