Releases: luntergroup/octopus
0.5.2-beta
This is a minor release with some bug fixes, and improvements to installation scripts.
Improvements
- Makes RFQUAL a FORMAT field rather than an INFO field, so each sample gets an RFQUAL. [81b75ea]
- Octopus can now be installed to any location. Resolves #36. [18b36ea]
- Installation script can now be given htslib root location. Resolves #38. [93ba000]
- Installation script now tries both cmake3 and cmake. Resolves #37. [30a3ffb]
- Installation script now properly downloads the provided random forests. [3599999]
Bug fixes
- Fixes bug in htslib float field extraction that could corrupt FORMAT and INFO values. [0078c5a]
- Fixes bug in the de novo mutation model that could lead to segmentation faults. [7d945f8, ad996a2]
- Fixes bug that causes conflicting call exception due to calling variants in skipped regions. [01477d9]
- Fixes bug in de novo contamination measure that could cause segmentation faults. [f2e6610]
0.5.1-beta
This is a minor release that resolves a few issues in the first Beta release - v0.5.0-beta.
Improvements
- Adds support for allosomes in the trio calling model. [41b72b2]
- Moves the RFQUAL random forest score to the FORMAT field, so there is now one score for each sample. [81b75ea]
- Adds new measures: RTB, REB, BMC, BMF. [f1000d4, 1a937f2]
- Improves temp directory cleanup on failed runs. [95016f1]
- Makes random forest training a little easier by adding default measure lists to training scripts and by allowing the argument forest to the --training-annotations option (renamed from --csr-train). [519be06, 106e344]
- Changes some of the UMI config settings to reduce runtimes (at a minor cost in accuracy). [36e4729]
Interface changes
- Renames the --csr-train option to --training-annotations. [a1f8c45]
- Adds version numbers to the provided random forests. [3599999]
- Renames the RPB measure to RSB. [c52c6e8]
Bug fixes
- Resolves a libc++ bug where subnormal doubles are not parsed properly, causing errors when using random forest filtering. [dc13754]
- Fixes a possible segmentation fault when using the MQD measure. [45b9b74]
- Fixes a VCF reading bug that could mangle INFO and FORMAT fields with multiple values. [0078c5a]
0.5.0-beta
This is the first beta release, as most of the core features are now reasonably mature. There have been various stability and runtime improvements, in addition to improvements to the core algorithm, including a completely new indel mutation model. Once again, the cancer calling model has received the most attention, particularly for high depth, ultra-low VAF tumour-only calling (e.g. UMI).
General
- Overhaul of the indel mutation model, which controls priors on germline, somatic, and de novo mutations. Gap opens and extensions are now modelled conditional on the local repeat context and the current gap length. [bd0eb24, 20f5d9f]
- A brand new candidate variant generator! Named RepeatScanner, this generator looks for likely misaligned SNV runs in microsatellites and proposes indels. This can result in more biologically realistic calls in these regions. This generator is controlled with the --repeat-candidate-generator command line option. [2856c2e]
- Adds evidence BAMs for multi-sample input, including 'split' evidence BAMs. [face5fb, e56641c]
- The way QUAL is calculated in the cancer and trio models has been improved. Previously, QUAL was the posterior probability that the called alt allele segregates and is classified correctly. This could lead to low QUAL scores if the classification was uncertain (e.g. in tumour-only samples). QUAL is now simply the posterior probability that the allele segregates. There is also a new annotation, PP, for all cancer caller calls and DENOVO trio calls, which is equivalent to the old QUAL. [905c96b, 3b28e9f, 0d1537b]
- Candidate variant generators are now more sensitive to very low frequency variation (<1% VAF). [d3e3631]
- SOMATIC calls have a new annotation, MAP_VAF, which reports the maximum a posteriori VAF estimate.
- Adds new measures for threshold and random forest filtering. [11ff14f]
- Complete refactor of the core cancer caller genotype models results in some runtime improvements. [d3e5a5a]
- Better Variational Bayes seed generation for cancer genotypes, especially good news for lower frequency mutations. [2fadf78]
- Improved somatic model fitting for high ploidy somatic genotypes in cancer caller. [2d7573c]
- Improved use of indexing in the individual caller results in a ~5% speedup. [b6bba8a, 16a3cc5, 9c951d2]
- Better identification of messy regions that slow down calling. [5326835, 8208a20]
- The assembler now considers observed read strands and reduces the score of bubbles with high strand bias. [50da804]
- Filtering measures can now be parameterised by user input. [e1ab330]
- The way some measures consider ambiguous reads has been improved, which can prevent some previously observed biases. [7e2f635]
- Adds support for calling chromosome Y in trios. [41b72b2]
- Adds a "data profiler" that can be used to build a profile of polymorphisms and errors present in the data. Currently this only profiles indels. This feature is currently experimental and is primarily intended to be used to improve indel error models. [99ad1e9]
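The QUAL change described above can be illustrated with a small sketch. It assumes QUAL and PP are Phred-scaled as is usual for VCF (-10·log10 of the error probability) and reuses the 10000 QUAL cap noted under 0.3-alpha. The function names, and splitting the old quantity into P(segregates) × P(correct classification | segregates), are illustrative assumptions, not Octopus's code. Because PP scores a joint event, it can never exceed QUAL, which is why uncertain classification could previously depress QUAL.

```python
import math


def phred(p_error: float, cap: float = 10000.0) -> float:
    """Phred-scale an error probability, capped at `cap` (the 0.3-alpha QUAL maximum)."""
    if p_error <= 0.0:
        return cap  # a zero error probability would be infinite on the Phred scale
    return min(-10.0 * math.log10(p_error), cap)


def qual_and_pp(p_segregates: float, p_correct_class: float) -> tuple[float, float]:
    """Hypothetical illustration of the new QUAL versus PP (the old QUAL).

    p_segregates:    posterior probability the called alt allele segregates.
    p_correct_class: probability the allele is classified correctly
                     (e.g. somatic vs germline), given that it segregates.
    """
    # New QUAL depends only on whether the allele segregates.
    qual = phred(1.0 - p_segregates)
    # PP requires segregation AND correct classification, so it is never larger.
    pp = phred(1.0 - p_segregates * p_correct_class)
    return qual, pp


# A confidently present allele whose somatic/germline class is uncertain.
qual, pp = qual_and_pp(0.999, 0.6)
print(f"QUAL={qual:.1f} PP={pp:.1f}")
```

Here QUAL is about 30 while PP is only about 4, mirroring the tumour-only case where a real allele with an uncertain class used to receive a low QUAL.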
Bug fixes
- Fixes a bug that could lead to segmentation faults during haplotype generation. [1ecd74e]
- Fixes a problem reading lists of floats from VCF files that could result in garbage output (e.g. for VAF_CR). [e361f50]
- Fixes a GCC 8 warning that caused a compile error. [58b51fd, 3733b09]
- Fixes some instances of compiler based non-determinism that could result in different results between compilers. [d018193, e66169e]
Interface changes
- Adds the command line option --max-vb-seeds, which controls the maximum number of seeds the Variational Bayes based genotype model algorithms can use. [95c66a2]
- Adds --split-bamout for split realigned BAMs. Split BAMs are no longer requested by specifying a prefix to --bamout. [34d8a89]
- The measure SC has been renamed to NC (Normal Contamination). [23497c3]
- Adds --mask-tails for unconditionally masking bases of all read tails. [acfddaf]
- Adds --tumour-germline-concentration, which may be used to control the shape of the prior distribution on the haplotype mixture frequencies of tumour samples. This is only really relevant to high depth tumour-only calling. [9f83ca6]
- Renames --snv-denovo-mutation-rate to --denovo-snv-mutation-rate and --indel-denovo-mutation-rate to --denovo-indel-mutation-rate. [4b9d95f]
- Adds --repeat-candidate-generator to control the new repeat candidate generator. [2856c2e]
Miscellaneous
- There is now a configs directory in the main project directory that contains pre-written configs for calling certain types of data. [9da0364]
0.4.1-alpha
This is a bug fix release that fixes a minor bug that crept into v0.4.0-alpha.
Bug fixes
- Fixes a bug in v0.4.0-alpha where germline calls may be hard filtered when using threshold filtering.
0.4.0-alpha
This is a major release with important new features, enhancements, and performance improvements.
New features
- New polyclone calling model for bacterial and viral data.
- New population calling model with Hardy-Weinberg priors.
- Random forest filtering for germline and somatic variants using ranger.
- Generate an 'evidence' BAM for single sample calling with the --bamout option. See the wiki page for details.
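For the new population model's Hardy-Weinberg priors, the textbook diploid Hardy-Weinberg genotype frequencies are sketched below: f_a² for a homozygote and 2·f_a·f_b for a heterozygote. This shows only the standard formula; how Octopus estimates the allele frequencies, or handles other ploidies, is not described in these notes, and the helper name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import isclose


def hardy_weinberg_priors(freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Diploid genotype priors under Hardy-Weinberg equilibrium (hypothetical helper)."""
    if not isclose(sum(freqs.values()), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    priors = {}
    # Unordered genotypes: every pair of alleles, including homozygous pairs.
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            priors[(a, b)] = freqs[a] ** 2  # homozygote: f_a^2
        else:
            priors[(a, b)] = 2 * freqs[a] * freqs[b]  # heterozygote: 2 f_a f_b
    return priors


print(hardy_weinberg_priors({"A": 0.9, "G": 0.1}))
```

For allele frequencies of 0.9 and 0.1 this gives genotype priors of roughly 0.81, 0.18, and 0.01, which sum to one.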
Calling improvements
- The cancer caller can now model more than one somatic haplotype which improves calling sensitivity, and also allows somatic phasing. See cancer calling model wiki for more details.
- Optimisation of the cancer model improves sensitivity for low frequency mutations.
- New unified indel mutation model used for germline, de-novo, and somatic indel calling.
- New filter measures. See wiki for full list.
- Tumour-only calling now much faster and more accurate.
- Uses variant prior model to deduplicate haplotypes for all models, resulting in more biologically realistic calls.
- When using threshold filters, DENOVO and SOMATIC calls are now filtered differently from regular germline variants.
Interface changes
- Added --forest-file and --somatic-forest-file for random forest filtering.
- Added --somatics-only to report only SOMATIC variants.
- Added --denovos-only to report only DENOVO variants.
- Added --max-somatic-haplotypes, which limits the number of somatic haplotypes that may be used by the cancer calling model.
- Replaced --consider-reads-with-unmapped-segments with --no-reads-with-unmapped-segments, and --consider-reads-with-distant-segments with --no-reads-with-distant-segments. These filters are now off by default.
- Removed --max-cancer-genotypes and replaced it with --max-genotypes, which is also used by the polyclone calling model.
- Added the --max-clones option for specifying the maximum number of clones for the polyclone calling model.
- Added --somatic-filter-expression, --denovo-filter-expression, and --refcall-filter-expression, which may be used for hard filtering DENOVO and SOMATIC calls.
0.3.3-alpha
This version brings new features, in addition to significant calling and runtime improvements.
New features
- CSR filtering can now be run on a user-supplied Octopus VCF file without running calling (--filter-vcf command line option).
- Micro-inversions and complex rearrangements are now callable.
Calling improvements
- Better handling of variants in tandem repeat regions. In particular, many cases that would previously have been called as a series of SNVs are now called as an insertion-deletion pair, which is more biologically plausible.
- Improved the SNV error model to stop some true heterozygous SNVs being called as homozygous.
Runtime improvements
- CSR filtering is fully parallelised. As for calling, this is activated with the --threads option. This resolves #13.
Bug fixes
- Various fixes to the way haplotypes are reconstructed from VCF, which previously led to some edge cases being misclassified.
Interface changes
- The helper Python install script install.py now accepts both a C++ and a C compiler, via the cxx_compiler and c_compiler options respectively.
- Supplementary alignments are now filtered by default (--no-supplementary-alignments changes to --allow-supplementary-alignments).
- Secondary alignments are now filtered by default (--no-secondary-alignments changes to --allow-secondary-alignments).
Other changes
- htslib is now linked dynamically by default, which means its dependencies no longer need to be linked explicitly as well. This resolves #16. Be sure to clean any CMake caches before rebuilding (--clean with the Python install script).
- .vcf.gz index files are now in the .tbi format, rather than .csi.
0.3.2-alpha
This version brings bug fixes and some minor performance improvements.
Bug fixes
- Fixes issue #11 where octopus hangs after calling variants.
- Fixes issue #17 where contig names containing a colon could not be parsed.
Performance improvements
- Gap open penalties are now more consistent in tandem repeats, which can improve calling performance in some cases.
- Decreased the minimum probability cap for the de novo mutation model, which seems to result in more sensitive de novo and somatic mutation calls.
Interface changes
- Somatic SNV and INDEL mutation rates are now specified separately via the command line.
0.3.1-alpha
This release contains some runtime performance improvements, particularly for the tumour calling model. It also updates the requirements for GCC, CMake, and Boost.
Requirement changes
- Updates CMake requirement to 3.9 so that IPO checks can be used.
- Updates Boost requirement to 1.65 for bug fixes and better program option formatting.
- Updates GCC requirement to 6.3 to avoid bug in 6.2.
Performance improvements
- Significantly improves runtime performance of tumour calling model.
- Improves masking of noisy regions which can slow down calling.
- Slightly improves CSR runtime performance.
Other changes
- Fixes various warnings from new Clang and GCC compilers.
- Can now build with compiler sanitizer flags.
- Adds a Dockerfile.
0.3-alpha
This is a major release that contains significant new features and improvements.
New features
- Variant filtering: Octopus now has simple threshold based filtering which is turned on by default. This can dramatically reduce the false positive rate in some datasets (e.g. Platinum genomes).
- The population model now uses an independence-based genotype model. Although this doesn't offer true joint calling, it at least offers consistent output until a proper model is implemented.
- Somatic mutation calling is now significantly faster and more accurate due to model optimisation.
Bug fixes
- Fixed a bug with haplotype filtering that could cause haplotypes not to be filtered, and also result in inconsistent results between runs.
Other changes
- VCF records now include AC and AN INFO fields.
- Added an official logo!
- Protect called haplotypes from filtering when using holdouts.
- Octopus will now always emit a call if the variant posterior is above the given threshold, even if the homozygous reference genotype is MAP.
- The max QUAL is now 10000.
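The AC and AN INFO fields follow the VCF specification's definitions: AN is the total number of alleles in called genotypes, and AC counts each ALT allele, in the order the ALT alleles are listed. Below is a minimal sketch computing both from GT strings, assuming missing alleles are written as '.'; the helper name is hypothetical and this is not Octopus's implementation.

```python
def allele_counts(genotypes: list[str], n_alt: int) -> tuple[list[int], int]:
    """Compute VCF-style AC (per ALT allele) and AN from GT strings (hypothetical helper)."""
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles are not counted in AN
            an += 1
            index = int(allele)
            if index > 0:
                ac[index - 1] += 1  # ALT alleles are numbered from 1
    return ac, an


ac, an = allele_counts(["0/1", "1|1", "./.", "0/2"], n_alt=2)
print(f"AC={','.join(map(str, ac))};AN={an}")  # AC=3,1;AN=6
```

The fully missing genotype "./." contributes nothing to either field, so AN reflects only called alleles.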
0.2.1-alpha
This release includes a new de novo mutation model that improves trio calling.
New features
- A new de novo mutation model that includes context-dependent indel gap open and extension penalties, calculated using an exponential model. There are now two options that parametrise the model: snv-denovo-mutation-rate and indel-denovo-mutation-rate. Gap open and extension penalties are weighted based on context.
Bug fixes
- Fixes a bug that could prevent a legacy VCF being made.
- Corrects a region difference method that sometimes resulted in incorrect 'skip region' deduction, which could lead to an exception being thrown.
- Fixes a bug that resulted in an incorrect trio model posterior probability.
- Fixes some numerical overflow/underflow bugs that resulted in undefined behaviour.
Other changes
- Increases max-joint-genotypes to 1,000,00.