0.5.0-beta

Pre-release
dancooke released this 17 Sep 18:24
· 1501 commits to master since this release

This is the first beta release, as most of the core features are now reasonably mature. There have been various stability and runtime improvements, in addition to improvements to the core algorithm, including a completely new indel mutation model. Once again, the cancer calling model has received the most attention, particularly for high-depth, ultra-low VAF tumour-only calling (e.g. UMI).

General

  • Overhaul of the indel mutation model, which controls priors on germline, somatic, and de novo mutations. Gap opens and extensions, conditional on local repeat context and current gap length, are now modelled. [bd0eb24, 20f5d9f]
  • A brand new candidate variant generator! Named RepeatScanner, it looks for likely misaligned SNV runs in microsatellites and proposes indels, which can result in more biologically realistic calls in these regions. The generator is controlled with the --repeat-candidate-generator command line option. [2856c2e]
  • Evidence BAMs for multi-sample input, including 'split' evidence BAMs. [face5fb, e56641c]
  • The way QUAL is calculated in the cancer and trio models has been improved. Previously, QUAL was the posterior probability that the called alt allele segregated and was classified correctly. This could lead to low QUAL scores if the classification was uncertain (e.g. in tumour-only samples). QUAL is now simply the posterior probability that the allele segregates. There is also a new annotation, PP, for all cancer caller calls and DENOVO trio calls, that is equivalent to the old QUAL. [905c96b, 3b28e9f, 0d1537b]
  • Candidate variant generators are now more sensitive to very low frequency variation (<1% VAF). [d3e3631]
  • SOMATIC calls have a new annotation, MAP_VAF, which reports the maximum a posteriori VAF estimate.
  • New measures to use for threshold and random forest filtering. [11ff14f]
  • Complete refactor of the core cancer caller genotype models results in some runtime improvements. [d3e5a5a]
  • Better Variational Bayes seed generation for cancer genotypes, especially good news for lower frequency mutations. [2fadf78]
  • Improved somatic model fitting for high ploidy somatic genotypes in cancer caller. [2d7573c]
  • Improved use of indexing in the individual caller results in a ~5% speedup. [b6bba8a, 16a3cc5, 9c951d2]
  • Better identification of messy regions that slow down calling. [5326835, 8208a20]
  • The assembler now considers observed read strands and reduces the score of bubbles with high strand bias. [50da804]
  • Filtering measures can now be parameterised by user input. [e1ab330]
  • The way some measures consider ambiguous reads has been improved, which can prevent some previously observed biases. [7e2f635]
  • Adds support for calling chromosome Y in trios. [41b72b2]
  • Adds a "data profiler" that can be used to build a profile of polymorphisms and errors present in the data. Currently this only profiles indels. This feature is currently experimental and is primarily intended to be used to improve indel error models. [99ad1e9]
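
The QUAL change described in this list can be illustrated with a small worked example. This is only a sketch: the posterior values are made up, the two probabilities are assumed to combine by simple multiplication, and both scores are assumed to be Phred-scaled in the way the VCF format defines QUAL. It is not taken from the Octopus source.

```python
import math


def phred(p_correct: float, cap: float = 10_000.0) -> float:
    """Phred-scale a probability of being correct: -10 * log10(1 - p)."""
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return cap  # avoid log10(0) for a posterior of exactly 1
    return -10.0 * math.log10(p_error)


# Hypothetical posteriors for one tumour-only call (illustrative numbers only).
p_segregates = 0.999  # posterior that the alt allele segregates
p_classified = 0.60   # assumed posterior that the classification is right, given it segregates

new_qual = phred(p_segregates)            # QUAL as described for this release
pp = phred(p_segregates * p_classified)   # PP, i.e. the old QUAL definition

print(f"QUAL={new_qual:.1f} PP={pp:.1f}")
```

With these numbers the allele is almost certainly real, so the new QUAL is high, while PP stays low because the classification is uncertain. Under the old definition that low value would have been reported as QUAL.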

Bug fixes

  • Fixes a bug that could lead to segmentation faults during haplotype generation. [1ecd74e]
  • Fixes a problem reading lists of floats from VCF files that could result in garbage output (e.g. for VAF_CR). [e361f50]
  • Fixes a GCC 8 warning that caused a compile error. [58b51fd, 3733b09]
  • Fixes some instances of compiler-based non-determinism that could produce different results between compilers. [d018193, e66169e]

Interface changes

  • Adds command line option --max-vb-seeds which controls the maximum number of seeds the Variational Bayes based genotype model algorithms can use. [95c66a2]
  • Adds --split-bamout for split realigned BAMs. Split BAMs are no longer requested by specifying a prefix to --bamout. [34d8a89]
  • The measure SC has been renamed to NC (Normal Contamination). [23497c3]
  • Adds --mask-tails for unconditionally masking bases of all read tails. [acfddaf]
  • Adds --tumour-germline-concentration, which may be used to control the shape of the prior distribution on the haplotype mixture frequency of tumour samples. Only really relevant to high-depth tumour-only calling. [9f83ca6]
  • Renames --snv-denovo-mutation-rate to --denovo-snv-mutation-rate and --indel-denovo-mutation-rate to --denovo-indel-mutation-rate. [4b9d95f]
  • Adds --repeat-candidate-generator to control the new repeat candidate generator. [2856c2e]
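
Scripts that still pass the old de novo mutation rate option names will need updating. The helper below is a minimal sketch of one way to rewrite an existing argument list. The function name `migrate_args` and the example values are hypothetical; only the old and new option names come from the list above.

```python
# Old option name -> new option name, as listed in the interface changes.
RENAMED_OPTIONS = {
    "--snv-denovo-mutation-rate": "--denovo-snv-mutation-rate",
    "--indel-denovo-mutation-rate": "--denovo-indel-mutation-rate",
}


def migrate_args(args):
    """Return a copy of args with renamed options replaced by their new names.

    Handles both "--option value" and "--option=value" spellings; every
    other argument is passed through unchanged.
    """
    migrated = []
    for arg in args:
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return migrated


# Illustrative command line; the rate values are placeholders, not recommendations.
old = [
    "octopus",
    "--snv-denovo-mutation-rate", "1e-8",
    "--indel-denovo-mutation-rate=1e-9",
]
print(migrate_args(old))
```

Other interface changes, such as requesting split BAMs with --split-bamout instead of a --bamout prefix, or the SC measure becoming NC, are not simple renames of an option and are deliberately left out of this sketch.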

Miscellaneous

  • There is now a configs directory in the main project directory that contains pre-written configs for calling certain types of data. [9da0364]