Releases: luntergroup/octopus
Octopus 0.7.4
Octopus 0.7.3
This release contains several minor improvements:
- Replaces the old random forest training procedure with a Snakemake version. [e0c4922]
- All annotations can now be requested even if they are not active. [9bb584c]
- Annotations will by default no longer be aggregated when `--disable-call-filtering` is used. To aggregate annotations for forest training, the `--aggregate-annotations` option is added. [42f4248]
- Big runtime improvement to the `cell` calling model. [bc062d2]
- Changes the default `--max-genotype-combinations` to `100,000` for `trio` and `population` calling. This improves runtime considerably but has little impact on accuracy. [18fcbcb]
- Trains a new germline random forest using much less training data overall but more trio data.
Octopus 0.7.2
This is a minor bug fix release:
- Fixes a segmentation fault in the cancer caller caused by adjacent phase blocks with different ploidies. [9bc21a3]
- Default to UTC time when no tz database found. [3dbd8cc]
- Prints annotations when `--annotations` is specified with `--help`. [366fe04]
- Fixes a bug causing some reads to be dropped when filtering long haplotype regions. [8f2cc87]
- Update the link in the README to the Nature Biotechnology paper! [b8baf13]
Octopus 0.7.1
This is a minor bug fix release:
- Fixes underreporting of de novo mutations in trio mode [7d195de].
- Improves QUAL precision for trio calls [b68c7cf].
- Resolves some issues with read counting (e.g. `AD`) on `*` alleles [e0023ad].
- Fixes underreporting of more than one somatic haplotype in cancer mode [0c7d06b].
- Improves coalescent model to allow 2 indel heterozygosity parameters along a haplotype [58db934].
- Adds timestamp to VCF output [36ca0b9].
- Adds `--architecture` option to `install.py` that sets compiler `march` option [87faea9].
- Default minimum mapping quality filter set to 5.
Octopus 0.7.0
This is a major release since `v0.6.3-beta` and is the first non-`beta` release. Highlights include:
- The pair HMM used for the core haplotype likelihood model has been completely re-written to support AVX2 and AVX-512 instruction sets. This can result in some nice performance improvements on machines supporting these instructions. Also, the HMM now supports variable band-widths and 32-bit integer scores, which is necessary to evaluate long reads.
- Evidence BAMs are now annotated with supporting haplotype(s) and other information. Automatic 'splitting' by haplotype is gone but there is a [script] provided to do this.
- Octopus is now paired and linked read aware! Reads are assumed paired by default, but can be assumed unpaired or linked with the `--read-linkage` option. This improves accuracy and phasing for most analyses.
- Random forests now store the annotations used for training as meta information in the forest file, allowing different annotations to be used for different forests. Note that this change makes previous forest versions incompatible with this version. It also means that a modified ranger must be used for training (the main ranger package does not store variable names in the meta info).
- Allele-level annotations (e.g. `AD`) are now supported; they can be requested with the `--annotations` option.
- The phasing algorithm has been completely re-written to improve accuracy and to allow discontiguous phase sets, which can frequently occur in some analyses (e.g. linked reads, or somatic phasing).
- Calling from PacBio CCS reads is now supported - although improvements are still needed, especially regarding runtime. See the PacBio CCS config.
- The haplotype generator now supports 'backtracking' - where a block of partially resolved haplotypes is buffered, and then restored when downstream haplotypes have also been partially resolved. This can lead to long haplotypes much faster than keeping all haplotypes in the tree simultaneously. Backtracking is turned off by default, but can be enabled by using the `--backtrack-level` option.
- Mixing of distinct sample ploidies is now supported by the population calling model.
- Overflows on `QUAL` and `GQ` have been reduced, allowing for much greater ranges on these statistics.
- The use of the `*` `ALT` allele has been brought in line with the updated VCF v4.3 specification. The `--legacy` option has therefore been removed.
- New `RFGQ_ALL` `INFO` measure for random forest filtered runs - the empirical probability (Phred) of all genotypes being correct (derived from each `FORMAT` `RFGQ`). Use this for filtering tumour-normal calls etc.
- Handling of ALT supplementary alignments (for GRCh38 etc) has been improved, resulting in better accuracy.
- Polyploid calling is much faster, especially when the `--max-genotypes` option is used (recommended for anything over triploid).
- The local re-assembler now automatically considers the average region depth when evaluating bubbles, resulting in fewer spurious candidate variants.
- The local re-assembler no longer allows cyclic graphs by default, resulting in far fewer spurious candidates with very little loss in sensitivity. Cyclic graphs can be re-enabled with the `--allow-cycles` option.
- Haplotypes (i.e. phased `GT` entries) are now reported in a consistent manner - always lexicographical (w.r.t. the implied haplotype). This breaks the previous rule that somatic haplotypes always appeared after germline ones - somatic haplotypes are now identified with the `HSS` `FORMAT` annotation.
- The way genotypes are represented has been completely re-written, resulting in some nice runtime performance improvements for all calling models.
- The way filtering measures are calculated has been re-written, resulting in a nice runtime performance improvement for filtering.
- The way Octopus identifies 'uncallable' regions that tend to slow down analysis has been much improved, resulting in much better runtimes.
- Automatic dependency installation in the installation script has been much improved, and is now the recommended way to install Octopus on all operating systems.
- Many bug fixes.
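
The `RFGQ_ALL` measure in the highlights above is described as the Phred-scaled probability that all genotypes are correct, derived from each sample's `RFGQ`. The notes do not give the formula Octopus uses, so the following is only a sketch of one plausible combination, assuming per-sample errors are independent. The function names are hypothetical and not part of Octopus.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10.0 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality."""
    if p <= 0.0:
        return math.inf
    return -10.0 * math.log10(p)

def combine_rfgq(rfgqs):
    """Phred-scaled probability that at least one genotype is wrong.

    Assumes per-sample errors are independent (an illustrative
    assumption, not a statement of how Octopus computes RFGQ_ALL).
    """
    p_all_correct = 1.0
    for q in rfgqs:
        p_all_correct *= 1.0 - phred_to_error_prob(q)
    return error_prob_to_phred(1.0 - p_all_correct)
```

Under this assumption the combined score can never exceed the lowest per-sample score, which fits the suggested use as a single filter for tumour-normal calls.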
0.6.3-beta
This release reduces runtime in the `cancer` and `polyclone` calling models by 20-25%; fixes a bug in the read deduplication algorithm, resulting in fewer false positive calls in PCR data (particularly for somatic calling); adds new read pre-processing options designed to mitigate systematic artefacts in 10X Genomics sequencing; and adds a new metric (`RFQUAL_ALL`) to filter somatic variant calls.
New features / interface changes
- Adds command line options `--mask-inverted-soft-clipping` and `--mask-3prime-shifted-soft-clipped-heads` for masking 10X Genomics sequencing artefacts. [0b8fb93, 6566fb2]
Improvements
- Reduces runtime in the `VariationalBayesMixtureMixtureModel` used in the `cancer` and `polyclone` calling models by ~20-25%. [d9cbcec]
- Switches multi-precision floating point arithmetic in the `cancer` calling model to use the GMP library, resulting in a small speedup. This change adds a dependency on GMP. [e59be9d]
Bug fixes
0.6.2-beta
This is a minor bug fix release.
Bug fixes
0.6.1-beta
This is a minor release that fixes some bugs and compilation issues, and adds better binary version logging.
Changes
- The git branch and commit, and some system information, are now logged during compilation. This information is available with the `--version` command. [242dd00, 36a6a82, c6c397d]
- Adds measure `ADP` for assigned sequence depth (i.e. reads assigned to a unique called allele). [702109e]
- Adds measures `ADP` and `VL` to default random forest measures. [a035953]
- Adds support for gzipped region files (for options `--regions-file` and `--skip-regions-file`). [ec41af4]
- Reads that cannot be assigned to a unique haplotype are assigned randomly to any of the supporting haplotypes for BAM realignment (rather than always assigning to one of them). [cb3faf9]
Bug fixes
0.6.0-beta
This release improves calling accuracy, includes more flexible error modelling, and adds annotations to filtered VCF and realigned BAM files.
Interface changes
- The `--training-annotations` option is replaced with `--annotations`, which has slightly different behaviour (see below).
- The `--split-bamout` option is removed as `--bamout` realignments now include tags.
- Adds the option `--full-bamout`. [1147e8f]
- Adds the option `--refcall-block-merge-threshold` for controlling refcall blocks.
- Renames `--extract-filtered-source-candidates` to `--use-filtered-source-candidates`. [6972ffa]
Improvements
- Indel error models now include variable gap extension penalties and account for tetra-nucleotide tandem repeats. [8f40fc3]
- More built-in sequence error models to choose from, and custom error models (see wiki). [8f40fc3]
- Annotations can now be requested for filtered VCF files using the new `--annotations` option. [c75cbac]
- Reference calling now outputs calls in adaptive blocks using the new `--refcall-block-merge-threshold` option. [9127cf3]
- Better handling of temporary BCF files in multithreaded mode helps prevent system errors due to too many open files (addresses issue #52). [42fa364]
- Adds annotations to realigned evidence BAMs (see wiki). [c047e96]
0.5.3-beta
This is a minor release containing some bug fixes and an improved installation method.
Interface changes
- Renames option `--download` to `--download-forests` in Python installation script. [166b6be]
- Adds command line option `--temp-directory-prefix` for setting name of temporary directory. [69ea9e7]
- Adds `cell` caller prototype (undocumented). [9f496d3]
Improvements
- Assembler now adjusts the support threshold depending on depth, which should prevent too many false candidates from high depth samples. [d582cbc, 91930cb]
- Adds `--install-dependencies` to the Python installation script, which results in all dependencies being installed locally. [81c6535, 5ac6958]
- Allows candidates only seen on one strand if there are only reads seen in that direction overlapping the region. Addresses #45. [44c5c22]
Bug fixes
- Resolves exception when merging temporary files for contigs containing `:` (e.g. HLA-A*01:01:01:01). Resolves #44. [https://github.com/luntergroup/octopus/commit/37a3329239518a07b2b9ca62e28a7570d9773667]
- Stops output of IUPAC ambiguity symbols, which are not permitted by VCF specification. Resolves #46. [da05513]
- Should prevent exception being thrown during filtering caused by short haplotype for realignment (see issue raised in #41). [0942dde]
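
Contig names containing `:` (such as `HLA-A*01:01:01:01` in the first fix above) are awkward for any code that parses `contig:begin-end` region strings, because the name itself can end in something that looks like a coordinate. The sketch below shows one way to disambiguate, by consulting the set of known contig names before splitting on the last colon. It illustrates the general problem only and is not Octopus's implementation; `parse_region` and its arguments are hypothetical.

```python
def parse_region(text, known_contigs):
    """Split 'contig[:begin[-end]]' into (contig, begin, end).

    known_contigs is the set of contig names from the reference;
    checking it first keeps names containing ':' intact.
    """
    if text in known_contigs:
        # The whole string is a contig name; no coordinates given.
        return text, None, None
    # Otherwise only the last ':' can separate name from coordinates.
    contig, sep, span = text.rpartition(":")
    if not sep or contig not in known_contigs:
        raise ValueError(f"unknown contig in region: {text!r}")
    begin, dash, end = span.partition("-")
    return contig, int(begin), int(end) if dash and end else None
```

For example, with `known_contigs = {"chr1", "HLA-A*01:01:01:01"}`, the string `HLA-A*01:01:01:01:100-200` resolves to the full HLA contig name with coordinates 100 and 200.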