
Releases: brentp/mosdepth

bugfix

29 Sep 21:44
  • fix a bug with regions and d4 that caused an error even when --d4 was not used.

D4 support!

16 Sep 15:54

This release adds support for writing d4 files. See Aaron's poster here

d4 is awesome

d4 is a toolset and format written by Hao Hou from the Quinlan Lab.

mosdepth provides many options for calculating depth because re-parsing the per-base.bed.gz files is slow. In many cases, it's faster to re-parse a CRAM file than to scan large regions from the per-base BED files. In addition, writing per-base.bed.gz has always been a bottleneck in mosdepth, even after it was optimized somewhat in the last release.
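For context on what the per-base output contains, here is a minimal sketch: per-base depth can be computed by adding 1 at each read's start and subtracting 1 at its end, then taking a running sum; adjacent bases with equal depth are then merged into one BED row (chrom, start, end, depth). It assumes reads are already reduced to 0-based, half-open (start, end) intervals and ignores CIGAR operations, flags, and mate overlap. The function names are hypothetical, and this is not mosdepth's Nim code. It also hints at why writing per-base text is costly: every row needs several integer-to-string conversions.

```python
from itertools import accumulate


def per_base_depth(chrom_len, reads):
    """Depth at every position of one chromosome.

    reads: iterable of (start, end) 0-based, half-open reference intervals
    (hypothetical input; real alignments would come from a BAM/CRAM file).
    """
    # One extra slot so a read ending at chrom_len has somewhere to decrement.
    delta = [0] * (chrom_len + 1)
    for start, end in reads:
        delta[start] += 1  # coverage starts here
        delta[end] -= 1    # and stops here (end is exclusive)
    # The running sum turns start/end events into per-base depth.
    return list(accumulate(delta[:chrom_len]))


def to_bed_rows(chrom, depth):
    """Merge runs of equal depth into (chrom, start, end, depth) rows."""
    rows = []
    run_start = 0
    for i in range(1, len(depth) + 1):
        # Close the current run at the end of the array or when depth changes.
        if i == len(depth) or depth[i] != depth[run_start]:
            rows.append((chrom, run_start, i, depth[run_start]))
            run_start = i
    return rows


def format_rows(rows):
    """Tab-separated text; each row costs three int-to-string conversions."""
    return "".join(f"{c}\t{s}\t{e}\t{d}\n" for c, s, e, d in rows)


depth = per_base_depth(10, [(0, 5), (3, 8)])
print(format_rows(to_bed_rows("chr1", depth)), end="")
```

With two overlapping reads this prints four rows: depth 1 over 0-3, depth 2 over 3-5, depth 1 over 5-8, and depth 0 over 8-10.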

This release includes a static d4utils binary for Linux (attached below) that allows users to manipulate d4 files.

d4 is much faster to write:

Here are mosdepth run times on a small CRAM test case:

  • mosdepth without per-base: 5.9s
  • mosdepth with per-base bed.gz: 24.8s
  • mosdepth with per-base d4: 7.7s

Note that using d4 output greatly mitigates the cost of writing the per-base output.
With d4, mosdepth can write per-base output for a 23X CRAM in 2m15s.

d4 output is much more useful.

Once the d4 file is created, it is much faster to access. d4 includes command-line utilities to view, get stats for, and manipulate d4 files. These will eventually replace much of the functionality in mosdepth, such as quantize, histogram (dist.txt), and regions.bed.gz, since the operations are so fast.

why not bigwig

I made several pull requests to Devon Ryan's excellent BigWig library to improve speed and attempt to reduce memory usage: #41, #42, #43.

I also wrote a bigwig library for nim that uses libBigWig and used that to prototype bigwig output for mosdepth. However, bigwig output dramatically increased the memory usage in mosdepth such that it was not viable.

We will show in the coming manuscript (see also the poster) that d4 is much faster to create and use than bigwig and results in smaller file sizes.

speed and region.dist.txt coverage

02 Mar 20:59

0.2.9

  • modifies region.dist.txt to contain the aggregate coverage of each window when -b (integer) is specified
    (previously, region.dist.txt and global.dist.txt were identical when -b was given an integer)
  • improve speed by ~30% when using per-base output with a better int2str method (see below for more details)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| mosdepth_v028 -x $exome | 231.300 ± 8.175 | 222.166 | 242.883 | 1.73 ± 0.07 |
| mosdepth_v029 -x $exome | 184.653 ± 7.520 | 176.238 | 192.636 | 1.38 ± 0.07 |
| mosdepth_v028 -x -t 4 $exome | 170.924 ± 3.811 | 166.359 | 175.284 | 1.28 ± 0.04 |
| mosdepth_v029 -x -t 4 $exome | 133.504 ± 3.151 | 129.220 | 138.062 | 1.00 |
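The dist files report a cumulative distribution: for each depth, the proportion of bases covered at least that deeply. A minimal sketch of that computation follows, assuming the input is (length, depth) pairs such as per-base BED rows. Feeding it one (window length, aggregate depth) pair per -b window instead is one plausible reading of the region.dist.txt change above, not a description of mosdepth's code; the function name is hypothetical.

```python
from collections import Counter


def cumulative_dist(runs):
    """Proportion of bases covered at >= each depth.

    runs: iterable of (length, depth) pairs, e.g. per-base BED rows turned
    into (end - start, depth).
    Returns {depth: proportion}, from the highest depth down to 0.
    """
    bases_at = Counter()
    total = 0
    for length, depth in runs:
        bases_at[depth] += length
        total += length
    if total == 0:
        return {}
    dist = {}
    covered = 0
    # Walk from the highest depth down so `covered` accumulates ">= depth".
    for depth in range(max(bases_at), -1, -1):
        covered += bases_at[depth]  # Counter returns 0 for unseen depths
        dist[depth] = covered / total
    return dist


for depth, prop in cumulative_dist([(3, 1), (2, 2), (3, 1), (2, 0)]).items():
    print(f"chr1\t{depth}\t{prop:.2f}")
```

For these 10 bases, 20% are covered at depth 2 or more, 80% at depth 1 or more, and 100% at depth 0 or more.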

fix indexing

07 Jan 16:53

0.2.8

  • fix off-by-one error in CSI index (but not data) of output bed files (#98)

htslib 1.10

06 Jan 17:10

this release updates mosdepth to work with htslib 1.10, and the static binary is built with htslib 1.10.
this fixes several bugs reported against mosdepth.

0.2.7

  • small optimizations
  • exit with 1 on bad help #80
  • fix check on remote bam (brentp/hts-nim#48)
  • fix erroneous assert #99
  • update static binary to htslib 1.10 (this fixes other bugs reported and closed in mosdepth)

median and summary file

20 May 15:41
  • this release adds a new *.mosdepth.summary.txt output file, contributed by @danielecook, that reports some statistics for each chromosome.
  • it also adds a --median flag that reports the median, rather than the default mean, depth for the regions given in --by. The median is recommended for a more stable estimate of depth.
  • fix #54 for quantize.
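To see why the median can give a steadier estimate, here is a minimal sketch that computes both statistics for one --by region from per-base (start, end, depth) runs. The function is hypothetical, and returning the lower median is an assumption; it illustrates the statistic, not mosdepth's implementation.

```python
def region_mean_median(rows, start, end):
    """Mean and median depth over the region [start, end).

    rows: (start, end, depth) runs for one chromosome, e.g. per-base BED rows.
    Returns the lower median of the per-base depths (an assumption; the
    exact tie-breaking mosdepth uses is not described in these notes).
    """
    bases_at = {}
    for s, e, d in rows:
        lo, hi = max(s, start), min(e, end)  # clip the run to the region
        if lo < hi:
            bases_at[d] = bases_at.get(d, 0) + (hi - lo)
    n = sum(bases_at.values())
    if n == 0:
        raise ValueError("no rows overlap the region")
    mean = sum(d * count for d, count in bases_at.items()) / n
    seen = 0
    for d in sorted(bases_at):
        seen += bases_at[d]
        if 2 * seen >= n:  # at least half the bases are at or below d
            median = d
            break
    return mean, median


# A short spike of high depth pulls the mean up but not the median.
rows = [(0, 90, 30), (90, 100, 300)]
print(region_mean_median(rows, 0, 100))
```

Here a 10 bp spike at 300X raises the mean to 57.0 while the median stays at 30.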

To get started, use:

wget https://github.com/brentp/mosdepth/releases/download/v0.2.6/mosdepth && chmod +x ./mosdepth && ./mosdepth -h

That is the (recommended) static binary. To use one that depends on your local htslib (libhts.so), download this binary

static build. pair overlap edge-cases.

07 Mar 17:26

0.2.5

  • remove dependency on PCRE (this makes it easier to run on many older systems)
  • don't double count fully overlapping reads (thanks to @jaudoux for the fix in #73)
  • static binary: the binary is completely static but does not allow access over S3/HTTP

wget https://github.com/brentp/mosdepth/releases/download/v0.2.5/mosdepth && chmod +x ./mosdepth && ./mosdepth -h

should work on all 64-bit Linux systems.
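The overlap fix above concerns read pairs whose mates cover the same reference bases, which would otherwise be counted twice. A minimal sketch of counting such bases once, assuming each mate is reduced to a single 0-based, half-open (start, end) interval (real handling would need to respect CIGAR blocks); the function names are hypothetical and this is not mosdepth's code.

```python
def pair_intervals(mate1, mate2):
    """Reference intervals to add for one read pair, counting overlap once."""
    (s1, e1), (s2, e2) = sorted([mate1, mate2])
    if s2 >= e1:
        # Disjoint (or merely adjacent) mates: both contribute fully.
        return [(s1, e1), (s2, e2)]
    # Overlapping, including one mate fully inside the other: use the union.
    return [(s1, max(e1, e2))]


def depth_from_pairs(chrom_len, pairs):
    """Per-base depth where each base gets at most 1 from a given pair."""
    depth = [0] * chrom_len
    for mate1, mate2 in pairs:
        for start, end in pair_intervals(mate1, mate2):
            for i in range(start, end):
                depth[i] += 1
    return depth


# Mate 2 lies entirely inside mate 1, so no base should reach depth 2.
print(depth_from_pairs(10, [((0, 10), (3, 7))]))
```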

fast mode

19 Nov 17:56

this release adds a --fast-mode flag that makes mosdepth almost twice as fast. It does not look at mate overlap, and it doesn't look at insertion or deletion events in the CIGAR; it will still show large deletions with coverage changes, and it still skips soft-clipped portions of reads. This behavior is likely desirable in many cases and will result in an additional 2X speedup.
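The difference between the default mode and --fast-mode can be sketched as two ways of turning one alignment into covered reference blocks. The regex CIGAR parser and the function names are illustrative assumptions. In particular, treating everything from the alignment start to its reference end as covered is a guess at what "does not look at internal CIGAR operations" means, not mosdepth's code. Here pos is the 0-based leftmost aligned position, so leading soft clips are already excluded.

```python
import re

# (length, operation) pairs from a CIGAR string such as "5S10M3D10M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
COVERED = set("M=X")          # operations that place read bases on the reference


def blocks_full(pos, cigar):
    """Covered blocks when internal operations (D, N, I, clips) are honoured."""
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in COVERED:
            blocks.append((ref, ref + n))
        if op in REF_CONSUMING:
            ref += n
    return blocks


def blocks_fast(pos, cigar):
    """One block from the first to the last aligned reference base."""
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return [(pos, pos + ref_len)]


print(blocks_full(100, "5S10M3D10M"))  # the 3 bp deletion is left uncovered
print(blocks_fast(100, "5S10M3D10M"))  # the deletion is counted as covered
```

Skipping the per-operation walk is what makes the fast path cheaper; the cost is that small internal deletions no longer show up as dips in coverage.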

0.2.4

  • Add optional --include-flag to allow counting only reads that have some bits in the specified flag set.
    This will only be used rarely; e.g., to count only supplementary reads, use -F 0 --include-flag 2048.
  • Fix case when only a single argument was given to --quantize
  • add --read-groups option to allow specifying that only certain read-groups should be used in the depth calculation. (#60)
  • add --fast-mode that does not look at internal CIGAR operations like (I)nsertions or (D)eletions, but does consider soft and
    hard-clips at the end of the alignment. It also does not correct for mate overlap. This makes mosdepth as much as 2X faster for
    CRAM and is likely the desired mode for people using the depth for CNV or general coverage values, as drops in coverage
    due to CIGAR operations are often not of interest for coverage-based analyses.
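The filtering options above combine as bit tests on the SAM FLAG field. A minimal sketch of that logic follows; the helper name is hypothetical, and the default exclude mask of 1796 (unmapped, secondary, QC-fail, duplicate) is an assumption here, so check `mosdepth -h` for the actual default.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QCFAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800  # 2048

# Assumed default exclude mask: unmapped | secondary | QC-fail | duplicate.
DEFAULT_EXCLUDE = UNMAPPED | SECONDARY | QCFAIL | DUPLICATE  # 1796


def keep_read(flag, read_group=None, exclude=DEFAULT_EXCLUDE, include=0,
              read_groups=None):
    """Decide whether a read contributes to depth (hypothetical helper)."""
    if flag & exclude:  # any excluded bit set: drop it (like -F)
        return False
    if include and (flag & include) == 0:  # like --include-flag: need some bit
        return False
    if read_groups is not None and read_group not in read_groups:  # --read-groups
        return False
    return True


# Mirrors "-F 0 --include-flag 2048": only supplementary alignments pass.
print(keep_read(SUPPLEMENTARY, exclude=0, include=2048))
print(keep_read(0, exclude=0, include=2048))
```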

large chroms and region.dist bug.

01 May 19:00

0.2.3

  • fix bug in region.dist with chromosomes that are in the BAM header but have no reads (thanks @vladsaveliev for reporting)
  • support for chromosomes larger than 2^29. (thanks @kaspernie for reporting #41)
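Some background on the 2^29 figure, offered as context rather than as the cause of #41 (these notes do not say what it was): the BAI index's binning scheme only addresses positions below 2^29, while the CSI index generalizes the same scheme with configurable min_shift and depth parameters. The sketch below is a Python transcription of the reg2bin function from the CSI specification; with min_shift=14 and depth=5 it reproduces the BAI bins. The max_position helper is a hypothetical addition for illustration.

```python
def reg2bin(beg, end, min_shift=14, depth=5):
    """Smallest bin fully containing [beg, end), per the CSI binning scheme."""
    s = min_shift
    t = ((1 << depth * 3) - 1) // 7  # index of the first bin on the deepest level
    end -= 1  # work with the last base, not one past it
    level = depth
    while level > 0:
        if (beg >> s) == (end >> s):  # both ends fall in one bin at this level
            return t + (beg >> s)
        level -= 1
        s += 3  # each level up, bins are 8 times larger
        t -= 1 << (level * 3)  # first bin index of the next level up
    return 0  # only the root bin spans the interval


def max_position(min_shift=14, depth=5):
    """Exclusive upper bound on coordinates the scheme can address."""
    return 1 << (min_shift + 3 * depth)


print(max_position())         # BAI-compatible parameters: 2**29
print(max_position(depth=6))  # one extra level addresses 8x more sequence
```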

dist changes!

19 Mar 18:31

This contains a bugfix for a very rare (but major) bug that occurs when successive chromosomes have the same length. The data from the first chrom was not cleared and then polluted the counts for the subsequent chrom. Thanks to Kate B. for reporting and providing a simple test-case.

It also changes the dist output file name(s). Previously, only a single dist file was created. Now, there will always be a $prefix.mosdepth.global.dist.txt, and if --by is specified, there will also be a $prefix.mosdepth.region.dist.txt. Thanks to Alistair W for suggesting this.

See below for more details.

0.2.2

  • fix overflow with huge intervals to --by
  • NOTE change to output file name of *.dist.txt. A file named $prefix.mosdepth.global.dist.txt
    will always be created and $prefix.mosdepth.region.dist.txt will be created if --by is specified.
    Previously, there was only a single file named $prefix.mosdepth.dist.txt which no longer exists.
    This allows users, for example, to use --by over gene regions in WGS and see both the
    global WGS coverage and the coverage in their genes of interest.
  • fix bug that would manifest with consecutive chromosomes of the same length: chromosomes other than
    the first of a given length would have incorrect values.
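The same-length chromosome bug follows a common buffer-reuse pattern. These notes do not show mosdepth's code, so the sketch below is a hypothetical reconstruction of the pattern only: a per-chromosome depth array that is reallocated only when the length changes keeps stale counts whenever two consecutive chromosomes share a length, and clearing it for every chromosome avoids that. All function names are hypothetical.

```python
def prepare_buggy(buf, chrom_len):
    """Reuse the buffer, but only reset it when the length changes."""
    if len(buf) != chrom_len:
        buf[:] = [0] * chrom_len
    return buf  # same length: stale counts from the previous chromosome remain


def prepare_fixed(buf, chrom_len):
    """Reuse the buffer and always clear it for the new chromosome."""
    if len(buf) != chrom_len:
        buf[:] = [0] * chrom_len
    else:
        for i in range(chrom_len):  # same length: zero in place
            buf[i] = 0
    return buf


def add_reads(buf, reads):
    """Add 1 over each 0-based, half-open (start, end) interval."""
    for start, end in reads:
        for i in range(start, end):
            buf[i] += 1
    return buf


# Two consecutive 5 bp chromosomes: chrA fully covered, chrB covered at 0-2.
for prepare in (prepare_buggy, prepare_fixed):
    buf = []
    add_reads(prepare(buf, 5), [(0, 5)])        # chrA
    print(add_reads(prepare(buf, 5), [(0, 2)]))  # chrB
```

The buggy version reports chrB as [2, 2, 1, 1, 1] because chrA's counts leak into it; the fixed version reports [1, 1, 0, 0, 0].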