Slightly speed up various CRAM decoding functions (samtools#1580)
None of this is huge, but it all adds up.

- bam_set1 has been refactored so that compilers at -O3 are more likely
  to unroll and vectorise its loops.

    // Old          time   inst        cyc          (time in seconds; inst/cyc = instructions/cycles)
    // gcc -O2      12.36  78936832183 36853852204
    // gcc -O3      12.37  78713347525 36867027825
    // clang13 -O2  12.43  77451926728 37012866717
    // clang13 -O3  12.32  77627221907 36691623424
    // gcc12 -O2    12.43  78895089091 37081260172
    // gcc12 -O3    12.36  78505904437 36829216967

    // New
    // gcc -O2      12.47  78832021505 37200597109 +
    // gcc -O3      12.14  76499369401 36390334338 --
    // clang13 -O2  12.38  76678460761 36920111561 ~
    // clang13 -O3  12.26  76678023071 36548488492 ~
    // gcc12 -O2    12.38  78581694397 36880034181 -
    // gcc12 -O3    12.15  76356625541 36293921439 --

- Improve the MD/NM generation in CRAM decoding.
  With decode_md=1 (the default), decoding time dropped from 12.91s to
  12.57s. With decode_md=0 it is 11.92s, so about a third of the MD/NM
  overhead has been removed.

- Changed block_resize to grow blocks in slightly smaller increments and
  to use integer maths.

- Reduce excessive pointer indirection in cram_decode_seq.

  It is unclear whether this speeds things up much (sometimes it seems
  to), but it also makes the code tidier.
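One of the jobs bam_set1 does is pack the read sequence into 4-bit base codes. The sketch below is not the htslib code; it only illustrates the general loop shape that -O3 finds easier to unroll: a fixed two-bases-per-iteration body with no data-dependent branches, and the odd final base handled after the loop. The names nt16, init_nt16 and pack_seq are hypothetical, and whether a table-lookup loop like this actually vectorises depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-bit base codes in the BAM "=ACMGRSVTWYHKDBN" order;
 * any other character maps to 15 (N). */
static uint8_t nt16[256];

static void init_nt16(void)
{
    static const char codes[] = "=ACMGRSVTWYHKDBN";
    for (int i = 0; i < 256; i++)
        nt16[i] = 15;
    for (int i = 0; i < 16; i++)
        nt16[(unsigned char)codes[i]] = (uint8_t)i;
}

/* Pack len ASCII bases into out, which must hold (len + 1) / 2 bytes.
 * The first base of each pair goes in the high nibble. The main loop has
 * a fixed stride and no branches in its body; the odd tail is separate. */
static void pack_seq(const char *seq, size_t len, uint8_t *out)
{
    assert(len == 0 || (seq != NULL && out != NULL));
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        out[i / 2] = (uint8_t)(nt16[(unsigned char)seq[i]] << 4 |
                               nt16[(unsigned char)seq[i + 1]]);
    if (i < len)  /* odd length: last base in the high nibble, low nibble 0 */
        out[i / 2] = (uint8_t)(nt16[(unsigned char)seq[i]] << 4);
}
```

For example, "ACGTN" packs to the bytes 0x12, 0x48, 0xF0.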
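For readers unfamiliar with the MD/NM item above: MD records the reference bases at mismatches and deletions, and NM is the edit distance (mismatches plus inserted and deleted bases, excluding clipping). The single-pass sketch below shows what has to be computed. It is not the cram_decode implementation; cig_op, calc_md_nm, put_num and put_chr are hypothetical, and it ignores N/P operations, case differences and ambiguity codes that real code must handle.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical CIGAR element: op is one of 'M', '=', 'X', 'I', 'D', 'S', 'H'. */
typedef struct { char op; int len; } cig_op;

/* Append the decimal number n to md; -1 if it would not fit. */
static int put_num(char *md, size_t size, size_t *len, int n)
{
    int w = snprintf(md + *len, size - *len, "%d", n);
    if (w < 0 || (size_t)w >= size - *len)
        return -1;
    *len += (size_t)w;
    return 0;
}

/* Append one character to md, keeping it NUL-terminated; -1 if no room. */
static int put_chr(char *md, size_t size, size_t *len, char c)
{
    if (*len + 1 >= size)
        return -1;
    md[(*len)++] = c;
    md[*len] = '\0';
    return 0;
}

/* Build the MD string for a read whose alignment starts at ref[0] and
 * return NM, or -1 if md is too small. */
static int calc_md_nm(const char *ref, const char *seq,
                      const cig_op *cig, int ncig, char *md, size_t md_size)
{
    size_t mlen = 0;
    int rpos = 0, qpos = 0, nm = 0, run = 0;  /* run = matches since last MD event */

    assert(md_size > 0);
    md[0] = '\0';
    for (int i = 0; i < ncig; i++) {
        int n = cig[i].len;
        switch (cig[i].op) {
        case 'M': case '=': case 'X':
            for (int j = 0; j < n; j++, rpos++, qpos++) {
                if (seq[qpos] == ref[rpos]) {
                    run++;
                } else {  /* mismatch: emit match count, then the reference base */
                    if (put_num(md, md_size, &mlen, run) < 0 ||
                        put_chr(md, md_size, &mlen, ref[rpos]) < 0)
                        return -1;
                    run = 0;
                    nm++;
                }
            }
            break;
        case 'I':  /* inserted bases count towards NM but are absent from MD */
            qpos += n;
            nm += n;
            break;
        case 'S':  /* soft clip: skip query bases only */
            qpos += n;
            break;
        case 'D':  /* deletion: emit match count, '^', then the deleted bases */
            if (put_num(md, md_size, &mlen, run) < 0 ||
                put_chr(md, md_size, &mlen, '^') < 0)
                return -1;
            for (int j = 0; j < n; j++, rpos++)
                if (put_chr(md, md_size, &mlen, ref[rpos]) < 0)
                    return -1;
            run = 0;
            nm += n;
            break;
        default:   /* 'H' and anything unhandled: no bases consumed here */
            break;
        }
    }
    return put_num(md, md_size, &mlen, run) < 0 ? -1 : nm;
}
```

With reference ACGTACGTACGT, read ACGTTTCTAC and CIGAR 3M1I3M1D3M, this gives MD "4A1^G3" and NM 3.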
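The commit message does not spell out the new block_resize growth rule. As a hedged illustration of "smaller chunks, integer maths", here is a growable buffer that grows by roughly 1.25x using alloc + alloc / 4 in size_t arithmetic instead of multiplying by a floating-point factor. byte_buf, buf_resize, BUF_MIN_ALLOC, the 1.25 factor and the 64-byte minimum are all assumptions, not htslib's values.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer (not htslib's cram_block). */
typedef struct {
    unsigned char *data;
    size_t alloc;  /* bytes currently allocated */
} byte_buf;

#define BUF_MIN_ALLOC 64  /* hypothetical minimum allocation */

/* Make sure at least len bytes are allocated. Grows by roughly 1.25x
 * using integer arithmetic (alloc + alloc / 4), or straight to len if that
 * is larger. Returns 0 on success and -1 if realloc fails. */
static int buf_resize(byte_buf *b, size_t len)
{
    assert(b != NULL);
    if (b->alloc >= len)
        return 0;

    size_t grown = b->alloc + b->alloc / 4;
    if (grown < b->alloc)          /* wrapped around: fall back to len */
        grown = len;
    size_t alloc = grown > len ? grown : len;
    if (alloc < BUF_MIN_ALLOC)
        alloc = BUF_MIN_ALLOC;

    unsigned char *p = realloc(b->data, alloc);
    if (p == NULL)
        return -1;                 /* original buffer is left untouched */
    b->data = p;
    b->alloc = alloc;
    return 0;
}
```

A smaller growth factor wastes less memory on over-allocation at the cost of more realloc calls, and staying in integer arithmetic avoids int/float conversions and rounding on the resize path.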
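To illustrate the cram_decode_seq item in general terms: when a loop writes through a pointer reached via several levels of structs, and the writes are to char (which may alias any object), the compiler generally has to reload every intermediate pointer on each iteration. Copying the final pointer into a local once avoids that. The structs and functions below (out_buf, slice_ctx, decoder, fill_indirect, fill_local) are hypothetical and are not the actual decoder code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested structures, loosely shaped like a decoder that
 * reaches its output buffer through several pointers. */
typedef struct { char *seq; size_t len; } out_buf;
typedef struct { out_buf *out; } slice_ctx;
typedef struct { slice_ctx *slice; } decoder;

/* Before: d->slice->out->seq is re-evaluated on every iteration. A char
 * store may alias the intermediate pointers, so the compiler usually
 * cannot hoist these loads out of the loop. */
static void fill_indirect(decoder *d, char base, size_t n)
{
    assert(n <= d->slice->out->len);
    for (size_t i = 0; i < n; i++)
        d->slice->out->seq[i] = base;
}

/* After: load the final pointer once into a local and write through it. */
static void fill_local(decoder *d, char base, size_t n)
{
    char *seq = d->slice->out->seq;
    assert(n <= d->slice->out->len);
    for (size_t i = 0; i < n; i++)
        seq[i] = base;
}
```

Both functions produce the same result; the second is also easier to read, which matches the "tidier code" point above.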

Comparisons of the develop branch (Dev, /D) and this commit (/4) on
Revio (re/) and NovaSeq (nv/) data, using a variety of compilers and
optimisation levels. Figures are cycle counts from perf stat; the last
column is the percentage change on each CPU.

                     Xeon E5-2660         Xeon Gold 6142  Change % (E5/Gold)
re/D gcc12-O2        85699982958          74752510144
re/4 gcc12-O2        82265084038          71947558666     -4.0/-3.8

re/D gcc12-O3        85837077212          74392223354
re/4 gcc12-O3        82024293685          71861154116     -4.4/-3.4

re/D clang12-O3      85608876213          73934329619
re/4 clang12-O3      84390364926          73961392095     -1.4/0.0

re/D clang12-O2      86861787827          74255338533
re/4 clang12-O2      83186843797          72421845542     -4.2/-2.5 (fewer cycles than -O3)

nv/D gcc12-O2        36694089398          31444641828
nv/4 gcc12-O2        34949122875          30061074125     -4.8/-4.4

nv/D gcc12-O3        36528573980          30792932748
nv/4 gcc12-O3        35069572111          30066058127     -4.0/-2.4

nv/D clang12-O3      37906764004          32459168883
nv/4 clang12-O3      36344679534          30786987972     -4.1/-5.2

nv/D clang12-O2      38443827308          32304948037
nv/4 clang12-O2      36361384580          31022553379     -5.4/-4.0
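The change figures are relative differences in cycle counts, 100 × (after − before) / before. A minimal helper (pct_change is a hypothetical name) reproduces them; for nv gcc12-O2 on the E5-2660, 100 × (34949122875 − 36694089398) / 36694089398 is about −4.76, which rounds to −4.8.

```c
#include <assert.h>

/* Relative change from a "before" cycle count to an "after" one, in percent;
 * a negative result means the new code took fewer cycles. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}
```

The same formula applies to the billions-of-cycles tables, e.g. 28.6 to 28.3 is about −1.0%.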

Benchmarks on 10 million NovaSeq records, in billions of cycles, which
are more robust than CPU time.

    EPYC 7543
                       before   after   change (%)
        gcc(7)  -O2    28.6     28.3    -1.0
        gcc12   -O2    28.2     28.3    +0.4
        clang7  -O2    30.2     28.2    -6.6
        clang13 -O2    29.9     28.2    -5.7

        gcc(7)  -O3    28.7     28.2    -1.7
        gcc12   -O3    28.0     27.2    -2.9
        clang7  -O3    30.1     28.3    -6.0
        clang13 -O3    29.7     28.3    -4.7

    Xeon Gold 6142
                       before   after   change (%)
        gcc(7)  -O2    32.8     30.5    -7.0
        gcc12   -O2    31.8     30.1    -5.3
        clang7  -O2    33.1     29.9    -9.7
        clang13 -O2    34.1     30.8    -9.7

        gcc(7)  -O3    32.7     30.2    -7.6
        gcc12   -O3    31.6     29.1    -7.9
        clang7  -O3    34.3     30.0    -12.5
        clang13 -O3    33.3     30.9    -7.2
jkbonfield authored Mar 15, 2023
1 parent 19cd41c commit 46bcc36
Showing 3 changed files with 134 additions and 127 deletions.