Release miniprot-0.7 (r207)
lh3 committed Dec 26, 2022
1 parent 3f96c26 commit 2e57d1a
Showing 3 changed files with 44 additions and 4 deletions.
27 changes: 27 additions & 0 deletions NEWS.md
@@ -1,3 +1,30 @@
+Release 0.7-r207 (25 December 2022)
+-----------------------------------
+
+Notable changes:
+
+ * Improvement: replaced open syncmers with modimers. This simplifies the code
+ and slightly reduces the memory.
+
+ * Improvement: fine tune parameters for higher sensitivity at a minor cost of
+ junction accuracy: a) only index ORFs >= 30bp; b) reduced max k-mer
+ occurrences from 50k to 20k; c) sample k-mers at a rate of 50%; d) reduced
+ min number of k-mers from 5 to 3; e) add a bonus chaining score for anchors
+ on the same reference block.
+
+ * Improvement: adjust the max k-mer occurrence dynamically per protein.
+
+ * Improvement: implemented 2-level chaining like minimap2 and minigraph. This
+ reduces chaining time.
+
+ * Bugfix: fixed a rare off-by-1 memory violation
+
+ * Bugfix: fixed a memory leak
+
+(0.7: 25 December 2022, r207)
+
+
+
Release 0.6-r185 (12 December 2022)
-----------------------------------

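The modimer sampling mentioned in the 0.7 notes can be sketched as follows: a k-mer is kept only when the low bits of its hash are all zero, so one bit gives a rate of about 50%. This is a minimal illustration under stated assumptions, not miniprot's actual code: the hash is a generic 32-bit integer mixer, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic 32-bit integer mixer, used here only for illustration.
 * ASSUMPTION: miniprot's real hash function may differ. */
static inline uint32_t sketch_hash32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/* Keep a k-mer iff the low mod_bits bits of its hash are all zero.
 * The expected sampling rate is 1/2^mod_bits (about 50% for mod_bits = 1). */
static inline int sketch_is_modimer(uint32_t kmer, int mod_bits)
{
	uint32_t mask;
	assert(mod_bits >= 0 && mod_bits <= 32);
	mask = (mod_bits == 32) ? 0xffffffffU : ((1U << mod_bits) - 1);
	return (sketch_hash32(kmer) & mask) == 0;
}

/* Count how many of the integer-encoded k-mers 0..n-1 are kept. */
static uint32_t sketch_count_sampled(uint32_t n, int mod_bits)
{
	uint32_t i, cnt = 0;
	for (i = 0; i < n; ++i)
		cnt += (uint32_t)sketch_is_modimer(i, mod_bits);
	return cnt;
}
```

Because the decision depends only on the k-mer itself, the same k-mer is either always kept or always dropped, in both the index and the query.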
19 changes: 16 additions & 3 deletions miniprot.1
@@ -1,4 +1,4 @@
-.TH miniprot 1 "12 December 2022" "miniprot-0.6 (r185)" "Bioinformatics tools"
+.TH miniprot 1 "25 December 2022" "miniprot-0.7 (r207)" "Bioinformatics tools"
.SH NAME
.PP
miniprot - protein-to-genome alignment with splicing and frameshifts
@@ -39,8 +39,13 @@ Miniprot aligns protein sequences to a genome allowing potential frameshifts and
.BI -k \ INT
K-mer size for genome-wide indexing [6]
.TP
-.BI -s \ INT
-Syncmer submer size [4]. In average, miniprot selects a k-mer every 2*(k-s)+1 residues.
+.BI -M \ INT
+Sample k-mers at a rate
+.RI 1/2** INT
+[1]. Increasing this option reduces peak memory but decreases sensitivity.
+.TP
+.BI -L \ INT
+Minimum ORF length to index [30]
+.TP
.TP
.BI -b \ INT
Number of bits per bin [8]. Miniprot splits the genome into non-overlapping bins of 2^8 bp in size.
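The binning described for `-b` comes down to a right shift. The sketch below shows the arithmetic only; the helper names are hypothetical and are not taken from the miniprot source.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the bin holding a 0-based genomic position when each bin
 * spans 2^bbits bp (bbits = 8 gives 256-bp bins). */
static inline uint64_t sketch_bin_index(uint64_t pos, int bbits)
{
	assert(bbits >= 0 && bbits < 64);
	return pos >> bbits;
}

/* Number of non-overlapping bins needed to cover len bp (rounds up). */
static inline uint64_t sketch_num_bins(uint64_t len, int bbits)
{
	assert(bbits >= 0 && bbits < 64);
	return (len + ((1ULL << bbits) - 1)) >> bbits;
}
```

With the default of 8 bits, positions 0 to 255 fall into bin 0 and position 256 starts bin 1.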
@@ -140,6 +145,9 @@ Change the ID field in GFF3 to
.RI QueryName CHAR HitIndex
[]. If not specified, the default ID looks like `MP000012'.
.TP
+.B --gtf
+Output in the GTF format
+.TP
.B -u
Print unmapped query proteins
.TP
@@ -149,6 +157,11 @@ Output up to
.BR -N }
alignments per query [1000].
.TP
+.BI --outs \ FLOAT
+Output an alignment only if its score is at least
+.IR FLOAT *bestScore,
+where bestScore is the best alignment score of the protein [0.99]
+.TP
.BI -K \ NUM
Query batch size [2M]
.SH OUTPUT FORMAT
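The `--outs` rule documented in this hunk can be read as a per-protein score filter. The sketch below assumes plain integer scores and a floating-point comparison. Miniprot's internal representation and rounding are not shown in this commit, so treat the function name and details as hypothetical.

```c
#include <assert.h>

/* Keep only alignments whose score is at least outs * bestScore, where
 * bestScore is the maximum score among the n alignments of one protein.
 * Survivors are compacted to the front of the array in their original
 * order; the return value is the number kept.
 * ASSUMPTION: scores are ints and the comparison is done in double. */
static int sketch_filter_by_outs(int *scores, int n, double outs)
{
	int i, k = 0, best;
	assert(n >= 0);
	if (n == 0) return 0;
	best = scores[0];
	for (i = 1; i < n; ++i)
		if (scores[i] > best) best = scores[i];
	for (i = 0; i < n; ++i)
		if ((double)scores[i] >= outs * (double)best)
			scores[k++] = scores[i];
	return k;
}
```

With the default of 0.99 and a best score of 1000, alignments scoring 995 or 991 are reported, while those scoring 980 or 500 are dropped.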
2 changes: 1 addition & 1 deletion miniprot.h
@@ -3,7 +3,7 @@

#include <stdint.h>

-#define MP_VERSION "0.6-r204-dirty"
+#define MP_VERSION "0.7-r207"

#define MP_F_NO_SPLICE 0x1
#define MP_F_NO_ALIGN 0x2