diff --git a/NEWS.md b/NEWS.md
index b229200..9fc4aca 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,30 @@
+Release 0.7-r207 (25 December 2022)
+-----------------------------------
+
+Notable changes:
+
+ * Improvement: replaced open syncmers with modimers. This simplifies the code
+   and slightly reduces the memory.
+
+ * Improvement: fine tune parameters for higher sensitivity at a minor cost of
+   junction accuracy: a) only index ORFs >= 30bp; b) reduced max k-mer
+   occurrences from 50k to 20k; c) sample k-mers at a rate of 50%; d) reduced
+   min number of k-mers from 5 to 3; e) add a bonus chaining score for anchors
+   on the same reference block.
+
+ * Improvement: adjust the max k-mer occurrence dynamically per protein.
+
+ * Improvement: implemented 2-level chaining like minimap2 and minigraph. This
+   reduces chaining time.
+
+ * Bugfix: fixed a rare off-by-1 memory violation
+
+ * Bugfix: fixed a memory leak
+
+(0.7: 25 December 2022, r207)
+
+
+
 Release 0.6-r185 (12 December 2022)
 -----------------------------------
 
diff --git a/miniprot.1 b/miniprot.1
index 27aede2..5d24100 100644
--- a/miniprot.1
+++ b/miniprot.1
@@ -1,4 +1,4 @@
-.TH miniprot 1 "12 December 2022" "miniprot-0.6 (r185)" "Bioinformatics tools"
+.TH miniprot 1 "25 December 2022" "miniprot-0.7 (r207)" "Bioinformatics tools"
 .SH NAME
 .PP
 miniprot - protein-to-genome alignment with splicing and frameshifts
@@ -39,8 +39,13 @@ Miniprot aligns protein sequences to a genome allowing potential frameshifts and
 .BI -k \ INT
 K-mer size for genome-wide indexing [6]
 .TP
-.BI -s \ INT
-Syncmer submer size [4]. In average, miniprot selects a k-mer every 2*(k-s)+1 residues.
+.BI -M \ INT
+Sample k-mers at a rate
+.RI 1/2** INT
+[1]. Increasing this option reduces peak memory but decreases sensitivity.
+.TP
+.BI -L \ INT
+Minimum ORF length to index [30]
 .TP
 .BI -b \ INT
 Number of bits per bin [8]. Miniprot splits the genome into non-overlapping bins of 2^8 bp in size.
@@ -140,6 +145,9 @@ Change the ID field in GFF3 to
 .RI QueryName CHAR HitIndex
 []. If not specified, the default ID looks like `MP000012'.
 .TP
+.B --gtf
+Output in the GTF format
+.TP
 .B -u
 Print unmapped query proteins
 .TP
@@ -149,6 +157,11 @@ Output up to
 .BR -N }
 alignments per query [1000].
 .TP
+.BI --outs \ FLOAT
+Output an alignment only if its score is at least
+.IR FLOAT *bestScore,
+where bestScore is the best alignment score of the protein [0.99]
+.TP
 .BI -K \ NUM
 Query batch size [2M]
 .SH OUTPUT FORMAT
diff --git a/miniprot.h b/miniprot.h
index 05e07f2..faae7a2 100644
--- a/miniprot.h
+++ b/miniprot.h
@@ -3,7 +3,7 @@
 
 #include
 
-#define MP_VERSION "0.6-r204-dirty"
+#define MP_VERSION "0.7-r207"
 
 #define MP_F_NO_SPLICE 0x1
 #define MP_F_NO_ALIGN 0x2