Incorrect number of reads generated #13

elgartmi · 2016-07-18T15:45:20Z

I am simulating reads from incomplete (draft) bacterial genomes, which tend to have a lot of short contigs.
For example, see Lactobacillus malefermentans KCTC 3548.

Each time the program tries to get a read from such a contig, it correctly outputs:
[wgsim_core] skip sequence 'gi|338736693|dbj|BACN01000170.1|' as it is shorter than 500!

However, each time it outputs this message, a read that should have gone into the output file is skipped and never replaced.
So for a genome with many such short contigs, the resulting file has far fewer reads than the number requested via -N X.
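To get a feel for the size of the shortfall, here is a rough Python sketch. It rests on an assumption about wgsim's behavior that I have not verified against the source: that reads are allocated to each contig in proportion to its length, and that contigs below the threshold simply contribute nothing. The function names and the example contig lengths are hypothetical; the 500 threshold is taken from the log message above.

```python
def estimate_reads(contig_lengths, n_requested, min_length=500):
    """Estimate how many reads are actually emitted.

    Assumption (not confirmed from the wgsim source): each contig gets a
    share of n_requested proportional to its length, and contigs shorter
    than min_length are skipped without redistributing their share.
    """
    total = sum(contig_lengths)
    if total == 0:
        return 0
    usable = sum(length for length in contig_lengths if length >= min_length)
    return n_requested * usable // total


def compensated_request(contig_lengths, n_wanted, min_length=500):
    """Estimate the -N value needed to end up with about n_wanted reads."""
    total = sum(contig_lengths)
    usable = sum(length for length in contig_lengths if length >= min_length)
    if usable == 0:
        raise ValueError("no contig is long enough to produce reads")
    # Integer ceiling division, so the estimate errs on the high side.
    return -(-n_wanted * total // usable)


if __name__ == "__main__":
    lengths = [1000, 300, 200, 500]  # hypothetical contig lengths
    print(estimate_reads(lengths, 1000))
    print(compensated_request(lengths, 1000))
```

Under that assumption, a quarter of the requested reads would be lost for the example lengths, since a quarter of the total sequence length sits in contigs shorter than 500.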

As a workaround, I ask it to generate more reads and then keep the first X with "head -n X*4" (four lines per FASTQ record).
However, I believe it's a bug :)
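For anyone who prefers to do the truncation step in a script, a minimal Python sketch of the same workaround follows. It assumes every FASTQ record occupies exactly four lines (no wrapped sequence lines), which is the same assumption the head command makes. The function name and file paths are hypothetical.

```python
from itertools import islice


def keep_first_reads(in_path, out_path, n_reads):
    """Copy the first n_reads FASTQ records from in_path to out_path.

    Assumes four lines per record, like `head -n` with n_reads * 4 lines.
    Returns the number of complete records actually written.
    """
    lines_written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in islice(src, n_reads * 4):
            dst.write(line)
            lines_written += 1
    return lines_written // 4


if __name__ == "__main__":
    # Hypothetical file names; replace with the real simulator output.
    kept = keep_first_reads("oversampled_1.fq", "reads_1.fq", 100000)
    print(f"kept {kept} reads")
```

For paired-end output, the same call would have to be applied to both files with the same read count so the pairs stay in sync. The returned count also makes it easy to detect the case where even the oversampled file contains fewer reads than wanted.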
