Hello rust-msbwt team,
I found that the build command in this version is very slow: a 5 GB FASTQ file of reads takes more than 2 hours, while ropebwt2 is very fast and finishes in several minutes. I am wondering whether the build could be parallelized, because the current speed is not practical for real-world sequence files, which are always more than 20 GB.
Thanks,
Jianshu
The current, built-in version does tend to be slower than the ropebwt2 approach for constructing the BWT. This is most noticeable for short-read sequencing data, where ropebwt2 can take advantage of parallelization during some of the inserts. The downside of the ropebwt2 approach is that the commands needed to construct the BWT are a bit more complicated and may require extra memory/disk space to do the sorting.
The msbwt2 construction is primarily meant to make the process (e.g. the commands) a bit easier for end users who may not be as familiar with the CLI. We would like to make this method faster (perhaps via parallelization and/or a better algorithm), but we haven't had time to work on this. It also may not scale as well, since we haven't really tested it on a very large dataset yet. Happy to review PRs from anyone wishing to do R&D on improving the method.
So in short, if you want speed, we recommend using the ropebwt2 approach. If you want ease of use, we recommend the msbwt2 approach.
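To give a rough sense of why the sorting step mentioned above can need extra memory, here is a minimal Python sketch. It assumes the sort operates on the raw read sequences pulled out of the FASTQ file before they are inserted into the BWT; the function names `fastq_sequences` and `sorted_reads` are hypothetical and are not part of ropebwt2 or msbwt2, and this is not the actual command pipeline.

```python
import io
from typing import Iterable, Iterator, List


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the sequence line (the 2nd of every 4) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.rstrip("\n")


def sorted_reads(lines: Iterable[str]) -> List[str]:
    """Return all read sequences in lexicographic order.

    Assumption for illustration: every read is held in memory at once,
    so memory use grows with the size of the input file.
    """
    return sorted(fastq_sequences(lines))


# Tiny in-memory FASTQ stand-in with three 4-line records.
example = io.StringIO(
    "@r1\nTTGA\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nGATC\n+\nIIII\n"
)
reads = sorted_reads(example)
print(reads)
```

For a 20 GB input, an in-memory sort like this would not be practical, which is presumably why an external (disk-backed) sort and its temporary space come into play.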