very slow #10

Open
jianshu93 opened this issue Jul 25, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@jianshu93

Hello rust-msbwt team,

I found that the build command in this version is very slow: a 5G FASTQ file of reads takes more than 2 hours, while ropebwt2 is very fast and finishes in several minutes. I am wondering whether the build could be parallelized, because the current speed is not practical for real-world sequence files, which are always more than 20G.

Thanks,

Jianshu

@holtjma
Member

holtjma commented Jul 26, 2022

The current, built-in version does tend to be slower than the ropebwt2 approach for constructing the BWT. This is most noticeable for short-read sequencing data, where ropebwt2 can take advantage of parallelization during some of the inserts. The downside of the ropebwt2 approach is that the commands to construct the BWT are a bit more complicated and may require excess memory/disk space to do the sorting.
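For readers unfamiliar with what is being constructed, here is a minimal sketch of a multi-string BWT built the naive way, by sorting every rotation of every read. It is illustrative only: it is not the algorithm used by msbwt2 or ropebwt2, and the function name `naive_msbwt` is hypothetical. It assumes each read is terminated with `$`, which sorts before `A`, `C`, `G`, and `T`.

```rust
/// Naive multi-string BWT (illustrative sketch, NOT the msbwt2 or ropebwt2 method).
/// `naive_msbwt` is a hypothetical name chosen for this example.
fn naive_msbwt(reads: &[&str]) -> String {
    let mut rotations: Vec<Vec<u8>> = Vec::new();
    for read in reads {
        // Terminate each read with '$', which sorts before A/C/G/T in ASCII.
        let mut s = read.as_bytes().to_vec();
        s.push(b'$');
        // Materialize every rotation of the terminated read.
        for i in 0..s.len() {
            let mut rot = s[i..].to_vec();
            rot.extend_from_slice(&s[..i]);
            rotations.push(rot);
        }
    }
    // Lexicographic byte-wise sort of all rotations from all reads.
    rotations.sort();
    // The BWT is the last character of each sorted rotation.
    rotations
        .iter()
        .map(|r| *r.last().unwrap() as char)
        .collect()
}
```

Calling `naive_msbwt(&["ACGT"])` yields `T$ACG`. Because this sketch stores a full copy of every rotation, its memory use grows with the square of the read length times the number of reads, which is why practical tools rely on incremental or external-memory strategies instead.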

The msbwt2 construction is primarily meant to make the process (e.g. the commands) a bit easier for end users who may not be as familiar with the CLI. We would like to make this method faster (perhaps via parallelization and/or a better algorithm), but we haven't had time to work on this. It also may not scale as well; we haven't really tested it on a very large dataset yet. Happy to review PRs from anyone wishing to do R&D on improving the method.

So in short, if you want speed, we recommend using the ropebwt2 approach. If you want ease of use, we recommend the msbwt2 approach.

holtjma added the enhancement label Jul 26, 2022