New release plan for Sunbeam 3.0 #307

Closed
kylebittinger opened this issue Feb 15, 2022 · 9 comments

@kylebittinger
Member

We have a new software developer for the PennCHOP Microbiome Program, Charlie Bushman (@Ulthran), which presents a new opportunity for us to devote some serious attention to Sunbeam. I'd like to renew our push for a release of Sunbeam 3.0.

Current/past lead developers @louiejtaylor @eclarke @ressy @zhaoc1: if you have thoughts on features, fixes, and changes that should be included in this release, please let us know so we can get them on our radar. If you have no opinion, that's cool too.

I'd also like to flag members of the CHOP Microbiome Center, @ctanes @scottdaniel @vitu1 @WeimingWHu so they can contribute their thoughts.

Hope all of you are doing well. Feel free to reach out by email if needed.

@zhaoc1
Member

zhaoc1 commented Feb 15, 2022

That's good news! Good luck with Sunbeam 3.0!

@ressy
Member

ressy commented Feb 17, 2022

Glad to hear it! No particular opinions from me; I'm just here to reiterate Chunyu's message. It'll be good for the package to get some TLC. I keep hoping to hop in and help clear out some issues, but I never get the chance.

@zhaoc1
Member

zhaoc1 commented Feb 17, 2022

I recommend GT-Pro, one of the recent ultra-fast metagenotyping tools published in Nature Biotechnology (GitHub repo).

I think it serves Sunbeam's goal as a metagenomic sequencing pipeline very well and would expand Sunbeam's capacity to strain-level analysis.

@louiejtaylor
Member

Thanks for the ping, and sorry for the slow response! I remember that, before I left, we had a list of things we wanted to do to wrap up version 3.0. Fortunately, we decided to turn these into (or base them on) issues in the repo, so hopefully each one is clear about what needs to be done. Their importance may be debatable half a year later, but here are the ones that are still outstanding:

High-priority (feature-related) issues:

A decent summary of the already-completed differences between 2.1.0 (stable) and 3.0 (dev) can be found in the changelog. There's a ton of good stuff already done, like automated extension installation and config updating, new tool versions, and more flexibility for the user in configuring the pipeline. For previous releases, we also made sure the automated tests passed and worked through the other outstanding issues as well. I echo Jesse's sentiment: I'd love to jump in and help but don't have the bandwidth.

There were a few other potential improvements we were thinking about that might be nice for future versions but aren't necessarily required for 3.0:

  • Updating to Snakemake 5.8 #263: this would let us take advantage of improvements in Snakemake, but Snakemake's change to accept an arbitrary number of config files instead of just one wreaked some havoc with our argument parsing, if I remember correctly.
  • Making Sunbeam installable via conda would be really nice, but it doesn't seem trivial!

Hope this is helpful--best of luck!

@ressy
Member

ressy commented Feb 23, 2022

Thanks a lot for pulling together that summary, Louis.

I came across this paper today, "Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software."

In addition we suggest that further efforts be made to encourage continual updates to software tools. To paraphrase some of the suggestions of Siepel (2019), these efforts may include more secure positions for developers, institutional promotion criteria include software maintenance, lower publication barriers for significant software updates, encourage further funding for software maintenance and improvement—not just new tools [55]. If these issues were recognised by research managers, funders and reviewers, then perhaps the future bioinformatic software tool landscape will be much improved.

I'll second that!

@levlitichev

Exciting to hear that a new release of Sunbeam is planned! I've been working a lot with Sunbeam lately (thank you!) and have a few suggestions:

  1. I find that there is a lot of overhead (e.g. searching for extensions, checking file paths, etc.) that makes Sunbeam pretty slow. For me, Sunbeam takes ~30 seconds to print "Building DAG of jobs...", which makes quick troubleshooting difficult. It would be nice to speed this up, but unfortunately I don't have any concrete suggestions because I don't know which steps are slow.
  2. I think cutadapt shouldn't by default throw out reads that have adapters removed (see issue Why remove trimmed reads? #288).
  3. I use Sunbeam in conjunction with an LSF Snakemake profile on HPC. I had to modify the rule for Kraken in order to give that one job a lot more memory than the other jobs. It could be a nice parameter to add to the config file.
  4. Regarding Any way to remove temporary files for all_decontam? #275 above, I actually WANT to keep the host reads. I am mapping host reads to genotyped mice in order to ensure there were no sample mix-ups. It's not ready for primetime, but I've made this functionality into a Sunbeam extension. In brief, it would be nice to have an option to keep the host read bam file.
  5. I don't use the assembly, annotation, or mapping modules, so it would be nice not having those as core parts of Sunbeam. I saw that there are a few issues already about separating out other components of Sunbeam, like taxonomic classification. I think modularization is generally the right idea.
  6. The extension I use all the time is sbx_gene_clusters for functional classification of my taxonomic reads directly. I just made a PR with some small suggestions, but I think building out this extension and making it more prominent throughout the documentation could be worthwhile.

Happy to chat more about this, and apologies if I'm jumping into this conversation without the right context. For points 2, 3, and 4 above, it'd be easy to add my local changes as a PR.

Thanks again for a nice tool, and I'm excited to hear that it will be getting some TLC!

@Ulthran
Contributor

Ulthran commented May 3, 2022

Hi @levlitichev, thanks for the feedback! User input on what to fix and what to add is super useful. At the moment I'm working mostly on upgrading dependencies and separating each functional unit (eventually to get to the point where 5. is possible) and a few features that have been asked for a lot. I'd love to talk more with you about your suggestions in the near future though.

Thanks,
Charlie

Ulthran self-assigned this Dec 13, 2022
@Ulthran
Contributor

Ulthran commented Aug 16, 2023

Hi again @levlitichev, I think a lot of what you mentioned has now been integrated into Sunbeam as of the latest v4.0.0 release. I'm going to close this issue, but if there are any parts of it you want to reopen, or if you have new suggestions, please open new issues. Would love to hear your thoughts on where Sunbeam is now and where it should go.

Thanks,
Charlie

Ulthran closed this as completed Aug 16, 2023
@levlitichev

Awesome, thanks for the update! I'll update to the current release when I next have to use Sunbeam, and I'll let you know how it goes. Thanks again for your efforts!
