Finished without creating output_bins #183

Open
Sung-hub opened this issue Jan 16, 2025 · 1 comment

Comments

@Sung-hub

I am running SemiBin2 single_easy_bin and it finishes prematurely without creating output_bins. There is no error message. The inputs are large metagenome assemblies built from PacBio HiFi long reads. Some runs finished successfully, but most did not create output_bins. Even though I set -t 30, it generally runs on a single thread. I did not see any memory issues on our server.

This is the command line I used:
$ nohup singularity exec https://depot.galaxyproject.org/singularity/semibin:2.1.0--pyhdfd78af_0 SemiBin2 single_easy_bin -i /home/sung.shin/pool_2_metaMDBG/contigs.fasta -b pool2metaMDBG_aln.sort.bam -o pool2_metaMDBG_sb2_binning/ --self-supervised --sequencing-type=long_reads -t 30 --random-seed 123 2>semibin.log &

Log messages I got:
$ cat semibin.log
nohup: ignoring input and appending output to 'nohup.out'
2025-01-03 11:22:08 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Binning for long_read
2025-01-03 11:22:12 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Did not detect GPU, using CPU.
2025-01-03 11:22:38 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Generating training data...
2025-01-03 15:17:52 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Calculating coverage for every sample.
2025-01-03 16:44:37 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Processed: pool2metaMDBG_aln.sort.bam
2025-01-03 16:52:53 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Start training from a single sample.
2025-01-03 16:53:42 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Training model...
100%|██████████| 15/15 [53:05<00:00, 212.34s/it]
2025-01-03 17:46:48 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Training finished.
2025-01-03 17:46:49 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Start binning.
2025-01-03 17:47:59 arsnecla0ap2.marc.usda.gov SemiBin[1485536] INFO Running naive ORF finder
[sung.shin@arsnecla0ap2 pool2_dastool]$ cd pool2_metaMDBG_sb2_binning/

$ cat SemiBinRun.log
[2025-01-03 11:22:08,400] INFO: Binning for long_read
[2025-01-03 11:22:12,494] INFO: Did not detect GPU, using CPU.
[2025-01-03 11:22:38,866] INFO: Generating training data...
[2025-01-03 15:17:52,924] INFO: Calculating coverage for every sample.
[2025-01-03 16:44:37,329] INFO: Processed: pool2metaMDBG_aln.sort.bam
[2025-01-03 16:52:53,075] INFO: Start training from a single sample.
[2025-01-03 16:53:42,830] INFO: Training model...
[2025-01-03 17:46:48,649] INFO: Training finished.
[2025-01-03 17:46:49,343] INFO: Start binning.
[2025-01-03 17:47:59,090] INFO: Running naive ORF finder

Resulting files:
$ ls
data.csv data_split.csv markers.hmmout model.h5 pool2metaMDBG_aln.sort.bam_0_data_cov.csv SemiBinRun.log

Could you give me some advice?
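
Because the log simply stops after "Running naive ORF finder", one way to tell a silent crash from a run that is still in progress is to record the exit status of the command. The following is only a minimal sketch and was not part of the run reported above: `run_and_log_status` is a hypothetical helper name, and it relies on the convention in bash and dash that a command ended by signal N reports status 128+N (so 137 would mean SIGKILL, which is what the kernel out-of-memory killer sends).

```shell
#!/bin/sh
# Hypothetical helper: run a command, then append its exit status to a log file.
# Usage: run_and_log_status LOGFILE COMMAND [ARGS...]
run_and_log_status() {
    logfile=$1
    shift
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        # bash and dash report 128 + signal number when a signal ended the command.
        echo "exit status $status (terminated by signal $((status - 128)))" >> "$logfile"
    else
        echo "exit status $status" >> "$logfile"
    fi
    return "$status"
}
```

Since nohup cannot start a shell function directly, the function and the call would go into a small script, and that script would be started with nohup. The call itself would be `run_and_log_status status.log` followed by the same singularity command as above.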

@Sung-hub
Author

Can this issue be fixed by running the binning step separately when a run finishes without output_bins?
I found "Advanced single-sample binning workflows" at https://semibin.readthedocs.io/en/latest/usage/.
What is the difference between bin_short and bin_long?
SemiBin2 bin_short \
    -i S1.fa \
    --environment human_gut \
    --data S1_output/data.csv \
    -o S1_output

SemiBin2 bin_long \
    -i S1.fa \
    --environment human_gut \
    --data S1_output/data.csv \
    -o S1_output
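
To make the question concrete, re-running only the binning step on the intermediate files listed above might look like the sketch below. It rests on assumptions that should be checked against `SemiBin2 bin_long --help`: that `bin_long` accepts `--model` for a self-trained model in place of `--environment`, and that the `model.h5` and `data.csv` left by the interrupted run are complete. The helper name `print_bin_long_cmd` is hypothetical. It only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: check that the intermediate files exist and are
# non-empty, then print (not run) a bin_long command that reuses them.
# Usage: print_bin_long_cmd CONTIGS_FASTA OUTPUT_DIR
print_bin_long_cmd() {
    contigs=$1
    outdir=$2
    for f in "$contigs" "$outdir/data.csv" "$outdir/model.h5"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    # Assumption: --model takes the model trained by the earlier run.
    echo "SemiBin2 bin_long -i $contigs --model $outdir/model.h5 --data $outdir/data.csv -o $outdir"
}
```

For the run above this would be called as `print_bin_long_cmd /home/sung.shin/pool_2_metaMDBG/contigs.fasta pool2_metaMDBG_sb2_binning`, and the printed command would then be run inside the same singularity container.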
