Preseq failing most of the time #161

Closed
ewels opened this issue May 29, 2020 · 13 comments · Fixed by #470
Labels
help wanted Extra attention is needed
Comments

@ewels
Member

ewels commented May 29, 2020

Anyone running the pipeline will be familiar with this log message:

terminated with an error status (1) -- Error is ignored.

Preseq has a history of failing often, especially on small or low-complexity files, but it now seems to fail most, if not all, of the time. This needs investigating.

Phil

@ewels ewels added the bug Something isn't working label May 29, 2020
@ewels ewels mentioned this issue May 29, 2020
@bsiranosian

At the very least, adding an 'ignore' errorStrategy to this process will stop a preseq failure from killing your whole run:

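/*
 * STEP 9 - preseq
 */
process preseq {
    errorStrategy 'ignore'   // a failed preseq task is ignored instead of stopping the run

    // ... existing inputs, outputs and script unchanged ...
}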

@ewels
Member Author

ewels commented Sep 4, 2021

Yup! The pipeline already sets this by default, so you shouldn't need to add it to any additional configs:

process {
    withName: preseq {
        errorStrategy = 'ignore'
    }
}

It would be nice to try to get it to fail a little less though 😅

@bsiranosian

Oh good, I didn't notice that. I'm not sure why my whole run failed then.

@ewels ewels added help wanted Extra attention is needed and removed bug Something isn't working labels Nov 3, 2022
@apeltzer
Member

apeltzer commented Nov 3, 2022

Later preseq versions received updates to fail more gracefully, so if you upgrade preseq you should be fine 👍🏻

@ewels
Member Author

ewels commented Nov 3, 2022

I think we're already on 3.1.2, which is quite recent. Do you know when those versions went out? I still see the same failures on every test run.

@ewels
Member Author

ewels commented Nov 3, 2022

It's tempting to update the config to allow the error exit code, so that the pipeline report doesn't always say the pipeline completed with errors (which always worries me and others).
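A minimal sketch of what allowing the error exit code could look like at the script level, assuming the preseq task runs under bash. run_tolerating_exit1 is a hypothetical helper, not part of the pipeline: it runs a command, downgrades exit status 1 (the status in the log message at the top of this issue) to a warning on stderr, and passes any other non-zero status through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): run a command and treat
# exit status 1 as a warning so the task still succeeds; any other non-zero
# status is returned unchanged.
run_tolerating_exit1() {
    local status=0
    "$@" || status=$?
    if [ "$status" -eq 1 ]; then
        echo "WARNING: '$1' exited with status 1; continuing" >&2
        return 0
    fi
    return "$status"
}

# Example use in the preseq script block (flags and file names follow the
# BED-based commands posted later in this thread):
# run_tolerating_exit1 preseq lc_extrap -v -P sample.sorted.bed -o sample.lc.preseq.txt
```

The trade-off is that a genuine preseq error that also exits with status 1 disappears from the pipeline report, leaving only the stderr warning. If the process declares the preseq output file, a missing file would still fail the task, so that output would also need to be declared optional.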

@ewels
Member Author

ewels commented Nov 3, 2022

Using BED files instead of BAM, as suggested in #96 (comment), could also help.

@Rohit-Satyam

Rohit-Satyam commented Jan 10, 2023

I tried BED files as input, but preseq still fails:

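# mark duplicates (flagged, not removed, by default) and write duplication metrics
gatk MarkDuplicatesSpark -I ${bam} -O ${sid}.dedup.bam -M ${sid}_markdup_metrics.txt --tmp-dir . -OBI
# GATK's own library complexity estimate, for comparison
gatk EstimateLibraryComplexity -I ${bam} -O ${sid}_est_lib_complex_metrics.txt
# convert to BED file with paired-ends (BEDPE format)
bamToBed -i ${sid}.dedup.bam -bedpe > ${sid}.sorted.bed
# preseq on paired-end input (-P): extrapolated curve (lc_extrap) and observed curve (c_curve)
preseq lc_extrap -v -P ${sid}.sorted.bed -o ${sid}.lc.preseq.txt
preseq c_curve -v -P ${sid}.sorted.bed -o ${sid}.c.preseq.txt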

PAIRED_END_BED_INPUT
  TOTAL READS     = 14155
  DISTINCT READS  = 14006
  DISTINCT COUNTS = 5
  MAX COUNT       = 94
  COUNTS OF 1     = 13956
  MAX TERMS       = 2
  OBSERVED COUNTS (95)
  1	13956
  2	46
  3	1
  5	2
  94	1
  
  ERROR:	max count before zero is less than min required count (4) duplicates removed
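The error can be traced in the OBSERVED COUNTS block: counts 1, 2 and 3 occur but 4 does not, so the run of non-zero counts stops at 3, below the minimum of 4 that preseq reports. The "duplicates removed" wording suggests preseq suspects deduplicated input; since the BED here was built from the MarkDuplicatesSpark output, that may be worth ruling out. The sketch reproduces the arithmetic from the printed histogram lines. max_count_before_zero and histogram_passes_preseq_check are hypothetical helpers, and using 4 as the threshold is an assumption read off this one log, not taken from preseq's source.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a duplicate-count histogram ("count frequency" per
# line, as in the OBSERVED COUNTS block of preseq's verbose log) on stdin and
# print the largest k such that counts 1..k all have non-zero frequency,
# i.e. the "max count before zero" named in the error.
max_count_before_zero() {
    awk '
        $2 > 0 { seen[$1 + 0] = 1 }     # remember counts that actually occur
        END {
            k = 0
            while ((k + 1) in seen) k++  # walk up until the first missing count
            print k
        }
    '
}

# Hypothetical pre-check mirroring the error: succeed only if the run of
# non-zero counts reaches the minimum required count (default 4, as in the log).
histogram_passes_preseq_check() {
    local min_required=${1:-4}
    local k
    k=$(max_count_before_zero)
    [ "$k" -ge "$min_required" ]
}

# Example with the OBSERVED COUNTS from the log above: prints 3
printf '1\t13956\n2\t46\n3\t1\n5\t2\n94\t1\n' | max_count_before_zero
```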

@bounlu
Contributor

bounlu commented Oct 4, 2023

May I suggest running preseq in defect mode, as recommended by the preseq developers, when the number of reads is >50M? Otherwise it fails with:

ERROR: too many defects in the approximation, consider running in defect mode

smithlabcode/preseq#29
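A minimal sketch of this suggestion, assuming bash and that this preseq build enables defect mode with -D (-defects); check the lc_extrap usage message for your version. lc_extrap_defect_args, run_lc_extrap, PRESEQ and DEFECT_MODE_MIN_READS are hypothetical names. The 50M threshold comes from the comment above, counting BED lines stands in for counting reads (one line per pair for BEDPE input), and -P matches the paired-end BED used earlier in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical threshold (reads) above which defect mode is switched on;
# 50M is the figure suggested in this thread.
DEFECT_MODE_MIN_READS=${DEFECT_MODE_MIN_READS:-50000000}

# Print the extra lc_extrap flag for a given read count: "-D" (assumed to be
# preseq's defect-mode flag) above the threshold, nothing otherwise.
lc_extrap_defect_args() {
    local n_reads=$1
    if [ "$n_reads" -gt "$DEFECT_MODE_MIN_READS" ]; then
        echo "-D"
    fi
}

# Run lc_extrap on a paired-end BED file, adding defect mode for large inputs.
# PRESEQ may point at another preseq binary; defaults to "preseq" on PATH.
run_lc_extrap() {
    local bed=$1 out=$2
    local n_reads
    # One BED line per read (one per pair for BEDPE) as a proxy for read count
    n_reads=$(wc -l < "$bed")
    # Left unquoted on purpose: expands to nothing or to the single word -D
    # shellcheck disable=SC2046
    "${PRESEQ:-preseq}" lc_extrap -v -P $(lc_extrap_defect_args "$n_reads") -o "$out" "$bed"
}
```

Because defect mode can still fail on some inputs, a switch like this would complement, not replace, the existing errorStrategy 'ignore' setting.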

@sateeshperi
Contributor

@bounlu, does running in defect mode fix this issue?

@bounlu
Contributor

bounlu commented Sep 22, 2024

Sometimes yes, but not always. It may still fail in defect mode.

@sateeshperi
Contributor

I see. Any recommendations on what can be done, or should we mark this as expected behaviour and close the issue?

@mahesh-panchal
Member

7 participants