BUG: signalp_v4 and tmhmm segfaulting #100

Open
markusHaferkamp opened this issue Jan 7, 2025 · 0 comments
Labels
bug Something isn't working

Comments


markusHaferkamp commented Jan 7, 2025

Hi again!

We moved our HPC to new hardware in November/December, and the dust is slowly settling. Weirdly, my modified conda version of 1.2.7 wouldn't work anymore - signalp_v4 kept segfaulting.

So I gave your 1.2.8-alpha branch a try and I'm happy to report it resolved on the first try, without the need for a modified environment.yml. But...

Bug:
Running a full test with nextflow run -profile test -with-conda "$USW/miniconda3/envs/predector" -resume -r 1.2.8-alpha ccdmb/predector reports the same signalp_v4 segfaults I experienced on 1.2.7. Additionally, tmhmm is now also segfaulting.

The error replicates on both regular compute nodes and in userspace on front-end nodes. There was no difference between having the working directory on BeeGFS spinning disks or NFS SSDs.

Expected result:
A successful test run

OS:

  • Debian GNU / Linux 12 (bookworm)
  • conda
  • Nextflow 24.10.3
  • No WSL, no macOS
  • File systems: BeeGFS on $WORK, NFS on $USW, $HOME and $SSD
  • $USW and $HOME are read-only on regular compute nodes

Logs:

nextflow.log reporting on tmhmm failing
out.txt is in fact an empty file, in.fasta looks healthy

Jan-07 12:04:10.091 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=tmhmm (2); work-dir=/$WORK/temp/predector/work/3a/31b0f9b87da082d33c8e812808ca39
  error [nextflow.exception.ProcessFailedException]: Process `tmhmm (2)` terminated with an error exit status (65)
Jan-07 12:04:10.115 [TaskFinalizer-1] ERROR nextflow.processor.TaskProcessor - Error executing process > 'tmhmm (2)'

Caused by:
  Process `tmhmm (2)` terminated with an error exit status (65)


Command executed:

  CHUNKSIZE="$(decide_task_chunksize.sh in.fasta "4" 100)"
  
  # tail -n+2 is to remove header
  parallel         --halt now,fail=1         --joblog log.txt         -j "4"         -N "${CHUNKSIZE}"         --line-buffer          --recstart '>'         --pipe         'tmhmm -short -d'     < in.fasta     | cat > out.txt
  
  predutils r2js         --pipeline-version "1.2.8-alpha"         --software-version "2.0c"         -o out.ldjson         tmhmm out.txt in.fasta
  
  rm -rf -- TMHMM_*

Command exit status:
  65

Command output:
  (empty)

Command error:
  decodeanhmm 1.1g
  Copyright (C) 1998 by Anders Krogh
  decodeanhmm 1.1g
  Copyright (C) 1998 by Anders Krogh
  decodeanhmm 1.1g
  Copyright (C) 1998 by Anders Krogh
  Segmentation fault
  decodeanhmm 1.1g
  Copyright (C) 1998 by Anders Krogh
  Segmentation fault
  Name "main::lab" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 130.
  Name "main::score" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 114.
  Name "main::normscore" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 115.
  Segmentation fault
  Name "main::score" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 114.
  Name "main::normscore" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 115.
  Name "main::lab" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 130.
  Name "main::score" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 114.
  Name "main::normscore" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 115.
  Name "main::lab" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 130.
  Name "main::lab" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 130.
  Name "main::normscore" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 115.
  Name "main::score" used only once: possible typo at /$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3/bin/tmhmmformat.pl line 114.

  Failed to parse file <out.txt>.
  
  We could not parse any records from the input file.
  It's possible that the input is empty, or that it is in the wrong format.
  This can happen if an analysis fails but doesn't tell us that it failed.
  Please check the input file indicated above and contact us for help if you need it.
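
A note on why the task may end with exit status 65 rather than a crash status: the message above hints that a failure can go unreported. The sketch below is an assumption about the mechanism, not a reading of the Predector code. It uses a hypothetical `crash` function (a child shell that sends itself SIGSEGV) in place of the real binary, and shows that bash reports a segfault as status 139 (128 + 11), while a pipeline without `pipefail` returns the status of its last stage.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a crashing binary: a child shell that
# disables core dumps and then sends itself SIGSEGV.
crash() { bash -c 'ulimit -c 0; kill -SEGV $$'; }

# Run directly: bash reports 128 + signal number (SIGSEGV is 11).
direct_status=0
crash 2>/dev/null || direct_status=$?

# Run in a pipeline without pipefail: the status is that of the last stage (cat).
set +o pipefail
piped_status=0
crash 2>/dev/null | cat || piped_status=$?

# With pipefail, the crash in the first stage is no longer hidden.
set -o pipefail
pipefail_status=0
crash 2>/dev/null | cat || pipefail_status=$?
set +o pipefail

echo "direct=$direct_status piped=$piped_status pipefail=$pipefail_status"
```

On bash this should print `direct=139 piped=0 pipefail=139`. Whether the same masking happens inside the `parallel ... | cat > out.txt` step here has not been verified.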

nextflow.log on signalp_v4 and tmhmm ultimately failing

Jan-07 12:04:10.190 [TaskFinalizer-2] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=signalp_v4 (2); work-dir=/$WORK/temp/predector/work/3a/0423ae5e1e929bd321cbbd59fe6424
  error [nextflow.exception.ProcessFailedException]: Process `signalp_v4 (2)` terminated with an error exit status (65)
Jan-07 12:04:10.191 [TaskFinalizer-3] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=tmhmm (1); work-dir=/$WORK/temp/predector/work/f9/0635e5eedd5eb9509d3800d10b43c1
  error [nextflow.exception.ProcessFailedException]: Process `tmhmm (1)` terminated with an error exit status (65)
Jan-07 12:04:10.216 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 42; name: signalp_v4 (1); status: COMPLETED; exit: 65; error: -; workDir: /$WORK/temp/predector/work/48/a0bf2ceb1145f89b91a9bf6d91b30d]
Jan-07 12:04:10.218 [TaskFinalizer-4] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=signalp_v4 (1); work-dir=/$WORK/temp/predector/work/48/a0bf2ceb1145f89b91a9bf6d91b30d
  error [nextflow.exception.ProcessFailedException]: Process `signalp_v4 (1)` terminated with an error exit status (65)

signalp_v4 (2) logs
From $WORK/temp/predector/work/3a/0423ae5e1e929bd321cbbd59fe6424
Logs for signalp_v4 (1) look very similar

command.log

Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Failed to parse file </$WORK/temp/predector/work/3a/0423ae5e1e929bd321cbbd59fe6424/out.txt> at line 8.

In field 'name': Could not parse value '' as a non-empty string.

out.txt

# SignalP-4.1g euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
# SignalP-4.1g euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
# SignalP-4.1g euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
# SignalP-4.1g euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
                           0.000   1  0.000   1  0.000   1  0.000   0.000 N  0.450      SignalP-noTM
                           0.000   1  0.000   1  0.000   1  0.000   0.000 N  0.450      SignalP-noTM
                           0.000   1  0.000   1  0.000   1  0.000   0.000 N  0.450      SignalP-noTM
                           0.000   1  0.000   1  0.000   1  0.000   0.000 N  0.450      SignalP-noTM
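
The rows above lack the name column, which leaves 11 whitespace-separated fields where the header lists 12. A minimal sketch for spotting such rows in an out.txt is below. The function name `count_malformed_rows` and the sample row named `seq1` are made up for illustration; only the nameless row is taken from the output above.

```shell
#!/usr/bin/env bash
# Count data rows (non-comment, non-blank) whose whitespace-separated
# column count differs from the 12 columns listed in the header.
count_malformed_rows() {
    awk '!/^#/ && NF > 0 && NF != 12 { bad++ } END { print bad + 0 }' "$1"
}

# Made-up sample: one complete row (hypothetical name "seq1") and one
# row with the name column missing, as in the out.txt above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# SignalP-4.1g euk predictions
seq1                       0.100  20  0.100  20  0.100   1  0.100   0.100 N  0.450      SignalP-noTM
                           0.000   1  0.000   1  0.000   1  0.000   0.000 N  0.450      SignalP-noTM
EOF

malformed="$(count_malformed_rows "$sample")"
echo "malformed rows: $malformed"
rm -f -- "$sample"
```

Running `count_malformed_rows out.txt` inside a task work directory would give a quick count of affected rows without going through the parser.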

Comments:
Interestingly enough, the registration processes for both signalp_v4 and tmhmm2 also segfault:

Registering source file /$USW/predector/dependencies/signalp-4.1g.Linux.tar.gz for signalp4 into conda environment at:
/$USW/miniconda3/envs/predector/share/signalp4-4.1g-3

Unregistering old source files if they exist.

patching file signalp
Finished registering signalp4.
Testing installation...
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Test succeeded.
signalp4 is now fully installed!!
Registering source file /$USW/predector/dependencies/tmhmm-2.0c.Linux.tar.gz for tmhmm into conda environment at:
/$USW/miniconda3/envs/predector/share/tmhmm-2.0c-3

Unregistering old source files if they exist.

patching file bin/tmhmm
Finished registering tmhmm.
Testing installation...
Segmentation fault
Test succeeded.
tmhmm is now full installed!

edit (Jan 08)

I tried my luck with Apptainer (ex-Singularity) today. The /dev/ branch 1.2.8-alpha won't create containers (it does not find the register scripts for proprietary software in post), but the /master/ 1.2.7 branch worked nicely. More than stoked to find the local Apptainer version passes all tests.

Copying the exact same environment to the HPC however? Not so great. I'd encountered the same error running the conda version of 1.2.7 on the new HPC before.

nextflow.log

ERROR ~ Error executing process > 'signalp_v4 (1)'

Caused by:
  Process `signalp_v4 (1)` terminated with an error exit status (65)


Command executed:

  CHUNKSIZE="$(decide_task_chunksize.sh in.fasta "4" 100)"
  
  parallel         --halt now,fail=1         --joblog log.txt         -j "4"         -N "${CHUNKSIZE}"         --line-buffer          --recstart '>'         --cat          'signalp4 -t "euk" -f short "{}"'     < in.fasta     | cat > out.txt
  
  predutils r2js         --pipeline-version "1.2.8-alpha"         --software-version "4.1g"         -o out.ldjson         signalp4 out.txt in.fasta

Command exit status:
  65

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    gocryptfs not found, will not be able to use gocryptfs
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Segmentation fault (core dumped)
  Failed to parse file <out.txt> at line 8.
  
  The line had the wrong number of columns. Expected 12 but got 11

The only thing that I know changed is the way /tmp/ folders work. They are now managed per-session, using the $TMPDIR variable. But I don't quite see how that could cause such an error.
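
To rule the temporary directory in or out, a small check can be run on a compute node. This is a sketch under assumptions: `check_tmpdir` is a hypothetical helper, and it only confirms that the directory exists and that a file can be created in it, which says nothing about whether the segfaults are related.

```shell
#!/usr/bin/env bash
# Return 0 if the given directory (default: $TMPDIR, else /tmp) exists,
# is writable, and a probe file can actually be created in it.
check_tmpdir() {
    local dir="${1:-${TMPDIR:-/tmp}}"
    local probe
    [ -d "$dir" ] || return 1
    [ -w "$dir" ] || return 1
    probe="$(mktemp "$dir/probe.XXXXXX")" || return 1
    rm -f -- "$probe"
}

if check_tmpdir; then
    echo "temporary directory is usable: ${TMPDIR:-/tmp}"
else
    echo "temporary directory is NOT usable: ${TMPDIR:-/tmp}"
fi
```

The same function can be called with an explicit path, for example `check_tmpdir "$WORK/temp"`, to compare file systems.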


edit (Jan 09)

Just like the Apptainer version of 1.2.7, the conda version of 1.2.8-alpha passes all tests on a local installation. I'm going to escalate this issue to our HPC admins for now, assuming it's a local anomaly.


edit (Jan 14)

We're assuming it might be related to Omnipath causing problems with OpenMPI. I'll keep trying stuff out and update this post when I find something!


edit (Jan 16)

Tried my luck setting OMPI_MCA_mtl=ofi as environment variable as recommended, to no avail. I'm not versed in Nextflow, so I might've done something wrong. I tried two approaches on a forked version of Predector (literally just changed the config files around):

Passing the variable directly in bash:
OMPI_MCA_mtl=ofi nextflow run -profile test -with-conda "/usw/bbe0337/miniconda3/envs/predector" -resume -r 1.2.8-alpha markusHaferkamp/predector

Changing nextflow.config:

profiles{
    test {
        includeConfig "$baseDir/conf/test.config"
        env.OMPI_MCA_mtl = 'ofi'
    }
}
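
One way to check that a variable set like this is actually visible to child processes is to print it from a child shell. The sketch below is generic bash with a hypothetical helper `visible_in_child` and made-up variable names. It only shows the difference between an exported and a non-exported variable; whether Nextflow forwards the variable to each task depends on the executor and container settings, which is not checked here.

```shell
#!/usr/bin/env bash
# Return 0 if the named variable is present in a child process's environment.
visible_in_child() {
    bash -c 'printenv "$1" >/dev/null' _ "$1"
}

DEMO_ONLY_SHELL=1        # plain shell variable, not exported (made-up name)
export DEMO_EXPORTED=1   # exported, so children inherit it (made-up name)

for name in DEMO_ONLY_SHELL DEMO_EXPORTED; do
    if visible_in_child "$name"; then
        echo "$name: visible in child"
    else
        echo "$name: not visible in child"
    fi
done
```

Adding a line such as `printenv OMPI_MCA_mtl` to a task script, or running it by hand in a work directory, would be the equivalent check inside the pipeline.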

Back to square one it is!

@markusHaferkamp markusHaferkamp added the bug Something isn't working label Jan 7, 2025