Commit

Updated some documentation
darcyabjones committed Aug 27, 2024
1 parent f9052c8 commit b3e4700
Showing 6 changed files with 39 additions and 39 deletions.
22 changes: 22 additions & 0 deletions docs/faq.md
@@ -79,6 +79,28 @@ _nf_script_9f2a833e: 9: unable to resolve class download_pfam_dat
Please update Nextflow to a more recent version (>21) to resolve this issue.


### On a network file system I receive an error `open: can't stat file`.

This is usually caused by a delay in how network file systems sync between many clients.
If something is writing or reading files very quickly, a program can have problems retrieving a file.
If the server hasn't finished uploading and placing the file, it can tell your program that the file doesn't exist.
In SignalP3 this can potentially even cause segfaults (see: https://github.com/ccdmb/predector/issues/92#issuecomment-2034991440).

One solution is to use a local filesystem to do the actual work.
On an HPC you'll usually have some space in `/tmp` to work in, and you can tell Nextflow to stage files there with the following config.

```
process {
scratch = '/tmp'
}
```

If you save this to a file, say `my_scratch.config`, you can then supply it as an extra argument to Nextflow.
E.g. `nextflow run -c my_scratch.config -profile test,docker ccdmb/predector`
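
As a minimal sketch (not part of the committed FAQ text), the config file could be created from the shell like this. The filename `my_scratch.config` and the `/tmp` path come from the text above; it is an assumption that `/tmp` is local to the compute node and has enough free space for your run.

```shell
# Write the scratch config to the current directory.
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > my_scratch.config <<'EOF'
process {
    scratch = '/tmp'
}
EOF

# Print the file back to confirm what Nextflow will read.
cat my_scratch.config
```

The file can then be passed with `-c my_scratch.config`, as in the command above.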

Thanks to @ibebio for pointing us towards this problem.


### Running/setting up conda environment: `loadable library and perl binaries are mismatched (got handshake key 0xdb80080, needed 0xde00080)`

This will usually happen if the operating system you're running on has some perl libraries in the search path for a different version of perl.
20 changes: 10 additions & 10 deletions docs/install.md
@@ -68,11 +68,11 @@ Where you have a choice between versions for different operating systems, you sh
- [SignalP](https://services.healthtech.dtu.dk/services/SignalP-3.0/9-Downloads.php#) version 3.0
- [SignalP](https://services.healthtech.dtu.dk/services/SignalP-4.1/9-Downloads.php#) version 4.1g
- [SignalP](https://services.healthtech.dtu.dk/services/SignalP-5.0/9-Downloads.php#) version 5.0b
- [SignalP](https://services.healthtech.dtu.dk/services/SignalP-6.0/9-Downloads.php#) version 6.0g "fast" **\*currently optional**
- [SignalP](https://services.healthtech.dtu.dk/services/SignalP-6.0/9-Downloads.php#) version 6.0h "fast" **\*currently optional**
- [TargetP](https://services.healthtech.dtu.dk/services/TargetP-2.0/9-Downloads.php#) version 2.0
- [DeepLoc](https://services.healthtech.dtu.dk/services/DeepLoc-1.0/9-Downloads.php#) version 1.0
- [TMHMM](https://services.healthtech.dtu.dk/services/TMHMM-2.0/9-Downloads.php#) version 2.0c
- [Phobius](http://software.sbc.su.se/cgi-bin/request.cgi?project=phobius) version 1.01
- [Phobius](https://software.sbc.su.se/phobius.html) version 1.01

Note that DTU (SignalP etc) don't keep older patches and minor versions available.
If the specified version isn't available to download, another version with the same major number _should_ be fine.
@@ -102,11 +102,11 @@ curl -s "https://raw.githubusercontent.com/ccdmb/predector/1.2.7/install.sh" \
-3 signalp-3.0.Linux.tar.Z \
-4 signalp-4.1g.Linux.tar.gz \
-5 signalp-5.0b.Linux.tar.gz \
-6 signalp-6.0g.fast.tar.gz \
-6 signalp-6.0h.fast.tar.gz \
-t targetp-2.0.Linux.tar.gz \
-d deeploc-1.0.All.tar.gz \
-m tmhmm-2.0c.Linux.tar.gz \
-p phobius101_linux.tar.gz
-p phobius101_linux.tgz
```

This will create the conda environment (named `predector`), or the docker (tagged `predector/predector:1.2.7`) or singularity (file `./predector.sif`) containers.
@@ -219,10 +219,10 @@ Modify the source tar archive filenames in the commands if necessary.
signalp3-register signalp-3.0.Linux.tar.Z \
&& signalp4-register signalp-4.1g.Linux.tar.gz \
&& signalp5-register signalp-5.0b.Linux.tar.gz \
&& signalp6-register signalp-6.0g.fast.tar.gz \
&& signalp6-register signalp-6.0h.fast.tar.gz \
&& targetp2-register targetp-2.0.Linux.tar.gz \
&& deeploc-register deeploc-1.0.All.tar.gz \
&& phobius-register phobius101_linux.tar.gz \
&& phobius-register phobius101_linux.tgz \
&& tmhmm2-register tmhmm-2.0c.Linux.tar.gz
```

@@ -243,9 +243,9 @@ curl -s https://raw.githubusercontent.com/ccdmb/predector/1.2.7/Dockerfile \
--build-arg SIGNALP3=signalp-3.0.Linux.tar.Z \
--build-arg SIGNALP4=signalp-4.1g.Linux.tar.gz \
--build-arg SIGNALP5=signalp-5.0b.Linux.tar.gz \
--build-arg SIGNALP6=signalp-6.0g.fast.tar.gz \
--build-arg SIGNALP6=signalp-6.0h.fast.tar.gz \
--build-arg TARGETP2=targetp-2.0.Linux.tar.gz \
--build-arg PHOBIUS=phobius101_linux.tar.gz \
--build-arg PHOBIUS=phobius101_linux.tgz \
--build-arg TMHMM=tmhmm-2.0c.Linux.tar.gz \
--build-arg DEEPLOC=deeploc-1.0.All.tar.gz \
-t predector/predector:1.2.7 \
@@ -272,9 +272,9 @@ Modify the source tar archive filenames if necessary.
export SIGNALP3=signalp-3.0.Linux.tar.Z
export SIGNALP4=signalp-4.1g.Linux.tar.gz
export SIGNALP5=signalp-5.0b.Linux.tar.gz
export SIGNALP6=signalp-6.0g.fast.tar.gz
export SIGNALP6=signalp-6.0h.fast.tar.gz
export TARGETP2=targetp-2.0.Linux.tar.gz
export PHOBIUS=phobius101_linux.tar.gz
export PHOBIUS=phobius101_linux.tgz
export TMHMM=tmhmm-2.0c.Linux.tar.gz
export DEEPLOC=deeploc-1.0.All.tar.gz

3 changes: 0 additions & 3 deletions docs/output.md
@@ -60,9 +60,6 @@ There are a lot of columns, though generally you'll only be interested in a few
| `has_phibase_effector_match` | Boolean [0, 1] | Indicates whether the protein had a significant hit to one of the phibase phenotypes: Effector, Hypervirulence, or loss of pathogenicity |
| `has_phibase_virulence_match` | Boolean [0, 1] | Indicating whether the protein had a significant hit with the phenotype "reduced virulence" |
| `has_phibase_lethal_match` | Boolean [0, 1] | Indicating whether the protein had a significant hit with the phenotype "lethal" |
| `pfam_ids` | List | A comma separated list of all Pfam HMM ids matched | You can find details on Pfam match entries at http://pfam.xfam.org (use the "Jump to" search boxes with this ID). Matches are sorted by evalue, so the first hit is the best.|
| `pfam_names` | List | A comma separated list of all Pfam HMM names matched | Matches are sorted by evalue, so the first hit is the best. |
| `has_pfam_virulence_match` | Boolean [0, 1] | Indicating whether the protein had a significant hit to one of the selected Pfam HMMs associated with virulence function | A list of virulence associated Pfam entries is here: https://github.com/ccdmb/predector/blob/master/data/pfam_targets.txt |
| `dbcan_matches` | List | A comma separated list of all dbCAN matches | You can find details on CAZYme families at http://www.cazy.org/. For more on dbCAN specifically see here https://bcb.unl.edu/dbCAN2/. Matches are sorted by evalue, so the first hit is the best. |
| `has_dbcan_virulence_match` | Boolean [0, 1] | Indicating whether the protein had a significant hit to one of the dbCAN domains associated with virulence function | A list of virulence associated dbCAN entries is here: https://github.com/ccdmb/predector/blob/master/data/dbcan_targets.txt |
| `effectorp1` | Float | The raw EffectorP v1 prediction pseudo-probability | Values above 0.5 are considered to be effector predictions |
27 changes: 4 additions & 23 deletions docs/running.md
@@ -76,14 +76,6 @@ Important parameters are:
--phibase <path>
Path to the PHI-base fasta dataset.
--pfam_hmm <path>
Path to already downloaded gzipped pfam HMM database
default: download the hmms
--pfam_dat <path>
Path to already downloaded gzipped pfam DAT database
default: download the DAT file
--dbcan <path>
Path to already downloaded gzipped dbCAN HMM database
default: download the hmms
@@ -133,12 +125,6 @@ Important parameters are:
for documenting what was run.
THIS OPTION WILL BE REMOVED IN A FUTURE RELEASE.
--no_pfam
Don't download and/or run Pfam and Pfamscan. Downloading Pfam is quite slow,
even though it isn't particularly big. Sometimes the servers are down too.
You might also run your proteomes through something like interproscan, in which
case you might not need these results. This means you can keep going without it.
--no_dbcan
Don't download and/or run searches against the dbCAN CAZyme dataset.
If you're doing this analysis elsewhere, the dbCAN2 servers are down,
@@ -220,7 +206,7 @@ Those starting with two hyphens `--` are Predector defined parameters.
In the pipeline ranking output tables we also provide a manual (i.e. not machine learning) ranking score for both effectors `manual_effector_score` and secretion `manual_secretion_score`.
This was provided so that you could customise the ranking if the ML ranker isn't what you want.

> NOTE: If you decide not to run specific analyses (e.g. signalp6 or Pfam), this may affect comparability between different runs of the pipeline.
> NOTE: If you decide not to run specific analyses (e.g. signalp6), this may affect comparability between different runs of the pipeline.
These scores are computed by a relatively simple linear function weighting features in the ranking table.
You can customise the weights applied to the features from the command line.
@@ -239,7 +225,6 @@ It is composed of four other columns like this:
has_effector_match = has_phibase_effector_match
or (effector_matches != '.')
or has_dbcan_virulence_match
or has_pfam_virulence_match
```


@@ -383,7 +368,6 @@ In the config files, you can select these tasks by label.
| software | `deepredeff` | |
| software | `emboss` | |
| software | `hmmer3` | |
| software | `pfamscan` | |
| software | `mmseqs` | |


@@ -416,9 +400,9 @@ If you get an error about missing git tags when running either of the first two
I suggest keeping copies of the proprietary dependencies handy in a folder or archive, and just building and removing the container/environment as you need it.


### Providing pre-downloaded Pfam, PHI-base, and dbCAN datasets.
### Providing pre-downloaded PHI-base and dbCAN datasets.

Sometimes the Pfam or dbCAN servers can be a bit slow for downloads, and are occasionally unavailable which will cause the pipeline to fail.
Sometimes the dbCAN servers can be a bit slow for downloads, and are occasionally unavailable which will cause the pipeline to fail.
You may want to keep the downloaded databases to reuse them (or pre-download them).

If you've already run the pipeline once, they'll be in the `results` folder (unless you specified `--outdir`) so you can do:
@@ -429,17 +413,14 @@ nextflow run \
-profile test \
-resume ccdmb/predector \
--phibase phi-base_current.fas \
--pfam_hmm downloads/Pfam-A.hmm.gz \
--pfam_dat downloads/Pfam-A.hmm.dat.gz \
--dbcan downloads/dbCAN.txt \
--effectordb downloads/effectordb.hmm.gz
```

This will skip the download step at the beginning and just use those files, which saves a few minutes.

You can also download the files from:
- http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/ `Pfam-A.hmm.gz` and `Pfam-A.hmm.dat.gz`
- https://bcb.unl.edu/dbCAN2/download/ `dbCAN-HMMdb-V10.txt`
- https://bcb.unl.edu/dbCAN2/download/ `dbCAN-HMMdb-V13.txt`
- http://www.phi-base.org/downloadLink.htm OR https://github.com/PHI-base/data/tree/master/releases (only need the `.fas` fasta file).
- https://doi.org/10.6084/m9.figshare.16973665 `effectordb.hmm.gz`
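
A hedged sketch of pre-downloading two of these files with `curl`, not part of the committed docs. The dbCAN URL and the `downloads/` filenames come from the text above. The EffectorDB URL is the `private_effectordb_url` value from `nextflow.config` in this commit and is assumed to serve `effectordb.hmm.gz`. The `fetch` helper and the `DOWNLOADS` and `DRY_RUN` variables are hypothetical names used only for illustration; by default the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical local directory for cached databases.
DOWNLOADS="${DOWNLOADS:-downloads}"
# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 downloads.
DRY_RUN="${DRY_RUN:-1}"

# fetch URL OUTFILE: download URL to OUTFILE, or print the command in dry-run mode.
fetch() {
    local url="$1" out="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -fL -o $out $url"
    else
        # -f fails on HTTP errors, -L follows redirects, -o sets the output file.
        curl -fL -o "$out" "$url"
    fi
}

mkdir -p "$DOWNLOADS"
fetch "https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V13.txt" "$DOWNLOADS/dbCAN.txt"
fetch "https://figshare.com/ndownloader/files/31397770" "$DOWNLOADS/effectordb.hmm.gz"
```

Running it with `DRY_RUN=0` performs the downloads, after which the files can be passed with `--dbcan downloads/dbCAN.txt` and `--effectordb downloads/effectordb.hmm.gz` as in the command above. PHI-base is left out because the links listed are download pages rather than direct file URLs.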

2 changes: 1 addition & 1 deletion environment.yml
@@ -18,7 +18,7 @@ dependencies:
- predector::effectorp2=2.0
- predector::effectorp3=3.0=py_2
- predector::localizer=1.0.4
- predector::phobius=1.01=4
- predector::phobius=1.01=5
- predector::predectorutils=0.9.1
- predector::signalp3=3.0b=3
- predector::signalp4=4.1g=3
4 changes: 2 additions & 2 deletions nextflow.config
@@ -65,8 +65,8 @@ params {
private_phibase_version = "v4-17"
// The PHI-base team often rename files. Best to get a release from a commit rather than master.
private_phibase_url = "https://github.com/PHI-base/data/blob/38c9034d754482986f2f9f73b3b46d5bb7da1615/releases/phi-base_v4-17_2024-05-01.fas"
private_dbcan_version = "V12"
private_dbcan_url = "https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V12.txt"
private_dbcan_version = "V13"
private_dbcan_url = "https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V13.txt"
private_effectordb_url = "https://figshare.com/ndownloader/files/31397770"
private_effectordb_version = "1"
}