Add HelixFold3 #223

Open. Wants to merge 156 commits into base: dev
156 commits
aa9be9f
feat(katana.config): Created file katana.config
nbtm-sh Jul 29, 2024
678b212
feat(katana.config): Added params for PBS queues
nbtm-sh Jul 29, 2024
82466e0
feat(katana.config): Added executor parameter to allow the use Katana…
nbtm-sh Jul 29, 2024
8d2a771
feat(katana.config): Added label configs for pushing to GPU partition
nbtm-sh Jul 29, 2024
1ad4996
Merge pull request #1 from Australian-Structural-Biology-Computing/cr…
nbtm-sh Jul 29, 2024
3c1fb28
feat(run_alphafold2): Added 'gpu_compute' label to the Alphafold process
nbtm-sh Jul 29, 2024
67704b4
Merge pull request #2 from Australian-Structural-Biology-Computing/ad…
nbtm-sh Jul 29, 2024
e8d2abb
feat(run_alphafold2_pred): Added 'gpu_compute' label
nbtm-sh Jul 29, 2024
2021623
Merge pull request #3 from Australian-Structural-Biology-Computing/ad…
nbtm-sh Jul 29, 2024
13dd1cb
revert(run_alphafold2.nf): Removed GPU compute label from pipeline
nbtm-sh Jul 29, 2024
b0d483e
Merge pull request #4 from Australian-Structural-Biology-Computing/ad…
nbtm-sh Jul 29, 2024
2444395
Updated database links
jscgh Jul 29, 2024
4ce2e18
Merge pull request #6 from Australian-Structural-Biology-Computing/da…
jscgh Jul 30, 2024
8b9452a
feat(pf_files): Added testing files
nbtm-sh Jul 30, 2024
32311ba
Merge branch 'unsw-dev' into add-testing-files
nbtm-sh Jul 30, 2024
e676c33
Merge pull request #7 from Australian-Structural-Biology-Computing/ad…
nbtm-sh Jul 30, 2024
0962f91
fix(proteinfold_test.sh): Made path to main.nf rel
nbtm-sh Jul 30, 2024
992d6d1
revert(base.config): Changed executor back to local for testing as cl…
nbtm-sh Jul 30, 2024
32d466c
fix(proteinfold_test.sh): Changed mode to 'split_msa_production'
nbtm-sh Jul 30, 2024
d135dc8
Merge pull request #2 from nf-core/master
ziadbkh Aug 1, 2024
b3140e7
fix(dbs.conf): Updated dbs.conf to work on UNSW infrastructure
nbtm-sh Aug 8, 2024
4047e62
fix(run_alphafold2_msa): Fixed incorrectly named files
nbtm-sh Aug 8, 2024
93513bc
fix(run_alphafold2_pred): Fixed incorrectly named files
nbtm-sh Aug 8, 2024
a007d5a
fix(proteinfold_test.sh): Added singulairty argument
nbtm-sh Aug 8, 2024
232c8c9
fix(samplesheet): Changed sample to a much smaller sample
nbtm-sh Aug 8, 2024
03f2575
fix(samplesheet): Changed sampel to a smaller sample
nbtm-sh Aug 8, 2024
632610b
Merge pull request #8 from Australian-Structural-Biology-Computing/ad…
nbtm-sh Aug 8, 2024
964f5d0
feat(conf/dbs): Added variables for database names, and file names
nbtm-sh Aug 8, 2024
2a79fe4
feat(conf/dbs): Changed config paths to use database variables instea…
nbtm-sh Aug 8, 2024
c218ad2
feat(run_alphafold2): Changed hardcoded paths to use variables and up…
nbtm-sh Aug 8, 2024
edff052
feat(run_alphafold2_msa): Removed hardcoded paths and changed variables
nbtm-sh Aug 8, 2024
ea4459a
feat(run_alphafold2_msa): Added code from run_alphafold2.nf so that t…
nbtm-sh Aug 8, 2024
83ca302
fix(conf/dbs): Changed variable names to have _prefix on the end to a…
nbtm-sh Aug 8, 2024
faba0ab
fix(conf/dbs): Changed existing variables to use new prefix variables
nbtm-sh Aug 8, 2024
04cad9d
feat(nextflow.config): Added new param variables and defaults to the …
nbtm-sh Aug 8, 2024
afeb122
feat(dbs): Made variables global
nbtm-sh Aug 9, 2024
3d615b7
fix(dbs): Changed database directory default
nbtm-sh Aug 16, 2024
cb29256
feat(katana): Temporarily removed PBS job scheduling
nbtm-sh Aug 16, 2024
5829301
fix(run_alphafold2): Fixed copy command to point to the correct direc…
nbtm-sh Aug 16, 2024
485e400
fix(run_alphafold2): Updated paths to point to the correct uniclust d…
nbtm-sh Aug 16, 2024
d008f38
fix(run_alphafold2): Fixed typo
nbtm-sh Aug 16, 2024
a0dbd9c
feat(run_alphafold2): Added symlink for params file
nbtm-sh Aug 16, 2024
598cc26
feat(nextflow): Changed default to use GPU
nbtm-sh Aug 16, 2024
eba4412
feat(nextflow): Included katana config
nbtm-sh Aug 16, 2024
4811d3c
feat(test): Added options to katana tests
nbtm-sh Aug 16, 2024
9212111
revert(nextflow): Changed default GPU to false
nbtm-sh Aug 16, 2024
0795469
revert(nextflow): Changed config back to base config
nbtm-sh Aug 16, 2024
3ade215
modified: conf/dbs.config
jscgh Aug 23, 2024
5fe8e1d
feat(katana): Added katana config
nbtm-sh Sep 5, 2024
9af55b1
feat(style): pushing uncommited changes
nbtm-sh Oct 10, 2024
8ce0598
Merge pull request #9 from Australian-Structural-Biology-Computing/cl…
nbtm-sh Oct 10, 2024
3a13ab7
deleted: null/pipeline_info/ as per https://github.com/Australian-…
jscgh Oct 11, 2024
f8b8f4e
Draft new file: run_helixfold3.nf
jscgh Oct 11, 2024
1d0f413
Merge branch 'unsw-dev' into add-rosettafold-all-atom
jscgh Oct 14, 2024
c993475
Initial draft rosettafold-all-atom.nf
jscgh Oct 14, 2024
4b3bd9e
Cleaned up folders
jscgh Oct 14, 2024
65897c8
added workflows/rosettafold-all-atom.nf first draft
jscgh Oct 16, 2024
f362ae9
Updating main.nf to current master version and adding RFAA lines
jscgh Oct 16, 2024
2299830
Imported subworkflows and fixed formatting errors with RFAA lines
jscgh Oct 18, 2024
4de580a
Adjusted naming to snake_case for compatibility, various minor change…
jscgh Oct 21, 2024
7faa62a
Added schema support for rosetta_fold_all_atom mode and .yaml or .yml…
jscgh Oct 21, 2024
375e330
modified: assets/schema_input.json
jscgh Oct 21, 2024
6fdb111
Updating input to work with .yaml https://github.com/Australian-Stru…
jscgh Oct 21, 2024
f910f1c
Merge branch 'master' into add-rosettafold-all-atom
jscgh Oct 22, 2024
3a9d7f2
Merging
jscgh Oct 22, 2024
2136354
Updated naming scheme with merged changes
jscgh Oct 22, 2024
b0f13c3
modified: modules/local/run_alphafold2.nf
jscgh Oct 22, 2024
903b6a2
For https://github.com/nf-core/proteinfold/issues/197
jscgh Oct 22, 2024
9aa5054
Cleaned up files
jscgh Oct 22, 2024
1f29711
Merge remote-tracking branch 'refs/remotes/origin/add-rosettafold-all…
jscgh Oct 22, 2024
2eae3c1
Ran nf-core schema build
jscgh Oct 22, 2024
1352649
Merged with dev
jscgh Oct 22, 2024
f07c612
Ran nf-core schema build
jscgh Oct 22, 2024
637d67c
Dealing with permissions
jscgh Oct 22, 2024
1dd9f03
Readding directory
jscgh Oct 22, 2024
09c64fa
modified: nextflow_schema.json
jscgh Oct 22, 2024
3dd9367
Removed deprecated "check_max"
jscgh Oct 22, 2024
0fb9735
Aligning input channels for RFAA
jscgh Oct 23, 2024
125a702
Aligning input channels for RFAA
jscgh Oct 23, 2024
ef6f516
Runs through rfaa -profile test and -stub successfully
jscgh Oct 23, 2024
1f6e1dd
modified: modules/local/run_rosettafold_all_atom.nf
jscgh Oct 23, 2024
4b68ed8
Debugging RFAA
jscgh Oct 28, 2024
0eba56d
Debugging RFAA
jscgh Oct 28, 2024
9a15bdf
RFAA now working to produce structures
jscgh Oct 29, 2024
51878df
Modified rfaa output to properly emit PDB file
jscgh Oct 30, 2024
076db5a
Fixed renaming pdb
jscgh Oct 30, 2024
c523139
Pipeline now completes successfully
jscgh Nov 1, 2024
70e8b6b
Cleaned up test configs
jscgh Nov 1, 2024
902ebaf
Built schema as per CONTRIBUTING.md
jscgh Nov 1, 2024
8379c50
Fixed db conflicts
jscgh Nov 1, 2024
7c9cf19
Troubleshooting benchmarks and having jobs queued and run by nextflow
jscgh Nov 4, 2024
7dd2e45
Removed leftover blast-2.2.6 references
jscgh Nov 4, 2024
a7aa7eb
Updated nextflow_schema
jscgh Nov 4, 2024
cbb7841
Katana HPC gpu compute option
jscgh Nov 4, 2024
43f7364
Fixing crashes caused by HPC not being able to reach the online custo…
jscgh Nov 4, 2024
e96e175
Ran nf-core linter
jscgh Nov 4, 2024
0c173e5
deleted: .github/workflows/linting_comment.yml
jscgh Nov 4, 2024
b58be9f
Linting files
jscgh Nov 4, 2024
11a2d9f
Genericised pdb emission
jscgh Nov 5, 2024
450239a
Merge branch 'add-rosettafold-all-atom' into add-helixfold3
jscgh Nov 5, 2024
5820f29
Properly calls the HF3 container but cannot run through process yet
jscgh Nov 5, 2024
1033e36
Updated apptainer image paths
jscgh Nov 6, 2024
b444b65
HF3 is now able to start a run
jscgh Nov 6, 2024
456b9e5
First working version of HF3 See https://github.com/Australian-Struct…
jscgh Nov 6, 2024
2a0c2be
Merged with nf-core/dev
jscgh Nov 11, 2024
d7caf71
Merged with dev
jscgh Nov 11, 2024
87c0606
Schema updates
jscgh Nov 12, 2024
1c95a55
Fixing config lines
jscgh Nov 12, 2024
9a27570
new file: conf/modules_helixfold3.config
jscgh Nov 12, 2024
c5cfeff
Modified katana.config to allow for direct execution of jobs on k095 …
jscgh Nov 12, 2024
738fb0a
Fixing left over merge lines and linting
jscgh Nov 12, 2024
8943df1
Linting modified: .github/workflows/linting_comment.yml
jscgh Nov 12, 2024
77c2ad4
Updated awsfulltest.yml
jscgh Nov 12, 2024
8ebc531
Passes linting
jscgh Nov 12, 2024
c4e8c9c
modified: conf/katana.config
jscgh Nov 12, 2024
ef28950
Fixing file emit for hf3
jscgh Nov 12, 2024
bbe81b9
Emits files including cif properly now
jscgh Nov 12, 2024
9afa058
HF3 and RFAA now working with Katana OnDemand
jscgh Nov 13, 2024
b75cdfc
New branch for aligning the new modules (RFAA & HF3) with the nf-core…
jscgh Nov 13, 2024
ac9daaa
Updated for running with configs
jscgh Nov 14, 2024
0625053
Overhauled helixfold3 db paths to match nf-core methods
jscgh Nov 15, 2024
d3e62eb
Katana config
jscgh Nov 18, 2024
94f47dd
Merged with origin/align-modules-to-nf-core for new HF3 path variables
jscgh Nov 18, 2024
c6d14a2
Updated schema with nf-core pipelines schema build
jscgh Nov 18, 2024
5fa189a
Updated schema with nf-core pipelines schema build
jscgh Nov 19, 2024
2ede145
nf-core pipelines lint passed
jscgh Nov 19, 2024
b040e43
Merge remote-tracking branch 'upstream/dev' into add-helixfold3
jscgh Nov 21, 2024
3c6a8cb
Added Helixfold3 module
jscgh Nov 21, 2024
52b4c2a
Aligned to nf-core dev
jscgh Nov 21, 2024
0b88918
Started updating documentation
jscgh Nov 21, 2024
b115d0f
Added download functionality to prepare_helixfold3_dbs
jscgh Nov 21, 2024
a7e326d
Added variables for downloading hf3 dbs
jscgh Nov 21, 2024
6286170
DBs working
jscgh Nov 21, 2024
f4696d5
Fixed maxit-src
jscgh Nov 21, 2024
ddc3d80
Updated files
jscgh Nov 21, 2024
804a9bb
Aligning files
jscgh Nov 21, 2024
95c6fcf
schema_input.json added backwards compatibility to sequence columns
jscgh Nov 21, 2024
1bbd4e6
Test profiles added to nextflow.config
jscgh Nov 22, 2024
f6308fe
Aligned with dev
jscgh Nov 28, 2024
e3a2bcb
Merge remote-tracking branch 'upstream/dev' into add-helixfold3
jscgh Nov 28, 2024
0a4b030
Adding multiqc to hf3 module
jscgh Nov 28, 2024
30ffcd6
Removed leftover RFAA files and variables
jscgh Nov 28, 2024
6a4f6ac
Working with multiqc enabled
jscgh Nov 29, 2024
1a64519
Passes linting and tests
jscgh Nov 29, 2024
d64b9f1
Aligned with nf-core dev
jscgh Nov 29, 2024
17e40bf
Merge remote-tracking branch 'upstream/dev' into add-helixfold3
jscgh Nov 29, 2024
19bc76f
Updated hf3 definition file
jscgh Dec 2, 2024
9baee7b
Updated hf3 definition file
jscgh Dec 2, 2024
71f0e9e
Added HF3 Dockerfile
jscgh Dec 5, 2024
1b79ea9
Prettier
jscgh Dec 5, 2024
edaaeea
Fixed hf3_environment.yaml missing packages
jscgh Dec 10, 2024
c3440a9
HF3 container path updated
jscgh Dec 10, 2024
66bcafc
HF3 dev path updated
jscgh Dec 10, 2024
a3297a3
Updated HF3 container to nf-core repo
jscgh Dec 10, 2024
f7cd160
Prettier
jscgh Dec 11, 2024
d604cc4
Merge remote-tracking branch 'upstream/dev' into add-helixfold3
jscgh Jan 16, 2025
6 changes: 3 additions & 3 deletions .github/CONTRIBUTING.md
@@ -29,7 +29,7 @@ If you're not used to this workflow with git, you can start with some [docs from
You have the option to test your changes locally by running the pipeline. For receiving warnings about process selectors and other `debug` information, it is recommended to use the debug profile. Execute all the tests with the following command:

```bash
- nextflow run . --profile debug,test,docker --outdir <OUTDIR>
+ nextflow run . -profile debug,test,docker --outdir <OUTDIR>
```

When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
@@ -78,8 +78,8 @@ If you wish to contribute a new step, please use the following coding standards:
5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool).
6. Add sanity checks and validation for all relevant parameters.
7. Perform local tests to validate that the new code works as expected.
- 8. If applicable, add a new test command in `.github/workflow/ci.yml`.
- 9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module.
+ 8. If applicable, add a new test command in `.github/workflows/ci.yml`.
+ 9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://multiqc.info/) module.
10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`.

### Default values
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -44,6 +44,7 @@ jobs:
- "test_colabfold_download"
- "test_esmfold"
- "test_split_fasta"
- "test_helixfold3"
isMaster:
- ${{ github.base_ref == 'master' }}
# Exclude conda and singularity on dev
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [[PR #205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`.
- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)] - Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
- [[#214](https://github.com/nf-core/proteinfold/issues/214)] - Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
- [[PR #220](https://github.com/nf-core/proteinfold/pull/220)] - Add RoseTTAFold-All-Atom module.
- [[PR #223](https://github.com/nf-core/proteinfold/pull/223)] - Add HelixFold3 module.
- [[#235](https://github.com/nf-core/proteinfold/issues/235)] - Update samplesheet to new version (switch from `sequence` column to `id`).

## [[1.1.1](https://github.com/nf-core/proteinfold/releases/tag/1.1.1)] - 2025-07-30
@@ -106,6 +108,8 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
| | `--esm2_t36_3B_UR50D_contact_regression` |
| | `--esmfold_params_path` |
| | `--skip_multiqc` |
| | `--rosettafold_all_atom_db` |
| | `--helixfold3_db` |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
> **NB:** Parameter has been **added** if just the new parameter information is present.
30 changes: 29 additions & 1 deletion README.md
@@ -39,6 +39,10 @@ On release, automated continuous integration tests run the pipeline on a full-si

v. [ESMFold](https://github.com/facebookresearch/esm) - Regular ESM

vi. [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/) - Regular RFAA

vii. [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3) - Regular HF3

## Usage

> [!NOTE]
@@ -53,7 +57,7 @@ nextflow run nf-core/proteinfold \
--outdir <OUTDIR>
```

- The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`] or [`--esmfold_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.
+ The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`], [`--rosettafold_all_atom_db`] or [`--helixfold3_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases.

- The typical command to run AlphaFold2 mode is shown below:

@@ -136,6 +140,30 @@ The pipeline takes care of downloading the databases and parameters required by
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```

- The rosettafold_all_atom mode can be run using the command below:

```console
nextflow run nf-core/proteinfold \
--input samplesheet.csv \
--outdir <OUTDIR> \
--mode rosettafold_all_atom \
--rosettafold_all_atom_db <null (default) | PATH> \
--use_gpu <true/false> \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```

- The helixfold3 mode can be run using the command below:

```console
nextflow run nf-core/proteinfold \
--input samplesheet.csv \
--outdir <OUTDIR> \
--mode helixfold3 \
--helixfold3_db <null (default) | PATH> \
--use_gpu <true/false> \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

13 changes: 10 additions & 3 deletions assets/schema_input.json
@@ -7,6 +7,12 @@
"items": {
"type": "object",
"properties": {
"sequence": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Sequence name must be provided and cannot contain spaces",
"meta": ["sequence"]
},
"id": {
"type": "string",
"pattern": "^\\S+$",
@@ -17,10 +23,11 @@
"type": "string",
"format": "file-path",
"exists": true,
- "pattern": "^\\S+\\.fa(sta)?$",
- "errorMessage": "Fasta file must be provided, cannot contain spaces and must have extension '.fa' or '.fasta'"
+ "pattern": "^\\S+\\.(fa(sta)?|yaml|yml|json)$",
+ "errorMessage": "Fasta, yaml or json file must be provided, cannot contain spaces and must have extension '.fa', '.fasta', '.yaml', '.yml', or '.json'"
}
},
- "required": ["id", "fasta"]
+ "required": ["fasta"],
+ "anyOf": [{ "required": ["sequence"] }, { "required": ["id"] }]
}
}
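
The relaxed schema above now requires `fasta` plus at least one of `sequence` or `id`, and accepts `.fa`, `.fasta`, `.yaml`, `.yml` and `.json` file names. As a rough illustration only, the same rules could be sketched in plain Python as below. The function name `check_row` is hypothetical, the `"exists": true` file check is left out, and this is not how the pipeline itself validates the samplesheet.

```python
import re

# Patterns copied from assets/schema_input.json in this diff.
NAME_PATTERN = re.compile(r"^\S+$")
FILE_PATTERN = re.compile(r"^\S+\.(fa(sta)?|yaml|yml|json)$")


def check_row(row):
    """Return a list of problems for one samplesheet row (dict of column -> value).

    Hypothetical helper: it mirrors the schema rules but skips the file-existence check.
    """
    problems = []
    # "required": ["fasta"]
    fasta = row.get("fasta")
    if not fasta:
        problems.append("missing 'fasta'")
    elif not FILE_PATTERN.match(fasta):
        problems.append("bad file name or extension: " + fasta)
    # "anyOf": at least one of 'sequence' or 'id' must be present.
    names = [row[key] for key in ("sequence", "id") if row.get(key)]
    if not names:
        problems.append("need 'sequence' or 'id'")
    for name in names:
        if not NAME_PATTERN.match(name):
            problems.append("name contains whitespace: " + name)
    return problems
```

Under these assumptions a row such as `{"sequence": "T1024", "fasta": "T1024.fa"}` passes, which is the backwards compatibility the `anyOf` clause is meant to preserve.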
1 change: 1 addition & 0 deletions bin/generate_report.py
@@ -307,6 +307,7 @@ def pdb_to_lddt(pdb_files, generate_tsv):
"esmfold": "ESMFold",
"alphafold2": "AlphaFold2",
"colabfold": "ColabFold",
"helixfold3": "HelixFold3",
}

parser = argparse.ArgumentParser()
29 changes: 29 additions & 0 deletions conf/dbs.config
@@ -48,6 +48,35 @@ params {
"alphafold2_ptm" : "alphafold_params_2021-07-14"
]

// Helixfold3 links
helixfold3_uniclust30_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz'
helixfold3_ccd_preprocessed_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/CCD/ccd_preprocessed_etkdg.pkl.gz'
helixfold3_rfam_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/MSA/Rfam-14.9_rep_seq.fasta'
helixfold3_init_models_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/params/HelixFold3-params-240814.zip'
helixfold3_bfd_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz'
helixfold3_small_bfd_link = 'https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz'
helixfold3_uniprot_sprot_link = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz'
helixfold3_uniprot_trembl_link = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz'
helixfold3_pdb_seqres_link = "${params.pdb_seqres_link}"
helixfold3_uniref90_link = 'ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz'
helixfold3_mgnify_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/mgy_clusters_2018_12.fa.gz'
helixfold3_pdb_mmcif_link = 'rsync.rcsb.org::ftp_data/structures/divided/mmCIF/'
helixfold3_pdb_obsolete_link = 'ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat'

// Helixfold3 paths
helixfold3_uniclust30_path = "${params.helixfold3_db}/uniclust30/*"
helixfold3_ccd_preprocessed_path = "${params.helixfold3_db}/ccd_preprocessed_etkdg.pkl.gz"
helixfold3_rfam_path = "${params.helixfold3_db}/Rfam-14.9_rep_seq.fasta"
helixfold3_init_models_path = "${params.helixfold3_db}/HelixFold3-240814.pdparams"
helixfold3_bfd_path = "${params.helixfold3_db}/bfd/*"
helixfold3_small_bfd_path = "${params.helixfold3_db}/small_bfd/*"
helixfold3_uniprot_path = "${params.helixfold3_db}/uniprot/*"
helixfold3_pdb_seqres_path = "${params.helixfold3_db}/pdb_seqres/*"
helixfold3_uniref90_path = "${params.helixfold3_db}/uniref90/*"
helixfold3_mgnify_path = "${params.helixfold3_db}/mgnify/*"
helixfold3_pdb_mmcif_path = "${params.helixfold3_db}/pdb_mmcif/*"
helixfold3_maxit_src_path = "${params.helixfold3_db}/maxit-v11.200-prod-src"

// Esmfold links
esmfold_3B_v1 = 'https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt'
esm2_t36_3B_UR50D = 'https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt'
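
Every `helixfold3_*_path` parameter in this hunk is derived from `params.helixfold3_db` by string interpolation, so a single `--helixfold3_db` value determines the full directory layout. A minimal Python sketch of that expansion follows; the function name and the example directory `/data/hf3_db` are hypothetical, while the suffixes are copied from the config lines above.

```python
# Path suffixes copied from the "Helixfold3 paths" block of conf/dbs.config.
HELIXFOLD3_SUFFIXES = {
    "helixfold3_uniclust30_path": "uniclust30/*",
    "helixfold3_ccd_preprocessed_path": "ccd_preprocessed_etkdg.pkl.gz",
    "helixfold3_rfam_path": "Rfam-14.9_rep_seq.fasta",
    "helixfold3_init_models_path": "HelixFold3-240814.pdparams",
    "helixfold3_bfd_path": "bfd/*",
    "helixfold3_small_bfd_path": "small_bfd/*",
    "helixfold3_uniprot_path": "uniprot/*",
    "helixfold3_pdb_seqres_path": "pdb_seqres/*",
    "helixfold3_uniref90_path": "uniref90/*",
    "helixfold3_mgnify_path": "mgnify/*",
    "helixfold3_pdb_mmcif_path": "pdb_mmcif/*",
    "helixfold3_maxit_src_path": "maxit-v11.200-prod-src",
}


def resolve_helixfold3_paths(helixfold3_db):
    """Mimic the "${params.helixfold3_db}/<suffix>" interpolation (hypothetical helper)."""
    return {
        name: f"{helixfold3_db}/{suffix}"
        for name, suffix in HELIXFOLD3_SUFFIXES.items()
    }


# Example with a made-up database directory.
paths = resolve_helixfold3_paths("/data/hf3_db")
print(paths["helixfold3_init_models_path"])
```

A pre-downloaded database directory would need to contain these entries for the paths to resolve to existing files.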
22 changes: 22 additions & 0 deletions conf/modules_helixfold3.config
@@ -0,0 +1,22 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/

process {
withName: 'NFCORE_PROTEINFOLD:HELIXFOLD3:MULTIQC' {
publishDir = [
path: { "${params.outdir}/multiqc" },
mode: 'copy',
saveAs: { filename -> filename.equals('versions.yml') ? null : "helixfold3_$filename" }
]
}

}
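
The `saveAs` closure in this config decides, per file, whether and under what name a MultiQC output is published: `versions.yml` is skipped, and everything else gets a `helixfold3_` prefix. A Python analogue of that logic, given only as an illustration (the function name and the example file name `multiqc_report.html` are assumptions, not taken from the diff):

```python
def save_as(filename):
    """Python analogue of the saveAs closure; None means 'do not publish this file'."""
    if filename == "versions.yml":
        return None
    return "helixfold3_" + filename


# Illustrative file names only.
for name in ("versions.yml", "multiqc_report.html"):
    print(name, "->", save_as(name))
```

The prefix keeps HelixFold3 reports distinguishable from reports of other modes that publish into the same `multiqc` directory.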
37 changes: 37 additions & 0 deletions conf/test_helixfold3.config
@@ -0,0 +1,37 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.
Use as follows:
nextflow run nf-core/proteinfold -profile test_helixfold3,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/

stubRun = true

// Limit resources so that this can run on GitHub Actions
process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data to test helixfold3
mode = 'helixfold3'
helixfold3_db = "${projectDir}/assets/dummy_db_dir"
input = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
}

process {
withName: 'RUN_HELIXFOLD3' {
container = '/srv/scratch/sbf-pipelines/proteinfold/singularity/helixfold3.sif'
}
}

34 changes: 34 additions & 0 deletions dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3
@@ -0,0 +1,34 @@
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

LABEL Author="[email protected]" \
title="nfcore/proteinfold_helixfold3" \
Version="0.9.0" \
description="Docker image containing all software requirements to run the RUN_HELIXFOLD3 module using the nf-core/proteinfold pipeline"

ENV PYTHONPATH="/app/helixfold3:$PYTHONPATH" \
PATH="/conda/bin:/app/helixfold3:$PATH" \
PYTHON_BIN="/conda/envs/helixfold/bin/python3.9" \
ENV_BIN="/conda/envs/helixfold/bin" \
OBABEL_BIN="/conda/envs/helixfold/bin"

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y wget git && \
wget -q -P /tmp "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" && \
bash /tmp/Miniforge3-$(uname)-$(uname -m).sh -b -p /conda && \
rm -rf /tmp/Miniforge3-$(uname)-$(uname -m).sh /var/lib/apt/lists/* && \
apt-get autoremove -y && apt-get clean -y

RUN git clone --single-branch --branch dev --depth 1 --no-checkout https://github.com/PaddlePaddle/PaddleHelix.git /app/helixfold3 && \
cd /app/helixfold3 && \
git sparse-checkout init --cone && \
git sparse-checkout set apps/protein_folding/helixfold3 && \
git checkout dev && \
mv apps/protein_folding/helixfold3/* . && \
rm -rf apps

COPY hf3_environment.yaml /app/helixfold3/
RUN /conda/bin/mamba env create --file=/app/helixfold3/hf3_environment.yaml && \
/conda/bin/mamba install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold && \
/conda/bin/mamba install -y -c conda-forge openbabel -n helixfold && \
/conda/bin/mamba clean --all --force-pkgs-dirs -y && \
rm -rf /root/.cache && \
apt-get autoremove -y && apt-get remove --purge -y wget git && apt-get clean -y
48 changes: 48 additions & 0 deletions dockerfiles/helixfold3.def
@@ -0,0 +1,48 @@
Bootstrap: docker
From: nvidia/cuda:12.6.0-cudnn-devel-ubuntu24.04

%labels
Author [email protected]
Version 0.2.1

%files
environment.yaml .

%post
apt update && DEBIAN_FRONTEND=noninteractive apt install --no-install-recommends -y wget git

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
bash Miniforge3-Linux-x86_64.sh -b -p /opt/miniforge
rm Miniforge3-Linux-x86_64.sh
export PATH="/opt/miniforge/bin:$PATH"

git clone --single-branch --branch dev --depth 1 --no-checkout https://github.com/PaddlePaddle/PaddleHelix.git app/helixfold3
cd app/helixfold3
git sparse-checkout init --cone
git sparse-checkout set apps/protein_folding/helixfold3
git checkout dev
mv apps/protein_folding/helixfold3/* .
rm -rf apps
mv /environment.yaml .
mamba env create -f environment.yaml

conda install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold
conda install -y -c conda-forge openbabel -n helixfold

mamba run -n helixfold bash -c \
'python3 -m pip install paddlepaddle-gpu==2.6.1 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html && \
python3 -m pip install -r requirements.txt'

apt autoremove -y && apt remove --purge -y wget git && apt clean -y
rm -rf /var/lib/apt/lists/* /root/.cache *.tar.gz
mamba clean --all --force-pkgs-dirs -y

%environment
export PATH="/app/helixfold3:/opt/miniforge/bin:$PATH"
export PYTHONPATH="/app/helixfold3:$PYTHONPATH"
export PYTHON_BIN="/opt/miniforge/envs/helixfold/bin/python3.9"
export ENV_BIN="/opt/miniforge/envs/helixfold/bin"
export OBABEL_BIN="/opt/miniforge/envs/helixfold/bin"

%runscript
mamba run --name helixfold "$@"
35 changes: 35 additions & 0 deletions dockerfiles/hf3_environment.yaml
@@ -0,0 +1,35 @@
name: helixfold
channels:
- conda-forge
- bioconda
- nvidia
- biocore

dependencies:
- python=3.9
- cuda-toolkit=12.0
- cudnn=8.4.0
- nccl=2.14
- libgcc
- libgomp
- pip
- aria2
- hmmer==3.4
- kalign2==2.04
- hhsuite==3.3.0
- openbabel
- pip:
- paddlepaddle-gpu==2.6.1 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
- absl-py==0.13.0
- biopython==1.79
- chex==0.0.7
- dm-haiku==0.0.4
- dm-tree==0.1.6
- docker==5.0.0
- immutabledict==2.0.0
- jax==0.2.14
- ml-collections==0.1.0
- pandas==1.3.4
- scipy==1.9.0
- rdkit-pypi==2022.9.5
- posebusters
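
One point that may be worth checking in review: this environment file pins `hmmer==3.4`, while the Dockerfile and `helixfold3.def` in this diff install `hmmer==3.3.2` into the same `helixfold` environment. A small sketch for spotting such mismatches is given below; the helper names are hypothetical and the pin lists are copied by hand from this diff rather than parsed from the files.

```python
def parse_pins(specs):
    """Map package name -> pinned version for 'name==version' strings; skip unpinned ones."""
    pins = {}
    for spec in specs:
        name, sep, version = spec.partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


def conflicting_pins(first, second):
    """Return {package: (version_in_first, version_in_second)} where the pins differ."""
    a, b = parse_pins(first), parse_pins(second)
    shared = sorted(a.keys() & b.keys())
    return {name: (a[name], b[name]) for name in shared if a[name] != b[name]}


# Conda-level entries from dockerfiles/hf3_environment.yaml.
environment_pins = ["aria2", "hmmer==3.4", "kalign2==2.04", "hhsuite==3.3.0", "openbabel"]
# Packages from the `mamba install` lines in the Dockerfile.
dockerfile_pins = ["aria2", "hmmer==3.3.2", "kalign2==2.04", "hhsuite==3.3.0", "openbabel"]

print(conflicting_pins(environment_pins, dockerfile_pins))
```

Whether 3.4 or 3.3.2 is the intended version is not stated in the PR, so the sketch only flags the difference.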
14 changes: 14 additions & 0 deletions docs/output.md
@@ -13,6 +13,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and predicts pr
- [AlphaFold2](https://github.com/deepmind/alphafold)
- [ColabFold](https://github.com/sokrypton/ColabFold) - MMseqs2 (API server or local search) followed by ColabFold
- [ESMFold](https://github.com/facebookresearch/esm)
- [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/)
- [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3)

See main [README.md](https://github.com/nf-core/proteinfold/blob/master/README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.

@@ -176,6 +178,18 @@ Below you can find an indicative example of the TSV file with the pLDDT scores p
| 49 | CB | VAL | 7 | 52.74 |
| 50 | O | VAL | 7 | 56.46 |

### HelixFold3

<details markdown="1">
<summary>Output files</summary>

- `run/`
- `<SEQUENCE NAME>_helixfold3.pdb` that is the structure with the highest pLDDT score (ranked first)
- `<SEQUENCE NAME>_plddt_mqc.tsv` that presents the pLDDT scores per residue for the predicted model
- `<SEQUENCE NAME>/` that contains the computed MSAs, prediction metadata, ranked structures, raw model outputs etc.

</details>

### MultiQC report

<details markdown="1">
12 changes: 12 additions & 0 deletions docs/usage.md
@@ -426,6 +426,18 @@ If you specify the `--esmfold_db <PATH>` parameter, the directory structure of y
└── esmfold_3B_v1.pt
```

HelixFold3 can be run using the command below. Note that HelixFold3 requires `.json` input files rather than `.fasta` files:

```console
nextflow run nf-core/proteinfold \
--input samplesheet.csv \
--outdir <OUTDIR> \
--mode helixfold3 \
--helixfold3_db <null (default) | DB_PATH> \
--use_gpu <true/false> \
-profile <docker>
```
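
Because HelixFold3 mode expects `.json` inputs, the samplesheet passed via `--input` would list JSON files in the `fasta` column (the column name is unchanged in `assets/schema_input.json`). A minimal sketch for generating such a samplesheet follows; the function name, sample IDs and file names are hypothetical, and the `id,fasta` header follows the schema change noted in the CHANGELOG.

```python
import csv
import io


def build_samplesheet(entries):
    """Return CSV text with an `id,fasta` header and one row per (id, path) pair.

    Hypothetical helper: it rejects paths that do not end in .json.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "fasta"])
    for sample_id, path in entries:
        if not path.endswith(".json"):
            raise ValueError(f"HelixFold3 mode expects a .json input, got: {path}")
        writer.writerow([sample_id, path])
    return buffer.getvalue()


# Made-up sample IDs and file names, for illustration only.
print(build_samplesheet([("sample_1", "sample_1.json"), ("sample_2", "sample_2.json")]))
```

The resulting text can be written to `samplesheet.csv` and passed to the command above.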

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.

Note that the pipeline will create the following files in your working directory: