diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index b9cee267..9fe27540 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -29,7 +29,7 @@ If you're not used to this workflow with git, you can start with some [docs from You have the option to test your changes locally by running the pipeline. For receiving warnings about process selectors and other `debug` information, it is recommended to use the debug profile. Execute all the tests with the following command: ```bash -nextflow run . --profile debug,test,docker --outdir +nextflow run . -profile debug,test,docker --outdir ``` When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests. @@ -78,8 +78,8 @@ If you wish to contribute a new step, please use the following coding standards: 5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. -8. If applicable, add a new test command in `.github/workflow/ci.yml`. -9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module. +8. If applicable, add a new test command in `.github/workflows/ci.yml`. +9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://multiqc.info/) module. 10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`. 
### Default values diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 49c214e0..4a6b8299 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -44,6 +44,7 @@ jobs: - "test_colabfold_download" - "test_esmfold" - "test_split_fasta" + - "test_helixfold3" isMaster: - ${{ github.base_ref == 'master' }} # Exclude conda and singularity on dev diff --git a/CHANGELOG.md b/CHANGELOG.md index dc14ad8d..62d2783c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - [[PR ##205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`. - [[PR #210](https://github.com/nf-core/proteinfold/pull/210)] - Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure. - [[#214](https://github.com/nf-core/proteinfold/issues/214)] - Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix. +- [[PR #220](https://github.com/nf-core/proteinfold/pull/220)] - Add RoseTTAFold-All-Atom module. +- [[PR #223](https://github.com/nf-core/proteinfold/pull/223)] - Add HelixFold3 module. - [[#235](https://github.com/nf-core/proteinfold/issues/235)] - Update samplesheet to new version (switch from `sequence` column to `id`). ## [[1.1.1](https://github.com/nf-core/proteinfold/releases/tag/1.1.1)] - 2025-07-30 @@ -106,6 +108,8 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements | | `--esm2_t36_3B_UR50D_contact_regression` | | | `--esmfold_params_path` | | | `--skip_multiqc` | +| | `--rosettafold_all_atom_db` | +| | `--helixfold3_db` | > **NB:** Parameter has been **updated** if both old and new parameter information is present. > **NB:** Parameter has been **added** if just the new parameter information is present. 
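The changelog entries above reference the samplesheet switch from a `sequence` column to `id`, and the updated `assets/schema_input.json` now also accepts `.yaml`/`.yml`/`.json` inputs alongside FASTA. As a sketch only (the IDs and file names below are hypothetical), a conforming samplesheet and a quick shell check mirroring the schema's extension rule could look like:

```bash
# Hypothetical samplesheet for the new id,fasta schema; IDs and paths are examples only
cat > samplesheet.csv <<'EOF'
id,fasta
T1024,T1024.fasta
complex1,complex1.json
EOF

# Mirror the schema checks: no spaces in id, and the file must end in
# .fa/.fasta/.yaml/.yml/.json (HelixFold3 entries typically use .json)
awk -F',' 'NR > 1 && ($1 ~ / / || $2 !~ /\.(fa|fasta|yaml|yml|json)$/) { bad = 1 }
           END { exit bad }' samplesheet.csv && echo "samplesheet OK"
```

The `awk` exit status makes the check usable in CI scripts: it prints `samplesheet OK` only when every row passes.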
diff --git a/README.md b/README.md index b032adf5..8853938f 100644 --- a/README.md +++ b/README.md @@ -39,6 +39,10 @@ On release, automated continuous integration tests run the pipeline on a full-si v. [ESMFold](https://github.com/facebookresearch/esm) - Regular ESM + vi. [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/) - Regular RFAA + + vii. [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3) - Regular HF3 + ## Usage > [!NOTE] @@ -53,7 +57,7 @@ nextflow run nf-core/proteinfold \ --outdir ``` -The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`] or [`--esmfold_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases. +The pipeline takes care of downloading the databases and parameters required by AlphaFold2, Colabfold or ESMFold. In case you have already downloaded the required files, you can skip this step by providing the path to the databases using the corresponding parameter [`--alphafold2_db`], [`--colabfold_db`], [`--esmfold_db`], [`--rosettafold_all_atom_db`] or [`--helixfold3_db`]. Please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) to check the directory structure you need to provide for each of the databases. 
- The typical command to run AlphaFold2 mode is shown below: @@ -136,6 +140,30 @@ The pipeline takes care of downloading the databases and parameters required by -profile ``` +- The rosettafold_all_atom mode can be run using the command below: + + ```console + nextflow run nf-core/proteinfold \ + --input samplesheet.csv \ + --outdir \ + --mode rosettafold_all_atom \ + --rosettafold_all_atom_db \ + --use_gpu \ + -profile + ``` + +- The helixfold3 mode can be run using the command below: + + ```console + nextflow run nf-core/proteinfold \ + --input samplesheet.csv \ + --outdir \ + --mode helixfold3 \ + --helixfold3_db \ + --use_gpu \ + -profile + ``` + > [!WARNING] > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). diff --git a/assets/schema_input.json b/assets/schema_input.json index 133802ac..e4039a4d 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -7,6 +7,12 @@ "items": { "type": "object", "properties": { + "sequence": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Sequence name must be provided and cannot contain spaces", + "meta": ["sequence"] + }, "id": { "type": "string", "pattern": "^\\S+$", @@ -17,10 +23,11 @@ "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.fa(sta)?$", - "errorMessage": "Fasta file must be provided, cannot contain spaces and must have extension '.fa' or '.fasta'" + "pattern": "^\\S+\\.(fa(sta)?|yaml|yml|json)$", + "errorMessage": "Fasta, yaml or json file must be provided, cannot contain spaces and must have extension '.fa', '.fasta', '.yaml', '.yml', or '.json'" } }, - "required": ["id", "fasta"] + "required": ["fasta"], + "anyOf": [{ "required": ["sequence"] }, { "required": ["id"] }] } } diff --git 
a/bin/generate_report.py b/bin/generate_report.py index 93fad4a6..9bfe3173 100755 --- a/bin/generate_report.py +++ b/bin/generate_report.py @@ -307,6 +307,7 @@ def pdb_to_lddt(pdb_files, generate_tsv): "esmfold": "ESMFold", "alphafold2": "AlphaFold2", "colabfold": "ColabFold", + "helixfold3": "HelixFold3", } parser = argparse.ArgumentParser() diff --git a/conf/dbs.config b/conf/dbs.config index d4e521a2..eded8c0c 100644 --- a/conf/dbs.config +++ b/conf/dbs.config @@ -48,6 +48,35 @@ params { "alphafold2_ptm" : "alphafold_params_2021-07-14" ] + // Helixfold3 links + helixfold3_uniclust30_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz' + helixfold3_ccd_preprocessed_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/CCD/ccd_preprocessed_etkdg.pkl.gz' + helixfold3_rfam_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/MSA/Rfam-14.9_rep_seq.fasta' + helixfold3_init_models_link = 'https://paddlehelix.bd.bcebos.com/HelixFold3/params/HelixFold3-params-240814.zip' + helixfold3_bfd_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz' + helixfold3_small_bfd_link = 'https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz' + helixfold3_uniprot_sprot_link = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz' + helixfold3_uniprot_trembl_link = 'ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz' + helixfold3_pdb_seqres_link = "${params.pdb_seqres_link}" + helixfold3_uniref90_link = 'ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz' + helixfold3_mgnify_link = 'https://storage.googleapis.com/alphafold-databases/casp14_versions/mgy_clusters_2018_12.fa.gz' + helixfold3_pdb_mmcif_link = 'rsync.rcsb.org::ftp_data/structures/divided/mmCIF/' + 
helixfold3_pdb_obsolete_link = 'ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat' + + // Helixfold3 paths + helixfold3_uniclust30_path = "${params.helixfold3_db}/uniclust30/*" + helixfold3_ccd_preprocessed_path = "${params.helixfold3_db}/ccd_preprocessed_etkdg.pkl.gz" + helixfold3_rfam_path = "${params.helixfold3_db}/Rfam-14.9_rep_seq.fasta" + helixfold3_init_models_path = "${params.helixfold3_db}/HelixFold3-240814.pdparams" + helixfold3_bfd_path = "${params.helixfold3_db}/bfd/*" + helixfold3_small_bfd_path = "${params.helixfold3_db}/small_bfd/*" + helixfold3_uniprot_path = "${params.helixfold3_db}/uniprot/*" + helixfold3_pdb_seqres_path = "${params.helixfold3_db}/pdb_seqres/*" + helixfold3_uniref90_path = "${params.helixfold3_db}/uniref90/*" + helixfold3_mgnify_path = "${params.helixfold3_db}/mgnify/*" + helixfold3_pdb_mmcif_path = "${params.helixfold3_db}/pdb_mmcif/*" + helixfold3_maxit_src_path = "${params.helixfold3_db}/maxit-v11.200-prod-src" + // Esmfold links esmfold_3B_v1 = 'https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt' esm2_t36_3B_UR50D = 'https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t36_3B_UR50D.pt' diff --git a/conf/modules_helixfold3.config b/conf/modules_helixfold3.config new file mode 100644 index 00000000..1f1c3f81 --- /dev/null +++ b/conf/modules_helixfold3.config @@ -0,0 +1,22 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. 
+---------------------------------------------------------------------------------------- +*/ + +process { + withName: 'NFCORE_PROTEINFOLD:HELIXFOLD3:MULTIQC' { + publishDir = [ + path: { "${params.outdir}/multiqc" }, + mode: 'copy', + saveAs: { filename -> filename.equals('versions.yml') ? null : "helixfold3_$filename" } + ] + } + +} diff --git a/conf/test_helixfold3.config b/conf/test_helixfold3.config new file mode 100644 index 00000000..d08468b8 --- /dev/null +++ b/conf/test_helixfold3.config @@ -0,0 +1,37 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. + Use as follows: + nextflow run nf-core/proteinfold -profile test_helixfold3, --outdir +---------------------------------------------------------------------------------------- +*/ + +stubRun = true + +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Input data to test helixfold3 + mode = 'helixfold3' + helixfold3_db = "${projectDir}/assets/dummy_db_dir" + input = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv' +} + +process { + withName: 'RUN_HELIXFOLD3' { + container = '/srv/scratch/sbf-pipelines/proteinfold/singularity/helixfold3.sif' + } +} + diff --git a/dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3 b/dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3 new file mode 100644 index 00000000..c6cd9608 --- /dev/null +++ b/dockerfiles/Dockerfile_nfcore-proteinfold_helixfold3 @@ -0,0 +1,34 @@ +FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 + +LABEL 
Author="j.caley@unsw.edu.au" \ + title="nfcore/proteinfold_helixfold3" \ + Version="0.9.0" \ + description="Docker image containing all software requirements to run the RUN_HELIXFOLD3 module using the nf-core/proteinfold pipeline" + +ENV PYTHONPATH="/app/helixfold3:$PYTHONPATH" \ + PATH="/conda/bin:/app/helixfold3:$PATH" \ + PYTHON_BIN="/conda/envs/helixfold/bin/python3.9" \ + ENV_BIN="/conda/envs/helixfold/bin" \ + OBABEL_BIN="/conda/envs/helixfold/bin" + +RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y wget git && \ + wget -q -P /tmp "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" && \ + bash /tmp/Miniforge3-$(uname)-$(uname -m).sh -b -p /conda && \ + rm -rf /tmp/Miniforge3-$(uname)-$(uname -m).sh /var/lib/apt/lists/* && \ + apt-get autoremove -y && apt-get clean -y + +RUN git clone --single-branch --branch dev --depth 1 --no-checkout https://github.com/PaddlePaddle/PaddleHelix.git /app/helixfold3 && \ + cd /app/helixfold3 && \ + git sparse-checkout init --cone && \ + git sparse-checkout set apps/protein_folding/helixfold3 && \ + git checkout dev && \ + mv apps/protein_folding/helixfold3/* . 
&& \ + rm -rf apps + +COPY hf3_environment.yaml /app/helixfold3/ +RUN /conda/bin/mamba env create --file=/app/helixfold3/hf3_environment.yaml && \ + /conda/bin/mamba install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold && \ + /conda/bin/mamba install -y -c conda-forge openbabel -n helixfold && \ + /conda/bin/mamba clean --all --force-pkgs-dirs -y && \ + rm -rf /root/.cache && \ + apt-get autoremove -y && apt-get remove --purge -y wget git && apt-get clean -y diff --git a/dockerfiles/helixfold3.def b/dockerfiles/helixfold3.def new file mode 100644 index 00000000..5e0eb7db --- /dev/null +++ b/dockerfiles/helixfold3.def @@ -0,0 +1,48 @@ +Bootstrap: docker +From: nvidia/cuda:12.6.0-cudnn-devel-ubuntu24.04 + +%labels + Author j.caley@unsw.edu.au + Version 0.2.1 + +%files + environment.yaml . + +%post + apt update && DEBIAN_FRONTEND=noninteractive apt install --no-install-recommends -y wget git + + wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" + bash Miniforge3-Linux-x86_64.sh -b -p /opt/miniforge + rm Miniforge3-Linux-x86_64.sh + export PATH="/opt/miniforge/bin:$PATH" + + git clone --single-branch --branch dev --depth 1 --no-checkout https://github.com/PaddlePaddle/PaddleHelix.git app/helixfold3 + cd app/helixfold3 + git sparse-checkout init --cone + git sparse-checkout set apps/protein_folding/helixfold3 + git checkout dev + mv apps/protein_folding/helixfold3/* . + rm -rf apps + mv /environment.yaml . 
+ mamba env create -f environment.yaml + + conda install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold + conda install -y -c conda-forge openbabel -n helixfold + + mamba run -n helixfold bash -c \ + 'python3 -m pip install paddlepaddle-gpu==2.6.1 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html && \ + python3 -m pip install -r requirements.txt' + + apt autoremove -y && apt remove --purge -y wget git && apt clean -y + rm -rf /var/lib/apt/lists/* /root/.cache *.tar.gz + mamba clean --all --force-pkgs-dirs -y + +%environment + export PATH="/app/helixfold3:/opt/miniforge/bin:$PATH" + export PYTHONPATH="/app/helixfold3:$PYTHONPATH" + export PYTHON_BIN="/opt/miniforge/envs/helixfold/bin/python3.9" + export ENV_BIN="/opt/miniforge/envs/helixfold/bin" + export OBABEL_BIN="/opt/miniforge/envs/helixfold/bin" + +%runscript + mamba run --name helixfold "$@" diff --git a/dockerfiles/hf3_environment.yaml b/dockerfiles/hf3_environment.yaml new file mode 100644 index 00000000..e277fcef --- /dev/null +++ b/dockerfiles/hf3_environment.yaml @@ -0,0 +1,36 @@ +name: helixfold +channels: + - conda-forge + - bioconda + - nvidia + - biocore + +dependencies: + - python=3.9 + - cuda-toolkit=12.0 + - cudnn=8.4.0 + - nccl=2.14 + - libgcc + - libgomp + - pip + - aria2 + - hmmer==3.4 + - kalign2==2.04 + - hhsuite==3.3.0 + - openbabel + - pip: + - --find-links https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html + - paddlepaddle-gpu==2.6.1 + - absl-py==0.13.0 + - biopython==1.79 + - chex==0.0.7 + - dm-haiku==0.0.4 + - dm-tree==0.1.6 + - docker==5.0.0 + - immutabledict==2.0.0 + - jax==0.2.14 + - ml-collections==0.1.0 + - pandas==1.3.4 + - scipy==1.9.0 + - rdkit-pypi==2022.9.5 + - posebusters diff --git a/docs/output.md b/docs/output.md index 05e11e79..1fd972ae 100644 --- a/docs/output.md +++ b/docs/output.md @@ -13,6 +13,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and predicts pr - [AlphaFold2](https://github.com/deepmind/alphafold) - 
[ColabFold](https://github.com/sokrypton/ColabFold) - MMseqs2 (API server or local search) followed by ColabFold - [ESMFold](https://github.com/facebookresearch/esm) +- [RoseTTAFold-All-Atom](https://github.com/baker-laboratory/RoseTTAFold-All-Atom/) +- [HelixFold3](https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3) See main [README.md](https://github.com/nf-core/proteinfold/blob/master/README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step. @@ -176,6 +178,18 @@ Below you can find an indicative example of the TSV file with the pLDDT scores p | 49 | CB | VAL | 7 | 52.74 | | 50 | O | VAL | 7 | 56.46 | +### HelixFold3 + +
+Output files + +- `run/` + - `_helixfold3.pdb` that is the structure with the highest pLDDT score (ranked first) + - `_plddt_mqc.tsv` that presents the pLDDT scores per residue for the predicted model + - `/` that contains the computed MSAs, prediction metadata, ranked structures, raw model outputs etc. + +
+ ### MultiQC report
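The `*_plddt_mqc.tsv` described in the output section above is assembled by reading the pLDDT value that HelixFold3 stores in the B-factor column of each PDB `ATOM` record. A minimal standalone sketch of that extraction is shown below; the two-record PDB fragment is fabricated for illustration, and the field positions assume the same whitespace-separated column layout as the module's own `awk '{print $6"\t"$11}'`:

```bash
# Fabricated two-atom PDB fragment; real files come from the prediction run
cat > ranked_1.pdb <<'EOF'
ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 87.41           N
ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00 91.23           C
EOF

# Field 6 is the residue number; field 11 is the B-factor slot holding pLDDT
awk '$1 == "ATOM" { print $6 "\t" $11 }' ranked_1.pdb > ranked_1_plddt.tsv
cat ranked_1_plddt.tsv
```

Note that this field-based split breaks on PDB edge cases (e.g. residue numbers fused to the chain ID in very long chains), where fixed-column parsing would be needed; the sketch only illustrates where the score lives.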
diff --git a/docs/usage.md b/docs/usage.md index 57624147..abfdc2a3 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -426,6 +426,18 @@ If you specify the `--esmfold_db ` parameter, the directory structure of y └── esmfold_3B_v1.pt ``` +HelixFold3 can be run using this command (note that HF3 requires .json files not .fasta): + +```console +nextflow run nf-core/proteinfold \ + --input samplesheet.csv \ + --outdir \ + --mode helixfold3 \ + --helixfold3_db \ + --use_gpu \ + -profile +``` + This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. Note that the pipeline will create the following files in your working directory: diff --git a/main.nf b/main.nf index d1ec1a6b..7608de18 100644 --- a/main.nf +++ b/main.nf @@ -27,6 +27,10 @@ if (params.mode.toLowerCase().split(",").contains("esmfold")) { include { PREPARE_ESMFOLD_DBS } from './subworkflows/local/prepare_esmfold_dbs' include { ESMFOLD } from './workflows/esmfold' } +if (params.mode.toLowerCase().split(",").contains("helixfold3")) { + include { PREPARE_HELIXFOLD3_DBS } from './subworkflows/local/prepare_helixfold3_dbs' + include { HELIXFOLD3 } from './workflows/helixfold3' +} include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_proteinfold_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_proteinfold_pipeline' @@ -65,6 +69,7 @@ workflow NFCORE_PROTEINFOLD { ch_alphafold_top_ranked_pdb = Channel.empty() ch_colabfold_top_ranked_pdb = Channel.empty() ch_esmfold_top_ranked_pdb = Channel.empty() + ch_helixfold3_top_ranked_pdb = Channel.empty() ch_multiqc = Channel.empty() ch_versions = Channel.empty() ch_report_input = Channel.empty() @@ -205,6 +210,68 @@ workflow NFCORE_PROTEINFOLD { ch_report_input = ch_report_input.mix(ESMFOLD.out.pdb_msa) } + // + // WORKFLOW: Run helixfold3 + // + if(requested_modes.contains("helixfold3")) { + // + // SUBWORKFLOW: Prepare helixfold3 DBs + // + 
PREPARE_HELIXFOLD3_DBS ( + params.helixfold3_db, + params.helixfold3_uniclust30_link, + params.helixfold3_ccd_preprocessed_link, + params.helixfold3_rfam_link, + params.helixfold3_init_models_link, + params.helixfold3_bfd_link, + params.helixfold3_small_bfd_link, + params.helixfold3_uniprot_sprot_link, + params.helixfold3_uniprot_trembl_link, + params.helixfold3_pdb_seqres_link, + params.helixfold3_uniref90_link, + params.helixfold3_mgnify_link, + params.helixfold3_pdb_mmcif_link, + params.helixfold3_pdb_obsolete_link, + params.helixfold3_uniclust30_path, + params.helixfold3_ccd_preprocessed_path, + params.helixfold3_rfam_path, + params.helixfold3_init_models_path, + params.helixfold3_bfd_path, + params.helixfold3_small_bfd_path, + params.helixfold3_uniprot_path, + params.helixfold3_pdb_seqres_path, + params.helixfold3_uniref90_path, + params.helixfold3_mgnify_path, + params.helixfold3_pdb_mmcif_path, + params.helixfold3_maxit_src_path + ) + ch_versions = ch_versions.mix(PREPARE_HELIXFOLD3_DBS.out.versions) + + // + // WORKFLOW: Run nf-core/helixfold3 workflow + // + HELIXFOLD3 ( + ch_samplesheet, + ch_versions, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_uniclust30, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_ccd_preprocessed, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_rfam, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_bfd, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_small_bfd, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_uniprot, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_pdb_seqres, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_uniref90, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_mgnify, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_pdb_mmcif, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_init_models, + PREPARE_HELIXFOLD3_DBS.out.helixfold3_maxit_src + ) + ch_helixfold3_top_ranked_pdb = HELIXFOLD3.out.top_ranked_pdb + ch_multiqc = ch_multiqc.mix(HELIXFOLD3.out.multiqc_report.collect()) + ch_versions = ch_versions.mix(HELIXFOLD3.out.versions) + ch_report_input = ch_report_input.mix(HELIXFOLD3.out.pdb_msa) + } + 
// // POST PROCESSING: generate visualisation reports // @@ -247,7 +314,8 @@ workflow NFCORE_PROTEINFOLD { ch_multiqc_methods_description, ch_alphafold_top_ranked_pdb, ch_colabfold_top_ranked_pdb, - ch_esmfold_top_ranked_pdb + ch_esmfold_top_ranked_pdb, + ch_helixfold3_top_ranked_pdb ) emit: diff --git a/modules/local/run_helixfold3.nf b/modules/local/run_helixfold3.nf new file mode 100644 index 00000000..2571cb5c --- /dev/null +++ b/modules/local/run_helixfold3.nf @@ -0,0 +1,117 @@ +/* + * Run HelixFold3 + */ +process RUN_HELIXFOLD3 { + tag "$meta.id" + label 'gpu_compute' + label 'process_medium' + + // Exit if running this module with -profile conda / -profile mamba + if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { + error("Local RUN_HELIXFOLD3 module does not support Conda. Please use Docker / Singularity / Podman / Apptainer instead.") + } + + container "nf-core/proteinfold_helixfold3:dev" + + input: + tuple val(meta), path(fasta) + path ('uniclust30/*') + path ('*') + path ('*') + path ('bfd/*') + path ('small_bfd/*') + path ('uniprot/*') + path ('pdb_seqres/*') + path ('uniref90/*') + path ('mgnify/*') + path ('pdb_mmcif/*') + path ('init_models/*') + path ('maxit_src') + + output: + path ("${fasta.baseName}*") + tuple val(meta), path ("${meta.id}_helixfold3.pdb") , emit: top_ranked_pdb + tuple val(meta), path ("${fasta.baseName}/ranked*pdb"), emit: pdb + tuple val(meta), path ("*_mqc.tsv") , emit: multiqc + tuple val(meta), path ("${meta.id}_helixfold3.cif") , emit: main_cif + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + """ + export MAXIT_SRC="./maxit_src" + export RCSBROOT="\$MAXIT_SRC" + export PATH="\$MAXIT_SRC/bin:\$ENV_BIN:\$PATH" + export OBABEL_BIN="\$ENV_BIN" + + ln -s /app/helixfold3/* . 
+ + \$ENV_BIN/python3.9 inference.py \ + --maxit_binary "\$MAXIT_SRC/bin/maxit" \ + --jackhmmer_binary_path "\$ENV_BIN/jackhmmer" \ + --hhblits_binary_path "\$ENV_BIN/hhblits" \ + --hhsearch_binary_path "\$ENV_BIN/hhsearch" \ + --kalign_binary_path "\$ENV_BIN/kalign" \ + --hmmsearch_binary_path "\$ENV_BIN/hmmsearch" \ + --hmmbuild_binary_path "\$ENV_BIN/hmmbuild" \ + --preset='reduced_dbs' \ + --bfd_database_path="./bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \ + --small_bfd_database_path="./small_bfd/bfd-first_non_consensus_sequences.fasta" \ + --uniclust30_database_path="./uniclust30/uniclust30_2018_08" \ + --uniprot_database_path="./uniprot/uniprot.fasta" \ + --pdb_seqres_database_path="./pdb_seqres/pdb_seqres.txt" \ + --rfam_database_path="./Rfam-14.9_rep_seq.fasta" \ + --template_mmcif_dir="./pdb_mmcif/mmcif_files" \ + --obsolete_pdbs_path="./pdb_mmcif/obsolete.dat" \ + --ccd_preprocessed_path="./ccd_preprocessed_etkdg.pkl.gz" \ + --uniref90_database_path "./uniref90/uniref90.fasta" \ + --mgnify_database_path "./mgnify/mgy_clusters_2018_12.fa" \ + --max_template_date=2024-08-14 \ + --input_json="${fasta}" \ + --output_dir="\$PWD" \ + --model_name allatom_demo \ + --init_model "./init_models/HelixFold3-240814.pdparams" \ + --infer_times 4 \ + --logging_level "ERROR" \ + --precision "bf16" + + cp "${fasta.baseName}"/"${fasta.baseName}"-rank1/predicted_structure.pdb ./"${meta.id}"_helixfold3.pdb + cp "${fasta.baseName}"/"${fasta.baseName}"-rank1/predicted_structure.cif ./"${meta.id}"_helixfold3.cif + cd "${fasta.baseName}" + awk '{print \$6"\\t"\$11}' "${fasta.baseName}"-rank1/predicted_structure.pdb > ranked_1_plddt.tsv + for i in 2 3 4 + do awk '{print \$6"\\t"\$11}' "${fasta.baseName}"-rank\$i/predicted_structure.pdb | awk '{print \$2}' > ranked_"\$i"_plddt.tsv + done + paste ranked_1_plddt.tsv ranked_2_plddt.tsv ranked_3_plddt.tsv ranked_4_plddt.tsv > plddt.tsv + echo -e Positions"\\t"rank_1"\\t"rank_2"\\t"rank_3"\\t"rank_4 > header.tsv + 
cat header.tsv plddt.tsv > ../"${meta.id}"_plddt_mqc.tsv + for i in 1 2 3 4 + do cp "${fasta.baseName}-rank\$i/predicted_structure.pdb" ./ranked_\$i.pdb + done + cd .. + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + END_VERSIONS + """ + + stub: + """ + touch ./"${meta.id}"_helixfold3.cif + touch ./"${meta.id}"_helixfold3.pdb + touch ./"${meta.id}"_plddt_mqc.tsv + mkdir "${fasta.baseName}" + touch "${fasta.baseName}/ranked_1.pdb" + touch "${fasta.baseName}/ranked_2.pdb" + touch "${fasta.baseName}/ranked_3.pdb" + touch "${fasta.baseName}/ranked_4.pdb" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + END_VERSIONS + """ +} diff --git a/nextflow.config b/nextflow.config index d3322f34..1092c37e 100644 --- a/nextflow.config +++ b/nextflow.config @@ -11,7 +11,7 @@ params { // Input options input = null - mode = 'alphafold2' // {alphafold2, colabfold, esmfold} + mode = 'alphafold2' // {alphafold2, colabfold, esmfold, helixfold3} use_gpu = false split_fasta = false @@ -80,6 +80,38 @@ params { // Esmfold paths esmfold_params_path = null + // Helixfold3 parameters + helixfold3_db = null + + // Helixfold3 links + helixfold3_uniclust30_link = null + helixfold3_ccd_preprocessed_link = null + helixfold3_rfam_link = null + helixfold3_init_models_link = null + helixfold3_bfd_link = null + helixfold3_small_bfd_link = null + helixfold3_uniprot_sprot_link = null + helixfold3_uniprot_trembl_link = null + helixfold3_pdb_seqres_link = null + helixfold3_uniref90_link = null + helixfold3_mgnify_link = null + helixfold3_pdb_mmcif_link = null + helixfold3_pdb_obsolete_link = null + + // Helixfold3 paths + helixfold3_uniclust30_path = null + helixfold3_ccd_preprocessed_path = null + helixfold3_rfam_path = null + helixfold3_init_models_path = null + helixfold3_bfd_path = null + helixfold3_small_bfd_path = null + helixfold3_uniprot_path = null + 
helixfold3_pdb_seqres_path = null + helixfold3_uniref90_path = null + helixfold3_mgnify_path = null + helixfold3_pdb_mmcif_path = null + helixfold3_maxit_src_path = null + // Foldseek params foldseek_search = null foldseek_easysearch_arg = null @@ -214,6 +246,7 @@ profiles { apptainer { apptainer.enabled = true apptainer.autoMounts = true + if (params.use_gpu) { apptainer.runOptions = '--nv' } conda.enabled = false docker.enabled = false singularity.enabled = false @@ -257,6 +290,7 @@ profiles { test_full_colabfold_multimer { includeConfig 'conf/test_full_colabfold_webserver_multimer.config' } test_full_esmfold { includeConfig 'conf/test_full_esmfold.config' } test_full_esmfold_multimer { includeConfig 'conf/test_full_esmfold_multimer.config' } + test_helixfold3 { includeConfig 'conf/test_helixfold3.config' } } // Load nf-core custom profiles from different Institutions @@ -404,6 +438,9 @@ if (params.mode.toLowerCase().split(",").contains("colabfold")) { if (params.mode.toLowerCase().split(",").contains("esmfold")) { includeConfig 'conf/modules_esmfold.config' } +if (params.mode.toLowerCase().split(",").contains("helixfold3")) { + includeConfig 'conf/modules_helixfold3.config' +} // Load links to DBs and parameters includeConfig 'conf/dbs.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 073e5d73..114ef729 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -80,7 +80,6 @@ }, "full_dbs": { "type": "boolean", - "default": false, "description": "If true uses the full version of the BFD database otherwise, otherwise it uses its reduced version, small bfd", "fa_icon": "fas fa-battery-full" }, @@ -194,7 +193,8 @@ "type": "string", "description": "Specifies whether is a 'monomer' or 'multimer' prediction", "enum": ["monomer", "multimer"], - "fa_icon": "fas fa-stream" + "fa_icon": "fas fa-stream", + "default": "monomer" } } }, @@ -385,52 +385,62 @@ "bfd_path": { "type": "string", "description": "Path to BFD dababase", - "fa_icon": "fas 
fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/bfd/*" }, "small_bfd_path": { "type": "string", "description": "Path to a reduced version of the BFD database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/small_bfd/*" }, "alphafold2_params_path": { "type": "string", "description": "Path to the Alphafold2 parameters", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/alphafold_params_*/*" }, "mgnify_path": { "type": "string", "description": "Path to the MGnify database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/mgnify/*" }, "pdb70_path": { "type": "string", "description": "Path to the PDB70 database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/pdb70/**" }, "pdb_mmcif_path": { "type": "string", "description": "Path to the PDB mmCIF database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/pdb_mmcif/*" }, "uniref30_alphafold2_path": { "type": "string", "description": "Path to the Uniref30 database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/uniref30/*" }, "uniref90_path": { "type": "string", "description": "Path to the UniRef90 database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/uniref90/*" }, "pdb_seqres_path": { "type": "string", "description": "Path to the PDB SEQRES database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/pdb_seqres/*" }, "uniprot_path": { "type": "string", "description": "Path to UniProt database containing the SwissProt and the TrEMBL databases", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/uniprot/*" } } }, @@ -468,12 +478,14 @@ "colabfold_db_path": { "type": "string", "description": "Link to the Colabfold database", - "fa_icon": "fas 
fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/colabfold_envdb_202108" }, "uniref30_colabfold_path": { "type": "string", "description": "Link to the UniRef30 database", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/uniref30_2302" }, "colabfold_alphafold2_params_path": { "type": "string", @@ -522,7 +534,8 @@ "esmfold_params_path": { "type": "string", "description": "Link to the Esmfold parameters", - "fa_icon": "fas fa-folder-open" + "fa_icon": "fas fa-folder-open", + "default": "null/*" } } }, @@ -675,5 +688,110 @@ { "$ref": "#/$defs/generic_options" } - ] + ], + "properties": { + "helixfold3_init_models_link": { + "type": "string", + "default": "https://paddlehelix.bd.bcebos.com/HelixFold3/params/HelixFold3-params-240814.zip" + }, + "helixfold3_init_models_path": { + "type": "string", + "default": "null/HelixFold3-240814.pdparams" + }, + "helixfold3_db": { + "type": "string" + }, + "helixfold3_uniclust30_link": { + "type": "string", + "default": "https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz" + }, + "helixfold3_ccd_preprocessed_link": { + "type": "string", + "default": "https://paddlehelix.bd.bcebos.com/HelixFold3/CCD/ccd_preprocessed_etkdg.pkl.gz" + }, + "helixfold3_rfam_link": { + "type": "string", + "default": "https://paddlehelix.bd.bcebos.com/HelixFold3/MSA/Rfam-14.9_rep_seq.fasta" + }, + "helixfold3_bfd_link": { + "type": "string", + "default": "https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz" + }, + "helixfold3_small_bfd_link": { + "type": "string", + "default": "https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz" + }, + "helixfold3_pdb_seqres_link": { + "type": "string", + "default": "https://files.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt" + }, + "helixfold3_uniref90_link": { + "type": "string", + 
"default": "ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz" + }, + "helixfold3_mgnify_link": { + "type": "string", + "default": "https://storage.googleapis.com/alphafold-databases/casp14_versions/mgy_clusters_2018_12.fa.gz" + }, + "helixfold3_pdb_mmcif_link": { + "type": "string", + "default": "rsync.rcsb.org::ftp_data/structures/divided/mmCIF/" + }, + "helixfold3_uniclust30_path": { + "type": "string", + "default": "null/uniclust30/*" + }, + "helixfold3_ccd_preprocessed_path": { + "type": "string", + "default": "null/ccd_preprocessed_etkdg.pkl.gz" + }, + "helixfold3_rfam_path": { + "type": "string", + "default": "null/Rfam-14.9_rep_seq.fasta" + }, + "helixfold3_bfd_path": { + "type": "string", + "default": "null/bfd/*" + }, + "helixfold3_small_bfd_path": { + "type": "string", + "default": "null/small_bfd/*" + }, + "helixfold3_uniprot_path": { + "type": "string", + "default": "null/uniprot/*" + }, + "helixfold3_pdb_seqres_path": { + "type": "string", + "default": "null/pdb_seqres/*" + }, + "helixfold3_uniref90_path": { + "type": "string", + "default": "null/uniref90/*" + }, + "helixfold3_mgnify_path": { + "type": "string", + "default": "null/mgnify/*" + }, + "helixfold3_pdb_mmcif_path": { + "type": "string", + "default": "null/pdb_mmcif/*" + }, + "helixfold3_uniprot_sprot_link": { + "type": "string", + "default": "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz" + }, + "helixfold3_uniprot_trembl_link": { + "type": "string", + "default": "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz" + }, + "helixfold3_pdb_obsolete_link": { + "type": "string", + "default": "ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat" + }, + "helixfold3_maxit_src_path": { + "type": "string", + "default": "null/maxit-v11.200-prod-src" + } + } } diff --git a/subworkflows/local/post_processing.nf b/subworkflows/local/post_processing.nf index 
f4e928fb..970f1e7b 100644 --- a/subworkflows/local/post_processing.nf +++ b/subworkflows/local/post_processing.nf @@ -38,6 +38,7 @@ workflow POST_PROCESSING { ch_alphafold2_top_ranked_pdb ch_colabfold_top_ranked_pdb ch_esmfold_top_ranked_pdb + ch_helixfold3_top_ranked_pdb main: ch_comparison_report_files = Channel.empty() @@ -67,6 +68,10 @@ workflow POST_PROCESSING { ch_esmfold_top_ranked_pdb ) + ch_comparison_report_files = ch_comparison_report_files.mix( + ch_helixfold3_top_ranked_pdb + ) + ch_comparison_report_files .groupTuple(by: [0], size: requested_modes_size) .set { ch_comparison_report_input } diff --git a/subworkflows/local/prepare_helixfold3_dbs.nf b/subworkflows/local/prepare_helixfold3_dbs.nf new file mode 100644 index 00000000..5f66c8af --- /dev/null +++ b/subworkflows/local/prepare_helixfold3_dbs.nf @@ -0,0 +1,149 @@ +// +// Download all the required HelixFold3 databases and parameters +// + +include { + ARIA2_UNCOMPRESS as ARIA2_UNICLUST30 + ARIA2_UNCOMPRESS as ARIA2_CCD_PREPROCESSED + ARIA2_UNCOMPRESS as ARIA2_RFAM + ARIA2_UNCOMPRESS as ARIA2_BFD + ARIA2_UNCOMPRESS as ARIA2_SMALL_BFD + ARIA2_UNCOMPRESS as ARIA2_UNIPROT_SPROT + ARIA2_UNCOMPRESS as ARIA2_UNIPROT_TREMBL + ARIA2_UNCOMPRESS as ARIA2_UNIREF90 + ARIA2_UNCOMPRESS as ARIA2_MGNIFY + ARIA2_UNCOMPRESS as ARIA2_INIT_MODELS +} from './aria2_uncompress' + +include { ARIA2 as ARIA2_PDB_SEQRES } from '../../modules/nf-core/aria2/main' + +include { COMBINE_UNIPROT } from '../../modules/local/combine_uniprot' +include { DOWNLOAD_PDBMMCIF } from '../../modules/local/download_pdbmmcif' + +workflow PREPARE_HELIXFOLD3_DBS { + + take: + helixfold3_db + helixfold3_uniclust30_link + helixfold3_ccd_preprocessed_link + helixfold3_rfam_link + helixfold3_init_models_link + helixfold3_bfd_link + helixfold3_small_bfd_link + helixfold3_uniprot_sprot_link + helixfold3_uniprot_trembl_link + helixfold3_pdb_seqres_link + helixfold3_uniref90_link + helixfold3_mgnify_link + helixfold3_pdb_mmcif_link + 
helixfold3_pdb_obsolete_link + helixfold3_uniclust30_path + helixfold3_ccd_preprocessed_path + helixfold3_rfam_path + helixfold3_init_models_path + helixfold3_bfd_path + helixfold3_small_bfd_path + helixfold3_uniprot_path + helixfold3_pdb_seqres_path + helixfold3_uniref90_path + helixfold3_mgnify_path + helixfold3_pdb_mmcif_path + helixfold3_maxit_src_path + + main: + ch_helixfold3_maxit_src = Channel.value(file(helixfold3_maxit_src_path)) + ch_versions = Channel.empty() + + if (helixfold3_db) { + ch_helixfold3_uniclust30 = Channel.value(file(helixfold3_uniclust30_path)) + ch_helixfold3_ccd_preprocessed = Channel.value(file(helixfold3_ccd_preprocessed_path)) + ch_helixfold3_rfam = Channel.value(file(helixfold3_rfam_path)) + ch_helixfold3_bfd = Channel.value(file(helixfold3_bfd_path)) + ch_helixfold3_small_bfd = Channel.value(file(helixfold3_small_bfd_path)) + ch_helixfold3_uniprot = Channel.value(file(helixfold3_uniprot_path)) + ch_helixfold3_pdb_seqres = Channel.value(file(helixfold3_pdb_seqres_path)) + ch_helixfold3_uniref90 = Channel.value(file(helixfold3_uniref90_path)) + ch_helixfold3_mgnify = Channel.value(file(helixfold3_mgnify_path)) + ch_mmcif_files = file(helixfold3_pdb_mmcif_path, type: 'dir') + ch_mmcif_obsolete = file(helixfold3_pdb_mmcif_path, type: 'file') + ch_helixfold3_pdb_mmcif = Channel.value(ch_mmcif_files + ch_mmcif_obsolete) + ch_helixfold3_init_models = Channel.value(file(helixfold3_init_models_path)) + } + else { + ARIA2_UNICLUST30(helixfold3_uniclust30_link) + ch_helixfold3_uniclust30 = ARIA2_UNICLUST30.out.db + ch_versions = ch_versions.mix(ARIA2_UNICLUST30.out.versions) + + ARIA2_CCD_PREPROCESSED(helixfold3_ccd_preprocessed_link) + ch_helixfold3_ccd_preprocessed = ARIA2_CCD_PREPROCESSED.out.db + ch_versions = ch_versions.mix(ARIA2_CCD_PREPROCESSED.out.versions) + + ARIA2_RFAM(helixfold3_rfam_link) + ch_helixfold3_rfam = ARIA2_RFAM.out.db + ch_versions = ch_versions.mix(ARIA2_RFAM.out.versions) + + ARIA2_BFD(helixfold3_bfd_link) + 
ch_helixfold3_bfd = ARIA2_BFD.out.db + ch_versions = ch_versions.mix(ARIA2_BFD.out.versions) + + ARIA2_SMALL_BFD(helixfold3_small_bfd_link) + ch_helixfold3_small_bfd = ARIA2_SMALL_BFD.out.db + ch_versions = ch_versions.mix(ARIA2_SMALL_BFD.out.versions) + + ARIA2_UNIREF90(helixfold3_uniref90_link) + ch_helixfold3_uniref90 = ARIA2_UNIREF90.out.db + ch_versions = ch_versions.mix(ARIA2_UNIREF90.out.versions) + + ARIA2_MGNIFY(helixfold3_mgnify_link) + ch_helixfold3_mgnify = ARIA2_MGNIFY.out.db + ch_versions = ch_versions.mix(ARIA2_MGNIFY.out.versions) + + DOWNLOAD_PDBMMCIF(helixfold3_pdb_mmcif_link, helixfold3_pdb_obsolete_link) + ch_helixfold3_pdb_mmcif = DOWNLOAD_PDBMMCIF.out.ch_db + ch_versions = ch_versions.mix(DOWNLOAD_PDBMMCIF.out.versions) + + ARIA2_INIT_MODELS(helixfold3_init_models_link) + ch_helixfold3_init_models = ARIA2_INIT_MODELS.out.db + ch_versions = ch_versions.mix(ARIA2_INIT_MODELS.out.versions) + + ARIA2_PDB_SEQRES ( + [ + [:], + helixfold3_pdb_seqres_link + ] + ) + ch_helixfold3_pdb_seqres = ARIA2_PDB_SEQRES.out.downloaded_file.map{ it[1] } + ch_versions = ch_versions.mix(ARIA2_PDB_SEQRES.out.versions) + + + ARIA2_UNIPROT_SPROT( + helixfold3_uniprot_sprot_link + ) + ch_versions = ch_versions.mix(ARIA2_UNIPROT_SPROT.out.versions) + ARIA2_UNIPROT_TREMBL( + helixfold3_uniprot_trembl_link + ) + ch_versions = ch_versions.mix(ARIA2_UNIPROT_TREMBL.out.versions) + COMBINE_UNIPROT ( + ARIA2_UNIPROT_SPROT.out.db, + ARIA2_UNIPROT_TREMBL.out.db + ) + ch_helixfold3_uniprot = COMBINE_UNIPROT.out.ch_db + ch_versions = ch_versions.mix(COMBINE_UNIPROT.out.versions) + } + + emit: + helixfold3_uniclust30 = ch_helixfold3_uniclust30 + helixfold3_ccd_preprocessed = ch_helixfold3_ccd_preprocessed + helixfold3_rfam = ch_helixfold3_rfam + helixfold3_bfd = ch_helixfold3_bfd + helixfold3_small_bfd = ch_helixfold3_small_bfd + helixfold3_uniprot = ch_helixfold3_uniprot + helixfold3_pdb_seqres = ch_helixfold3_pdb_seqres + helixfold3_uniref90 = ch_helixfold3_uniref90 + 
helixfold3_mgnify = ch_helixfold3_mgnify + helixfold3_pdb_mmcif = ch_helixfold3_pdb_mmcif + helixfold3_init_models = ch_helixfold3_init_models + helixfold3_maxit_src = ch_helixfold3_maxit_src + versions = ch_versions +} diff --git a/workflows/helixfold3.nf b/workflows/helixfold3.nf new file mode 100644 index 00000000..26910fb9 --- /dev/null +++ b/workflows/helixfold3.nf @@ -0,0 +1,107 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + IMPORT LOCAL MODULES/SUBWORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +// +// MODULE: Loaded from modules/local/ +// +include { RUN_HELIXFOLD3 } from '../modules/local/run_helixfold3' + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + IMPORT NF-CORE MODULES/SUBWORKFLOWS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RUN MAIN WORKFLOW +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +workflow HELIXFOLD3 { + + take: + ch_samplesheet + ch_versions // channel: [ path(versions.yml) ] + ch_helixfold3_uniclust30 + ch_helixfold3_ccd_preprocessed + ch_helixfold3_rfam + ch_helixfold3_bfd + ch_helixfold3_small_bfd + ch_helixfold3_uniprot + ch_helixfold3_pdb_seqres + ch_helixfold3_uniref90 + ch_helixfold3_mgnify + ch_helixfold3_pdb_mmcif + ch_helixfold3_init_models + ch_helixfold3_maxit_src + + main: + ch_multiqc_files = Channel.empty() + ch_pdb = Channel.empty() + ch_top_ranked_pdb = Channel.empty() + ch_msa = Channel.empty() + ch_multiqc_report = Channel.empty() + + // + // MODULE: Run helixfold3 + // + RUN_HELIXFOLD3 ( + ch_samplesheet, + ch_helixfold3_uniclust30, + ch_helixfold3_ccd_preprocessed, + ch_helixfold3_rfam, + ch_helixfold3_bfd, + ch_helixfold3_small_bfd, + ch_helixfold3_uniprot, 
ch_helixfold3_pdb_seqres, + ch_helixfold3_uniref90, + ch_helixfold3_mgnify, + ch_helixfold3_pdb_mmcif, + ch_helixfold3_init_models, + ch_helixfold3_maxit_src + ) + + RUN_HELIXFOLD3 + .out + .multiqc + .map { it[1] } + .toSortedList() + .map { [ [ "model": "helixfold3" ], it.flatten() ] } + .set { ch_multiqc_report } + + ch_pdb = ch_pdb.mix(RUN_HELIXFOLD3.out.pdb) + ch_top_ranked_pdb = ch_top_ranked_pdb.mix(RUN_HELIXFOLD3.out.top_ranked_pdb) + ch_versions = ch_versions.mix(RUN_HELIXFOLD3.out.versions) + + ch_top_ranked_pdb + .map { [ it[0]["id"], it[0], it[1] ] } + .set { ch_top_ranked_pdb } + + ch_pdb + .join(ch_msa) + .map { + it[0]["model"] = "helixfold3" + it + } + .set { ch_pdb_msa } + + ch_pdb_msa + .map { [ it[0]["id"], it[0], it[1], it[2] ] } + .set { ch_pdb_msa } + + emit: + top_ranked_pdb = ch_top_ranked_pdb // channel: [ id, /path/to/*.pdb ] + pdb_msa = ch_pdb_msa // channel: [ meta, /path/to/*.pdb, /path/to/*_coverage.png ] + multiqc_report = ch_multiqc_report // channel: /path/to/multiqc_report.html + versions = ch_versions // channel: [ path(versions.yml) ] +} + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + THE END +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/