Skip to content

Latest commit

 

History

History
86 lines (63 loc) · 3.19 KB

installation.md

File metadata and controls

86 lines (63 loc) · 3.19 KB

Installation

Nextflow pipeline

The ProteinFold pipeline is implemented according to the geniac guidelines. Details are available at:

  • The geniac documentation provides a set of best practises to implement Nextflow pipelines.
  • The geniac source code provides the set of utilities.
  • The geniac demo provides a toy pipeline to test and practise Geniac.
  • The geniac template provides a pipeline template to start a new pipeline.

The pipeline can be installed with the geniac toolbox as described in the next sections.

Install conda

If conda is not available on your computer, proceed as follows:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Create the geniac conda environment

export GENIAC_CONDA="https://raw.githubusercontent.com/bioinfo-pf-curie/geniac/release/environment.yml"
wget ${GENIAC_CONDA}
conda env create -f environment.yml
conda activate geniac
pip install geniac

Install the pipeline with Geniac

export WORK_DIR="${HOME}/tmp/myPipeline"
export INSTALL_DIR="${WORK_DIR}/install"
export GIT_URL="https://github.com/bioinfo-pf-curie/proteinfold.git"

# Initialization of a working directory
# with the src and build folders
geniac init -w ${WORK_DIR} ${GIT_URL}
cd ${WORK_DIR}

# As the building of apptainer/singularity requires more than 16GB of RAM memory,
# it is highly recommended to define a dedicated folder on your filesystem.
# This folder will be used instead of the RAM
export APPTAINER_TMPDIR=$HOME/tmp/apptainer_tmpdir
mkdir -p $APPTAINER_TMPDIR

# Install the pipeline with the singularity images
geniac install . ${INSTALL_DIR} -m singularity
sudo chown -R  $(id -gn):$(id -gn) build

# Test the pipeline with the singularity profile
geniac test singularity

# Test the pipeline with the singularity and cluster profiles
geniac test singularity --check-cluster

Annotations

Protein annotations databases and other dependency files are required to run the pipeline.

Note that the paths to the annotations required by the different tools implemented in the pipeline are defined in the file conf/genomes.config using the scope params which is defined by nextflow. For each tool, the following patttern is used to defined the path to the required data: params.genomes.toolName.database. For example, ton run AlphaFold, you have to defined in which folder you have dowloaded all the protein data:

params {
  genomes {
    alphafold {
      database = "${params.genomeAnnotationPath}/proteinfold/alphafold/2023-12"
    }
}

We provide below the link to the detailed procedure to install the data required by each tool.