Installation Instructions

Mateo Torres edited this page Feb 3, 2023 · 4 revisions

A single command sets up the entire environment for S2F. Be aware that it will download HUNDREDS of GB of datasets, so please read the sections below to configure the installer according to your needs, either through the configuration file or through the command line options. To run it with a configuration file:

python install --config-file s2f.conf

You will be asked to confirm the configured options of the installation script. If the options look right, reply with y and the installer will download and process the required databases. This assumes that all the requirements listed above are met.
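Because the installer assumes its requirements are already in place, it can help to check for the external tools first. The following is a minimal sketch and not part of S2F: it assumes the default executable names from the configuration file (iprscan, phmmer, blastp, makeblastdb) plus wget, and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Default executable names from the configuration file, plus wget, which the
# installer uses for downloads. This list is an assumption; adjust it to
# match the paths you actually configure.
DEFAULT_TOOLS = ["iprscan", "phmmer", "blastp", "makeblastdb", "wget"]


def missing_tools(tools=DEFAULT_TOOLS):
    """Return the tools that cannot be found on the PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found on PATH.")
```

A tool reported as missing is not necessarily a problem: you can still give its full path in the configuration file or on the command line.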

Configure the installation script

The easiest way to configure the installation is through the configuration file. The default values are:

installation_directory = ~/.S2F

interpro = iprscan
hmmer = phmmer
blastp = blastp
makeblastdb = makeblastdb

string_links = download
string_sequences = download
string_species = download
uniprot_sprot = download
uniprot_goa = download
filtered_goa = infer
filtered_sprot = infer

evidence_codes = experimental
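To inspect or validate a configuration file before running the installer, its `key = value` lines can be read with Python's standard `configparser`. This is only a sketch: the installer's own parsing code is not shown on this page, and the helper name `read_s2f_conf` and the dummy section name `s2f` are assumptions made for the example.

```python
import configparser
import os


def read_s2f_conf(text):
    """Parse `key = value` lines into a dict (hypothetical helper)."""
    # Disable interpolation so a literal "%" in a path is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    # The file has no [section] header, so prepend a dummy one.
    parser.read_string("[s2f]\n" + text)
    options = dict(parser["s2f"])
    # Expand "~" so the directory can be used directly.
    if "installation_directory" in options:
        options["installation_directory"] = os.path.expanduser(
            options["installation_directory"]
        )
    return options


sample = """
installation_directory = ~/.S2F
blastp = blastp
string_links = download
evidence_codes = experimental
"""
conf = read_s2f_conf(sample)
for key, value in conf.items():
    print(key, "=", value)
```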

A description of each option is provided in the following table:

| Option | Description |
| --- | --- |
| installation_directory | Path to the installation directory for S2F. |
| interpro | Path to the iprscan executable. Set it here to avoid passing this parameter to the other commands every time. |
| hmmer | Path to the phmmer executable. Set it here to avoid passing this parameter to the other commands every time. |
| blastp | Path to the blastp command in the system. If not provided, S2F will assume that the executable is available system-wide. |
| makeblastdb | Path to the makeblastdb command in the system. If not provided, S2F will assume that the executable is available system-wide. |
| string_links | Path to the STRING interactions database. It must be the full path to either protein.links.full.vX.x.txt.gz or protein.links.detailed.vX.x.txt.gz. If not provided, the installation script will attempt to download the full database using the wget command. |
| string_sequences | Path to the STRING sequences database. It must be the full path to the protein.sequences.vX.x.fa.gz file. If not provided, the installation script will attempt to download it using the wget command. |
| string_species | Path to the STRING species list. It must be the full path to the species.vX.x.txt file. If not provided, the installation script will attempt to download it using the wget command. |
| uniprot_sprot | Path to the UniProt SwissProt sequences. It must be the full path to the "uniprot_sprot.fasta.gz" file. If not provided, the installation script will attempt to download it using the wget command. |
| uniprot_goa | Path to the UniProt GOA. It must be the full path to the "goa_uniprot_all.gaf.gz" file. If not provided, the installation script will attempt to download it using the wget command. |
| evidence_codes | Comma-separated list of evidence codes used to filter the UniProt GOA. If not provided, S2F will be installed using only experimental evidence codes. Example: EXP,IDA,IPI,IMP. |

Note: the default experimental setting is equivalent to EXP,IDA,IPI,IMP,IGI,IEP,TAS,IC, even though TAS and IC are not listed as experimental evidence codes on the Gene Ontology website.
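The relationship between the `experimental` keyword and an explicit code list can be sketched as follows. This is an illustration rather than S2F's own code: the function name `resolve_evidence_codes` is hypothetical, and the expansion of `experimental` is taken from the note above.

```python
# Codes that the note above gives for the default "experimental" setting.
EXPERIMENTAL_CODES = ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "IC"]


def resolve_evidence_codes(value="experimental"):
    """Turn an evidence_codes setting into a list of codes (hypothetical helper)."""
    if value.strip().lower() == "experimental":
        return list(EXPERIMENTAL_CODES)
    # Otherwise expect a comma-separated list such as "EXP,IDA,IPI,IMP".
    return [code.strip().upper() for code in value.split(",") if code.strip()]


default_codes = resolve_evidence_codes()
custom_codes = resolve_evidence_codes("EXP,IDA,IPI,IMP")
print(default_codes)
print(custom_codes)
```

Passing an explicit list such as EXP,IDA,IPI,IMP is therefore stricter than the default, because it leaves out IGI, IEP, TAS and IC.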

Installer options

The command line options for the installation are the following (but we highly recommend having a look at the configuration file to avoid mistakes):

| Option | Description | Default Value |
| --- | --- | --- |
| --installation-directory | Path to the installation directory for S2F. | ~/.S2F |
| --config-file | Location of the configuration file that will be created. If not provided, the default configuration file will be loaded. | s2f.conf (found in the script's directory) |
| --interpro | Path to the iprscan executable. Set it here to avoid passing this parameter to the other commands every time. | iprscan (assumes this is correctly configured in the PATH environment variable) |
| --hmmer | Path to the phmmer executable. Set it here to avoid passing this parameter to the other commands every time. | phmmer (assumes this is correctly configured in the PATH environment variable) |
| --blastp | Path to the blastp command in the system. If not provided, S2F will assume that the executable is available system-wide. | blastp (assumes this is correctly configured in the PATH environment variable) |
| --makeblastdb | Path to the makeblastdb command in the system. If not provided, S2F will assume that the executable is available system-wide. | makeblastdb (assumes this is correctly configured in the PATH environment variable) |
| --string-links | Path to the STRING interactions database. It must be the full path to either protein.links.full.vX.x.txt.gz or protein.links.detailed.vX.x.txt.gz. If not provided, the installation script will attempt to download the full database using the wget command. | download |
| --string-sequences | Path to the STRING sequences database. It must be the full path to the protein.sequences.vX.x.fa.gz file. If not provided, the installation script will attempt to download it using the wget command. | download |
| --string-species | Path to the STRING species list. It must be the full path to the species.vX.x.txt file. If not provided, the installation script will attempt to download it using the wget command. | download |
| --uniprot-swissprot | Path to the UniProt SwissProt sequences. It must be the full path to the "uniprot_sprot.fasta.gz" file. If not provided, the installation script will attempt to download it using the wget command. | download |
| --uniprot-goa | Path to the UniProt GOA. It must be the full path to the "goa_uniprot_all.gaf.gz" file. If not provided, the installation script will attempt to download it using the wget command. | download |
| --evidence-codes | Comma-separated list of evidence codes used to filter the UniProt GOA. If not provided, S2F will be installed using only experimental evidence codes. Example: EXP,IDA,IPI,IMP. | experimental |

Note: the default experimental setting is equivalent to EXP,IDA,IPI,IMP,IGI,IEP,TAS,IC, even though TAS and IC are not listed as experimental evidence codes on the Gene Ontology website.
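If you do pass options on the command line, assembling the command programmatically can reduce typing mistakes. The sketch below only builds and prints the command. It uses flag names from the options above, while the helper name `build_install_command` and the `/data/S2F` path are hypothetical.

```python
import shlex


def build_install_command(flags):
    """Build an argv list for the installer from a {flag: value} dict."""
    argv = ["python", "install"]
    for flag, value in flags.items():
        # Flags are used exactly as documented; note that some differ from
        # the configuration keys (uniprot_sprot vs --uniprot-swissprot).
        argv += [flag, str(value)]
    return argv


command = build_install_command({
    "--installation-directory": "/data/S2F",  # hypothetical path
    "--string-links": "download",
    "--evidence-codes": "EXP,IDA,IPI,IMP",
})
# Print the command for review instead of running it, since the real
# installer downloads hundreds of GB.
print(shlex.join(command))
```

Once the printed command looks right, it can be pasted into a shell or passed to `subprocess.run(command)`.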