Quickstart
This step-by-step guide is intended to get you started with S2F. We will go through installing S2F, and making your first prediction.
You need to make sure that your computer meets the requirements, and either download this repository or clone it using git. Important: if you want to use the installer with the provided configuration file, please make sure that the binaries of the requirements are included in your PATH. These binaries are:
iprscan
phmmer
blastp
makeblastdb
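Before running the installer, it can help to confirm that these four binaries are actually reachable. The following is a minimal sketch, not part of S2F itself: the function name find_missing_binaries is hypothetical, and it relies only on Python's standard shutil.which to look each name up on the current PATH.

```python
import shutil

# Binaries that this guide lists as required on PATH.
REQUIRED_BINARIES = ["iprscan", "phmmer", "blastp", "makeblastdb"]


def find_missing_binaries(names):
    """Return the names that cannot be resolved on the current PATH.

    Hypothetical helper for illustration; it is not part of S2F.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_binaries(REQUIRED_BINARIES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required binaries were found on PATH.")
```

If any name is reported as missing, add the directory that contains it to your PATH before continuing.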
Let's start by cloning this repository and moving into it to install S2F.
git clone https://github.com/paccanarolab/S2F
cd S2F
S2F depends on some standard Python libraries. You can install all of them by running:
pip install -r requirements.txt
S2F comes with an interactive command line installer. To start it, simply run:
python S2F.py install --config-file s2f.conf
This will download all of the required databases and save a configuration file with all the options you have chosen. You can modify this configuration file later if you decide to, say, move the databases to a different drive. Important: this will download a copy of the STRING, GOA, and SwissProt databases, which are very large. The installation might take several hours, or even days, depending on the speed of your internet connection.
You will need to download the Gene Ontology go.obo file.
For this guide, let's assume that you have downloaded a FASTA file (if you don't know where to get one, you can download the Supplementary Data from our website). Let the name of the FASTA file be target.fasta.
The simplest way to make a prediction is to run the following command:
python S2F.py predict --config-file s2f.conf --alias myTarget --fasta target.fasta --obo go.obo
The process will then begin, and you will be able to find the results in S2F's installation directory. Several messages will appear in the terminal while S2F is running to update you on the current step of the pipeline. For a complete run (which includes the pairwise alignment of the provided proteome against the entire STRING database), the total runtime is between 5 and 10 hours on a computer with 12 cores. S2F saves intermediate results and maintains a cache of the aligned sequences, which significantly reduces the runtime of subsequent runs.
Let the configured output directory be ~/S2F-installation/output. For the prediction command used above, the prediction file will be located at ~/S2F-installation/output/myTarget/prediction.df. This is a tab-separated text file with the following columns:
- Protein ID (matches the IDs in the FASTA file used as input)
- GO term ID (from the provided go.obo file)
- Score
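The three-column layout described above can be read with Python's standard csv module. The following is a minimal sketch, assuming the file has no header row and that every data line has exactly three tab-separated fields; the function name read_predictions and the sample rows are hypothetical and are not real S2F output.

```python
import csv
import io


def read_predictions(handle):
    """Yield (protein_id, go_term, score) tuples from a tab-separated handle.

    Hypothetical helper for illustration; assumes no header row.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        protein_id, go_term, score = row
        yield protein_id, go_term, float(score)


# Illustrative in-memory sample, not real S2F output.
sample = io.StringIO("P1\tGO:0000001\t0.25\nP2\tGO:0000002\t1.5e-07\n")
for protein_id, go_term, score in read_predictions(sample):
    print(protein_id, go_term, score)
```

To read an actual prediction file, pass an open file handle instead of the in-memory sample. Because the function is a generator, rows are processed one at a time rather than loaded all at once.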
In this section, we go over the set of commands that you would use on an example FASTA file in order to get predictions using S2F. We assume that you have followed the installation instructions.
For this case, we will be using the 83332.fasta file, which contains the protein sequences for Mycobacterium tuberculosis. (You can download a copy of this file in the Supplementary Data from our website.) Assuming you download this file to the directory where S2F.py is located, you simply need to run:
python S2F.py predict --config-file s2f.conf --alias 83332 --fasta 83332.fasta --obo go.obo
By default, the output file will be located at ~/S2F-installation/output/83332/prediction.df. As explained above, this is a tab-separated file. The first 10 lines look like this:
sp|A0A089QKZ7|Y155A_MYCTU GO:0000001 9.224062214741884e-07
sp|A0A089QRB9|MSL3_MYCTU GO:0000001 1.3479576439607364e-07
sp|E2FZM4|SOCA_MYCTU GO:0000001 1.1725513954101507e-06
sp|E2FZM5|SOCB_MYCTU GO:0000001 1.0743922748653777e-06
sp|I6WXS6|VPB51_MYCTU GO:0000001 3.563783549701623e-07
sp|I6WZK7|MMCO_MYCTU GO:0000001 1.0454690835351242e-06
sp|I6X486|PE25_MYCTU GO:0000001 1.1318501983829803e-07
sp|I6X7F9|CDDTR_MYCTU GO:0000001 3.906041154000092e-07
sp|I6X8R5|RV203_MYCTU GO:0000001 7.748154286220518e-07
sp|I6XD65|PNCA_MYCTU GO:0000001 1.0809237542558064e-07
... (there are 78939830 more lines in this file)
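With a file of this size, it is usually more practical to stream it line by line than to load it whole. The following is a minimal sketch of one way to do that, keeping only the highest-scoring GO terms for each protein in a single pass. The function name top_terms_per_protein and the parameter k are hypothetical and not part of S2F, and fields are split on any whitespace so that both tab-separated and space-separated lines are accepted.

```python
import heapq
from collections import defaultdict


def top_terms_per_protein(lines, k=3):
    """Keep the k highest-scoring (score, go_term) pairs per protein.

    Hypothetical helper for illustration; reads the input in one pass.
    """
    heaps = defaultdict(list)  # protein_id -> min-heap of (score, go_term)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        protein_id, go_term, score = parts[0], parts[1], float(parts[2])
        heap = heaps[protein_id]
        if len(heap) < k:
            heapq.heappush(heap, (score, go_term))
        elif (score, go_term) > heap[0]:
            # Replace the current lowest-scoring entry with the better one.
            heapq.heapreplace(heap, (score, go_term))
    # Return each protein's entries sorted from highest to lowest score.
    return {pid: sorted(heap, reverse=True) for pid, heap in heaps.items()}


# Illustrative in-memory sample, not real S2F output.
example_lines = ["A GO:1 0.1", "A GO:2 0.5", "A GO:3 0.3", "B GO:1 0.2"]
print(top_terms_per_protein(example_lines, k=2))
```

An open file handle can be passed as lines, since iterating over a handle yields one line at a time. Memory use then grows with the number of proteins times k rather than with the number of lines in the file.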
Something’s not working for you? Do you think you found an error? Do you want to contribute to the development of S2F? Contact us!