Please use the tfold-release repository for the TFold package! (The present repository includes notebooks that were used in the data analysis for the paper and are not needed to run TFold.)

#AlphaFold-based pipeline for prediction of peptide-MHC structures

Please cite as:
Victor Mikhaylov and Arnold J. Levine, "Accurate modeling of peptide-MHC structures with AlphaFold", to appear.

#Download and install

Download AlphaFold and its parameters. (This pipeline was tested with AlphaFold 2.1.0.) There is no need to download the PDB or the protein databases.
Clone this repository:

git clone https://github.com/v-mikhaylov/tfold-release.git

Enter the tfold-release folder.

Install the dependencies. With conda, you should be able to create an environment that works for both the TFold pipeline and AlphaFold:

conda env create --file tfold-env.yml
conda activate tfold-env
pip install --upgrade jax==0.2.24 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

(This environment for running AlphaFold outside of Docker is due to https://github.com/kalininalab/alphafold_non_docker.)

Download the data file data.tar.gz with templates and other information from Zenodo:

https://zenodo.org/record/7700748#.ZAV0sy-B23x

and unpack it into the tfold-release folder. This will create a folder data.

Set the paths to a couple of folders in tfold/config.py and tfold_patch/tfold_config.py.
That should be it.
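
Before the first run, it can help to confirm that the unpacked data folder and the two config files are in place. The following is a minimal sketch and not part of TFold: the helper name `missing_paths` is hypothetical, and it assumes it is given the path to the tfold-release folder. It only checks that the paths named above exist; it does not validate the values set inside the config files.

```python
from pathlib import Path

# Paths named in this README, relative to the tfold-release folder.
EXPECTED = ["data", "tfold/config.py", "tfold_patch/tfold_config.py"]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths found.")
```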

#Model pMHCs

Prepare an input file. An example can be found in data/examples/sample.csv. It should be a .csv file with a header row, a column pep for the peptide, and either a column MHC allele or a column MHC sequence.
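
As a rough illustration of this layout, the sketch below writes and re-reads a small input file with Python's standard `csv` module. It is an assumption-laden example, not TFold code: the function names are hypothetical, the column spellings are copied from the sentence above and should be compared against data/examples/sample.csv, and the two peptide/allele rows are illustrative only.

```python
import csv
import io

REQUIRED = "pep"
# Column spellings taken from the README text; verify against sample.csv.
MHC_COLUMNS = ("MHC allele", "MHC sequence")


def check_header(fieldnames):
    """Raise ValueError unless the header has pep and at least one MHC column."""
    names = set(fieldnames or [])
    if REQUIRED not in names:
        raise ValueError("missing column: pep")
    if not names.intersection(MHC_COLUMNS):
        raise ValueError("need one of: " + ", ".join(MHC_COLUMNS))


def write_input(rows, handle):
    """Write rows (dicts with 'pep' and 'MHC allele') as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=["pep", "MHC allele"])
    writer.writeheader()
    writer.writerows(rows)


def read_input(handle):
    """Read the CSV back, checking the header first."""
    reader = csv.DictReader(handle)
    check_header(reader.fieldnames)
    return list(reader)


if __name__ == "__main__":
    # Illustrative rows only; not taken from the TFold data.
    rows = [
        {"pep": "GILGFVFTL", "MHC allele": "HLA-A*02:01"},
        {"pep": "SIINFEKL", "MHC allele": "H2-K*b"},
    ]
    buffer = io.StringIO()
    write_input(rows, buffer)
    print(buffer.getvalue())
```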

The format for MHC alleles is SpeciesId-Locus*Allele for class I and SpeciesId-LocusA*AlleleA/LocusB*AlleleB for class II. Some examples: HLA-A*02:01, H2-K*d, HLA-DRA*01:01/DRB4*01:144, H2-IEA*d/IEB*k.
For class II, the MHC sequence should contain alpha-chain and beta-chain sequences separated by '/'.
For more details and options, please see details.ipynb.
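
The allele naming scheme above can be checked before a run with a small parser. This is only a sketch: the regular expressions are inferred from the four examples listed here, the function `parse_allele` is a hypothetical helper, and the parser actually used by TFold may accept or reject different strings.

```python
import re

# Patterns inferred from the README examples; the real TFold parser may differ.
_CHAIN = r"([A-Za-z0-9]+)\*([A-Za-z0-9:]+)"
_CLASS_I = re.compile(rf"^([A-Za-z0-9]+)-{_CHAIN}$")
_CLASS_II = re.compile(rf"^([A-Za-z0-9]+)-{_CHAIN}/{_CHAIN}$")


def parse_allele(name):
    """Split an allele string into class, species and (locus, allele) pairs."""
    match = _CLASS_II.match(name)
    if match:
        species, locus_a, allele_a, locus_b, allele_b = match.groups()
        chains = [(locus_a, allele_a), (locus_b, allele_b)]
        return {"class": "II", "species": species, "chains": chains}
    match = _CLASS_I.match(name)
    if match:
        species, locus, allele = match.groups()
        return {"class": "I", "species": species, "chains": [(locus, allele)]}
    raise ValueError(f"unrecognized allele format: {name!r}")


if __name__ == "__main__":
    examples = ["HLA-A*02:01", "H2-K*d", "HLA-DRA*01:01/DRB4*01:144", "H2-IEA*d/IEB*k"]
    for example in examples:
        print(example, parse_allele(example))
```

A check like this catches typos such as a missing `*` or a missing beta chain early, before any GPU time is spent.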

Activate the conda environment:

conda activate tfold-env

Choose an output folder $working_dir and run the script as follows:

model_pmhcs.sh $input_file $working_dir [-d YYYY-MM-DD]

Here [-d YYYY-MM-DD] is an optional cutoff on the allowed template dates.
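
When the script is launched from Python, for example inside a batch job, the argument list can be assembled as sketched below. The helper `build_command` is hypothetical, the default script location is an assumption (adjust it to wherever model_pmhcs.sh lives), and the paths and date in the demo are placeholders. The sketch only builds the command; passing the returned list to `subprocess.run` would execute it.

```python
import shlex
from datetime import datetime


def build_command(input_file, working_dir, cutoff=None, script="model_pmhcs.sh"):
    """Build the argument list for model_pmhcs.sh, validating the optional date."""
    cmd = [script, str(input_file), str(working_dir)]
    if cutoff is not None:
        # strptime raises ValueError if cutoff is not a valid calendar date.
        day = datetime.strptime(cutoff, "%Y-%m-%d").date()
        # isoformat() always yields zero-padded YYYY-MM-DD.
        cmd += ["-d", day.isoformat()]
    return cmd


if __name__ == "__main__":
    # Placeholder working folder and cutoff date.
    command = build_command("data/examples/sample.csv", "runs/test", "2022-01-01")
    print(shlex.join(command))
```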

The models will be saved in $working_dir/outputs, with a separate folder for each pMHC. There will also be a summary .csv file in $working_dir with information about the best models (by predicted score).
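
After a run, the results can be inspected with a short script such as the one below. It is a sketch under stated assumptions: the models folder is taken to be named outputs, the name of the summary file is not given here, so every .csv file found directly in the working folder is listed, and `list_results` is a hypothetical helper.

```python
from pathlib import Path


def list_results(working_dir):
    """Return (per-pMHC folder names, .csv file names) found in working_dir."""
    root = Path(working_dir)
    if not root.is_dir():
        return [], []
    outputs = root / "outputs"  # assumed folder name
    folders = []
    if outputs.is_dir():
        folders = sorted(p.name for p in outputs.iterdir() if p.is_dir())
    # The summary file name is not specified, so list all top-level .csv files.
    csv_files = sorted(p.name for p in root.glob("*.csv"))
    return folders, csv_files


if __name__ == "__main__":
    folders, csv_files = list_results("runs/test")  # placeholder working folder
    print("pMHC folders:", folders)
    print("CSV files:", csv_files)
```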

#Details

The notebook details.ipynb contains some additional details on the pipeline that can be useful, e.g., for splitting the jobs over multiple GPUs. It also contains a description of our cleaned pMHC and TCR structure database and associated tools.
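
One simple way to divide work, which is not necessarily the approach described in details.ipynb, is to split the input file into several smaller .csv files and run the pipeline on each with its own working folder. The sketch below does the split round-robin; `split_input` and the chunk file names are hypothetical. Each chunk could then be run on a different GPU, for instance by setting the standard `CUDA_VISIBLE_DEVICES` environment variable per process.

```python
import csv
from pathlib import Path


def split_input(input_file, out_dir, n_chunks):
    """Split an input CSV into n_chunks files, repeating the header in each."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    with open(input_file, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_chunks):
        path = out / f"chunk_{i}.csv"  # hypothetical naming scheme
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            # Row i, i + n_chunks, i + 2 * n_chunks, ... go to chunk i.
            writer.writerows(rows[i::n_chunks])
        paths.append(path)
    return paths
```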
