Please use the tfold-release repository for the TFold package. (The present repository contains the notebooks that were used in the data analysis for the paper; they are not needed to run TFold.)

#AlphaFold-based pipeline for prediction of peptide-MHC structures

Please cite as:
Victor Mikhaylov and Arnold J. Levine, "Accurate modeling of peptide-MHC structures with AlphaFold", to appear.

#Download and install

  1. Download AlphaFold and its parameters. (This pipeline was tested with AlphaFold 2.1.0.) There is no need to download the PDB or the protein databases.

  2. Clone this repository:

git clone https://github.com/v-mikhaylov/tfold-release.git

Enter the tfold-release folder.

  3. Install the dependencies. With conda, you should be able to create a single environment that works for both the TFold pipeline and AlphaFold:
conda env create --file tfold-env.yml
conda activate tfold-env
pip install --upgrade jax==0.2.24 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

(This environment for running AlphaFold outside of Docker is adapted from https://github.com/kalininalab/alphafold_non_docker.)

  4. Download the data file data.tar.gz with templates and other information from Zenodo:
https://zenodo.org/record/7700748#.ZAV0sy-B23x

and unpack it into the tfold-release folder. This will create a folder data.

  5. Set the paths to a couple of folders in tfold/config.py and tfold_patch/tfold_config.py.

  6. That should be it.
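
As a quick sanity check after installation, the files and folders named in the steps above can be tested for existence. The following is a minimal sketch, not part of TFold: the helper `missing_paths` and the list `EXPECTED_PATHS` are hypothetical names introduced here, and the sketch assumes it is run from inside the tfold-release folder. It does not check that the paths set in the two config files are correct.

```python
from pathlib import Path

# Paths mentioned in the installation steps, relative to the tfold-release folder.
EXPECTED_PATHS = [
    "tfold-env.yml",                # conda environment file
    "data",                         # created by unpacking data.tar.gz
    "data/examples/sample.csv",     # example input file
    "tfold/config.py",              # config file edited during installation
    "tfold_patch/tfold_config.py",  # config file edited during installation
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```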

#Model pMHCs

  1. Prepare an input file. An example can be found in data/examples/sample.csv. The input should be a .csv file with a header, containing a column pep (the peptide) and a column that is either MHC allele or MHC sequence.
  • The format for MHC alleles is SpeciesId-Locus*Allele for class I and SpeciesId-LocusA*AlleleA/LocusB*AlleleB for class II. Some examples: HLA-A*02:01, H2-K*d, HLA-DRA*01:01/DRB4*01:144, H2-IEA*d/IEB*k.
  • For class II, the MHC sequence should contain alpha-chain and beta-chain sequences separated by '/'.
  • For more details and options, please see details.ipynb.
  2. Activate the conda environment:
conda activate tfold-env
  3. Choose an output folder $working_dir and run the script as follows:
model_pmhcs.sh $input_file $working_dir [-d YYYY-MM-DD]

Here [-d YYYY-MM-DD] is an optional cutoff on the allowed template dates.

  4. The models will be saved in $working_dir/outputs, with a separate folder for each pMHC. There will also be a summary .csv file in $working_dir with information about the best models (by predicted score).
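
To illustrate the input format from step 1, here is a hedged Python sketch that checks allele names against the format described above and writes an input file. The regular expressions are transcribed from that description and may be stricter or looser than what TFold itself accepts. The helper names `mhc_class` and `write_input` are hypothetical, the header strings `pep` and `MHC allele` are assumed to be the literal column names, and the peptides in the demo are placeholders. For the authoritative format, see data/examples/sample.csv and details.ipynb.

```python
import csv
import re

# "Locus*Allele", e.g. "A*02:01" or "K*d" (assumed character set).
_NAME = r"[A-Za-z0-9]+\*[A-Za-z0-9:]+"
# Class I: SpeciesId-Locus*Allele
CLASS_I = re.compile(rf"^[A-Za-z0-9]+-{_NAME}$")
# Class II: SpeciesId-LocusA*AlleleA/LocusB*AlleleB
CLASS_II = re.compile(rf"^[A-Za-z0-9]+-{_NAME}/{_NAME}$")


def mhc_class(allele):
    """Return "I" or "II" if the allele string matches the format, else None."""
    if CLASS_II.match(allele):
        return "II"
    if CLASS_I.match(allele):
        return "I"
    return None


def write_input(path, rows):
    """Write (peptide, allele) pairs to a .csv file with a header row."""
    bad = [allele for _, allele in rows if mhc_class(allele) is None]
    if bad:
        raise ValueError(f"Unrecognized MHC allele format: {bad}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["pep", "MHC allele"])  # assumed literal column names
        writer.writerows(rows)


if __name__ == "__main__":
    # Allele strings are the examples given in the text above.
    for allele in ["HLA-A*02:01", "H2-K*d",
                   "HLA-DRA*01:01/DRB4*01:144", "H2-IEA*d/IEB*k"]:
        print(allele, "-> class", mhc_class(allele))
```

A call such as `write_input("input.csv", [("NLVPMVATV", "HLA-A*02:01")])` would then produce a file that can be passed as $input_file.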

#Details

The notebook details.ipynb contains some additional details on the pipeline that can be useful, e.g. for splitting the jobs over multiple GPUs. It also contains a description of our cleaned pMHC and TCR structure database and associated tools.
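
One generic way to spread a large input over several GPUs is to split the input .csv into chunks and run the script once per chunk. The sketch below is not the procedure from details.ipynb; it only relies on the documented invocation `model_pmhcs.sh $input_file $working_dir [-d YYYY-MM-DD]`. The helpers `split_csv` and `build_commands`, the chunk file names, and the per-GPU working folder names are all hypothetical.

```python
import csv
from pathlib import Path


def split_csv(input_file, out_dir, n_chunks):
    """Split a .csv with a header into up to n_chunks files, each keeping the header."""
    with open(input_file, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_files = []
    for i in range(n_chunks):
        chunk = rows[i::n_chunks]  # round-robin assignment of rows to chunks
        if not chunk:
            continue  # fewer rows than chunks: skip empty chunks
        path = out_dir / f"chunk_{i}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(chunk)
        chunk_files.append(path)
    return chunk_files


def build_commands(chunk_files, working_root, cutoff_date=None):
    """Return (gpu_id, argv) pairs, one model_pmhcs.sh call per chunk."""
    commands = []
    for gpu_id, chunk in enumerate(chunk_files):
        working_dir = Path(working_root) / f"gpu_{gpu_id}"  # hypothetical layout
        argv = ["model_pmhcs.sh", str(chunk), str(working_dir)]
        if cutoff_date:
            argv += ["-d", cutoff_date]  # optional template date cutoff
        commands.append((gpu_id, argv))
    return commands
```

Each command could then be launched with `subprocess.Popen(argv, env=...)` with the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` set to the GPU id. Whether the pipeline honors that variable as expected is an assumption that should be checked against details.ipynb.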