GitHub - v-mikhaylov/tfold-full

Please use the repository tfold-release for the TFold package! (The present repository also includes the notebooks that were used in the data analysis for the paper; they are not needed to run TFold.)

#AlphaFold-based pipeline for prediction of peptide-MHC structures. Please cite as:
Victor Mikhaylov and Arnold J. Levine, "Accurate modeling of peptide-MHC structures with AlphaFold", to appear.

#Download and install

Download AlphaFold and its parameters. (This pipeline was tested with AlphaFold 2.1.0.) There is no need to download the PDB or the protein databases.
Clone this repository:

git clone https://github.com/v-mikhaylov/tfold-release.git

Enter the tfold-release folder.

Install the dependencies. With conda, you should be able to create an environment that works for both the TFold pipeline and AlphaFold:

conda env create --file tfold-env.yml
conda activate tfold-env
pip install --upgrade jax==0.2.24 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

(This environment for running AlphaFold outside of Docker is due to https://github.com/kalininalab/alphafold_non_docker.)

Download the data file data.tar.gz with templates and other information from Zenodo:

https://zenodo.org/record/7700748#.ZAV0sy-B23x

and unpack it into the tfold-release folder. This will create a folder data.

Set the paths to a couple of folders in tfold/config.py and tfold_patch/tfold_config.py.
That should be it.
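
The unpacking step above can also be scripted. The following is a minimal sketch, assuming data.tar.gz has already been downloaded from Zenodo by hand; the function name `unpack_data` and its arguments are hypothetical and are not part of TFold.

```python
import tarfile
from pathlib import Path


def unpack_data(archive_path, repo_dir):
    """Extract data.tar.gz into the repository folder and return the data path.

    Hypothetical helper, not part of TFold. It assumes the archive contains a
    top-level folder named `data`, as the instructions above describe.
    """
    repo_dir = Path(repo_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Use the safer "data" extraction filter where this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(repo_dir, **extra)
    return repo_dir / "data"
```

For example, `unpack_data("data.tar.gz", "tfold-release")` would be expected to leave a folder tfold-release/data, whose location can then be entered in the two config files.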

#Model pMHCs

Prepare an input file. An example can be found in data/examples/sample.csv. It should be a .csv file with a header row, a column pep, and either a column MHC allele or a column MHC sequence.
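
The input format described above can be produced with the standard csv module. This is a minimal sketch: the helper `write_input_csv` is hypothetical, and it covers only the columns named in this readme (other options are described in details.ipynb).

```python
import csv
from pathlib import Path


def write_input_csv(path, rows, mhc_column="MHC allele"):
    """Write a header row plus (peptide, MHC) pairs to a .csv file.

    Hypothetical helper, not part of TFold. `mhc_column` must be one of the
    two MHC column names mentioned in the readme.
    """
    if mhc_column not in ("MHC allele", "MHC sequence"):
        raise ValueError(f"unexpected MHC column name: {mhc_column!r}")
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["pep", mhc_column])  # header row
        for pep, mhc in rows:
            writer.writerow([pep, mhc])
    return path
```

A call such as `write_input_csv("input.csv", [("NLVPMVATV", "HLA-A*02:01")])` writes a two-line file; the peptide here is only an illustrative placeholder.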

The format for MHC alleles is SpeciesId-Locus*Allele for class I and SpeciesId-LocusA*AlleleA/LocusB*AlleleB for class II. Some examples: HLA-A*02:01, H2-K*d, HLA-DRA*01:01/DRB4*01:144, H2-IEA*d/IEB*k.
For class II, the MHC sequence should contain alpha-chain and beta-chain sequences separated by '/'.
For more details and options, please see details.ipynb.
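
As a rough illustration of the allele naming scheme above, the sketch below splits an allele string into its parts. The regular expressions are an assumption inferred from the four examples in this readme, and `parse_mhc_allele` is a hypothetical helper; TFold's own parsing may accept more or fewer forms.

```python
import re

# Assumed character sets, inferred from the readme examples only.
_NAME = r"[A-Za-z0-9]+"
_ALLELE = r"[A-Za-z0-9:]+"

CLASS_I = re.compile(
    rf"^(?P<species>{_NAME})-(?P<locus>{_NAME})\*(?P<allele>{_ALLELE})$"
)
CLASS_II = re.compile(
    rf"^(?P<species>{_NAME})-(?P<locus_a>{_NAME})\*(?P<allele_a>{_ALLELE})"
    rf"/(?P<locus_b>{_NAME})\*(?P<allele_b>{_ALLELE})$"
)


def parse_mhc_allele(name):
    """Return a dict with the MHC class and the named parts of the allele."""
    match = CLASS_II.match(name)
    if match:
        return {"class": "II", **match.groupdict()}
    match = CLASS_I.match(name)
    if match:
        return {"class": "I", **match.groupdict()}
    raise ValueError(f"not in the expected allele format: {name!r}")
```

With these patterns, `parse_mhc_allele("H2-IEA*d/IEB*k")` yields class II with species H2, alpha locus IEA and beta locus IEB, while a string without the `*` separator raises a ValueError.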

Activate the conda environment:

conda activate tfold-env

Choose an output folder $working_dir and run the script as follows:

model_pmhcs.sh $input_file $working_dir [-d YYYY-MM-DD]

Here [-d YYYY-MM-DD] is an optional cutoff on the allowed template dates.
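
When launching the script from Python, the command line above can be assembled as a list. This is a sketch under assumptions: `build_model_command` is a hypothetical helper, and the default script path `./model_pmhcs.sh` assumes the current directory is the tfold-release folder.

```python
from datetime import datetime


def build_model_command(input_file, working_dir, cutoff=None,
                        script="./model_pmhcs.sh"):
    """Build the argument list for model_pmhcs.sh.

    Hypothetical helper, not part of TFold. `cutoff` is an optional template
    date cutoff given as a YYYY-MM-DD string.
    """
    command = [script, str(input_file), str(working_dir)]
    if cutoff is not None:
        # Parse and re-format so that -d always receives YYYY-MM-DD;
        # an invalid date raises ValueError before anything is run.
        parsed = datetime.strptime(cutoff, "%Y-%m-%d").date()
        command += ["-d", parsed.isoformat()]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` with the tfold-env environment active.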

The models will be saved in $working_dir/outputs, with a separate folder for each pMHC. There will also be a summary .csv file in $working_dir with information about the best models (by predicted score).
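
After a run, the output layout described above can be inspected with pathlib. This is a minimal sketch; `list_outputs` is a hypothetical helper, and because the readme does not give the name of the summary file, the sketch simply collects every .csv file directly under the working directory.

```python
from pathlib import Path


def list_outputs(working_dir):
    """Return (per-pMHC folders, candidate summary .csv files).

    Hypothetical helper, not part of TFold. The summary file name is not
    stated in the readme, so all top-level .csv files are returned.
    """
    working_dir = Path(working_dir)
    outputs = working_dir / "outputs"
    if outputs.is_dir():
        pmhc_dirs = sorted(p for p in outputs.iterdir() if p.is_dir())
    else:
        pmhc_dirs = []
    summaries = sorted(working_dir.glob("*.csv"))
    return pmhc_dirs, summaries
```

If the input file was itself placed in the working directory, it will appear among the returned .csv files and needs to be filtered out by name.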

#Details

The notebook details.ipynb contains some additional details on the pipeline that can be useful e.g. for splitting the jobs over multiple GPUs. It also contains a description of our cleaned pMHC and TCR structure database and associated tools.
