This is the repository of a senior project titled Essential Protein Prediction using Graph Neural Networks. The project investigates state-of-the-art Graph Neural Network (GNN) models that are suited to protein essentiality prediction. The GNN models used are node2vec, GraphSAGE, and two diffusion-based GNNs, namely GRAND and BLEND. Other computational and topological methods are also provided so that the improvement is easier to see. XGBoost is used as the classification algorithm for the unsupervised models. The use of several biological information sources, such as gene expression and GO annotations, to enhance the prediction is also analyzed. Relevant materials, including code, data, and documents, are all published in this repository.
git clone https://github.com/Saydemr/EPPuGNN.git
conda env create -f environment.yml
conda activate eppugnn
This might fail with an error saying that pip could not find a torch version matching 1.11.0+cu113. In that case, remove all the lines below from environment.yml and install them manually with pip.
- pykeops==2.1
- ogb==1.2.1
- torch==1.11.0+cu113
- torch-cluster==1.6.0
- torch-geometric==2.0.3
- torch-scatter==2.0.9
- torch-sparse==0.6.13
- torch-spline-conv==1.2.1
- torchdiffeq==0.2.3
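The manual workaround above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name split_pins is hypothetical, and it assumes the pins appear in environment.yml as "- name==version" lines, as in the list above. It returns the file text without those pins, plus the removed pins so they can be passed to pip.

```python
from typing import Iterable, List, Tuple

# Package names taken from the list above.
PIP_PACKAGES = {
    "pykeops", "ogb", "torch", "torch-cluster", "torch-geometric",
    "torch-scatter", "torch-sparse", "torch-spline-conv", "torchdiffeq",
}


def split_pins(env_text: str,
               packages: Iterable[str] = PIP_PACKAGES) -> Tuple[str, List[str]]:
    """Return (environment text without the listed pins, removed pins).

    Hypothetical helper: assumes each pin is written as "- name==version".
    """
    wanted = set(packages)
    kept, removed = [], []
    for line in env_text.splitlines():
        entry = line.strip()
        if entry.startswith("- ") and "==" in entry:
            # Compare only the package name, ignoring the pinned version.
            name = entry[2:].split("==", 1)[0].strip()
            if name in wanted:
                removed.append(entry[2:].strip())
                continue
        kept.append(line)
    return "\n".join(kept) + "\n", removed


if __name__ == "__main__":
    # Made-up sample text, only to show the behaviour.
    sample = (
        "dependencies:\n"
        "  - python=3.9\n"
        "  - pip:\n"
        "    - torch==1.11.0+cu113\n"
        "    - numpy==1.22.0\n"
    )
    new_text, pins = split_pins(sample)
    print(new_text)
    print("pip install " + " ".join(pins))
```

If every entry under the pip section is removed, the now-empty section may need to be deleted by hand as well. Builds with a +cu113 suffix are hosted on the PyTorch wheel index rather than PyPI, so pip may need to be pointed at that index for the torch pin.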
python update.py
If you see anything that points to an error, you can download the missing files from here. Downloaded files must be placed under the ./data directory before running the next commands.
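Before moving on, it can help to confirm that the files really are in place. The sketch below is only an illustration: the function missing_files and the file name in the demo are hypothetical, since the actual file names are defined by the update script and are not listed here.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(data_dir: str, expected: Iterable[str]) -> List[str]:
    """Return the names in `expected` that are absent or empty in `data_dir`."""
    root = Path(data_dir)
    missing = []
    for name in expected:
        path = root / name
        # Treat zero-byte files as missing: they usually mean a failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical file name; replace with the files the update script reports.
    expected = ["example_interactions.txt"]
    for name in missing_files("./data", expected):
        print(f"missing or empty: ./data/{name}")
```

An empty result means every expected file exists and is non-empty; it does not verify that the contents are correct.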
Biological data are obtained from the BioGRID, COMPARTMENTS, and NIH GEO databases. Links to obtain the files can be found inside the script.
The preprocessor takes an organism name as an argument. It compiles the necessary information for the given organism and saves it under the ./data directory. If you want to create data for all organisms, you can run the following command.
cd ./data
python compose_data.py --organism all
If you want to create data for a specific organism, you can run the following command.
cd ./data
python compose_data.py --organism sc
Outputs to be used in each GNN will be placed under their respective directories.
For now, refer to the GitHub page of each GNN. This part will be updated once the automated pipeline is considered ready. We forked these GNNs to integrate some necessary features that are missing from the original repositories.
- Supervisor: Dr. Emre Sefer
- Institution: Ozyegin University
- We would like to thank Esad Simitcioglu, OzU AI Labs and Dr. Reyhan Aydoğan for providing on-demand hardware equipment.