
RNN GC Explanation

Ben Drucker edited this page Apr 7, 2023 · 6 revisions

Background

As a first step toward adding an LSTM to DeepKS, I've created a "toy" LSTM project for a different task: the group classifier, which is one component of DeepKS. The data flow in DeepKS is as follows:

Currently, the group classifier is implemented with a simple MLP, as shown in the diagram. It takes in a kinase sequence and predicts the kinase's group (there are 10 groups in total). The input features are the sequence alignment distances from the input kinase to a fixed set of known kinases.

Motivation

Re-implementing the whole group classifier with an RNN seemed like a natural choice, especially as a toy example of what could be done for the main, more complex neural network portion of DeepKS.

RNN

In the RNN version, each sequence is embedded letter by letter using a torch.nn.Embedding layer. (We treat each amino-acid letter like a word in a sentence.) Identical letters map to identical vectors, so a plausible embedding, determined by the learned weights inside the Embedding, could look like this:

AA Position:  1      2      3      4      5      6      7      8      9      10     11     12     ...
Amino Acid:   A      A      C      T      Y      G      M      V      H      I      I      I      ...
Emb. dim 1:   2.34   2.34   3.99   0.55   -0.11  3.14   2.22   0.51   -3.3   5.62   5.62   5.62   ...
Emb. dim 2:   1.99   1.99   -1.4   -1.5   -4.43  22.0   -0.98  0.53   -2.1   4.44   4.44   4.44   ...
...           ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
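
The letter-by-letter embedding above can be sketched roughly as follows. This is a minimal illustration, not the code in RNN-GC.py: the vocabulary ordering, the reserved padding index, the helper name `encode`, and the embedding dimension of 2 are all assumptions made for the example, and the weights here are randomly initialized rather than learned.

```python
import torch
from torch import nn

# Assumed vocabulary: the 20 standard amino acids; index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> torch.Tensor:
    """Map each amino-acid letter to its integer index (hypothetical helper)."""
    return torch.tensor([AA_TO_INDEX[aa] for aa in sequence], dtype=torch.long)


torch.manual_seed(0)  # only so the random initial weights are reproducible

EMBEDDING_DIM = 2  # illustrative; matches the two "Emb. dim" rows in the table
embedding = nn.Embedding(
    num_embeddings=len(AMINO_ACIDS) + 1,  # +1 for the padding index
    embedding_dim=EMBEDDING_DIM,
    padding_idx=0,
)

tokens = encode("AACTYGMVHIII")  # the 12 residues shown in the table
vectors = embedding(tokens)      # one row of EMBEDDING_DIM values per residue

print(vectors.shape)
print(torch.equal(vectors[0], vectors[1]))  # both "A": same embedding row
```

Because the layer is a lookup table, repeated letters (the two leading A's, the three trailing I's) always receive the same vector, which is why those columns repeat in the table.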

Instructions for running

To run this, Docker is probably not necessary. Just clone the repository and run pip install -r requirements.txt. The requirements file does not include torch, since each system may need a different version. If you don't already have it, install torch by following the instructions at https://pytorch.org/get-started/locally/.

To run the LSTM model, first switch to the development branch (git switch dev). Then run python -m DeepKS.models.RNN-GC. You must issue this command from DeepKS's parent folder, because python -m needs to resolve DeepKS as a package from the current directory. To change the device (if you have CUDA), edit the first line of the main function in DeepKS/models/RNN-GC.py. The model will then be trained, validated, and tested. A model summary will also be generated. (Note that the output shape reported for the LSTM is nonsensical, since the model summary generator doesn't understand the multiple types of outputs an LSTM returns.)
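
To make the notes about device selection and the LSTM's multiple outputs concrete, here is a small sketch. It is not the model defined in RNN-GC.py: the class name `ToyGroupClassifier`, the layer sizes, and the choice to classify from the final hidden state are assumptions for illustration. Only the 10 output groups come from the description above.

```python
import torch
from torch import nn

# Use CUDA when it is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

NUM_GROUPS = 10  # number of kinase groups, as stated in the Background section
VOCAB_SIZE, EMBEDDING_DIM, HIDDEN_DIM = 21, 8, 16  # hypothetical sizes


class ToyGroupClassifier(nn.Module):
    """Hypothetical embedding -> LSTM -> linear classifier."""

    def __init__(self) -> None:
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM, padding_idx=0)
        self.lstm = nn.LSTM(EMBEDDING_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, NUM_GROUPS)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(tokens)  # (batch, seq_len, EMBEDDING_DIM)
        # nn.LSTM returns a tuple: the per-step outputs plus the final
        # hidden and cell states.
        output, (h_n, c_n) = self.lstm(embedded)
        # h_n has shape (num_layers, batch, HIDDEN_DIM); take the last layer.
        return self.head(h_n[-1])  # (batch, NUM_GROUPS)


model = ToyGroupClassifier().to(device)
tokens = torch.randint(1, VOCAB_SIZE, (4, 12), device=device)  # 4 random sequences
logits = model(tokens)
print(logits.shape)
```

The tuple returned by nn.LSTM, `(output, (h_n, c_n))`, is the kind of multi-part output that a summary tool expecting a single tensor per layer may report incorrectly.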

The code in the file should be straightforward to follow. It consists of several helper functions and the definition of the NN model. The core training and testing functionality comes from DeepKS/tools/NNInterface.py.
