lu876/BERT-methylation


BERT-methylation

This repository is the implementation of "".

Environment dependencies

  • python==3.6
  • matplotlib==3.5.1
  • numpy==1.23.2
  • openpyxl==3.0.9
  • pandas==1.4.3
  • scikit_learn==1.1.2
  • seaborn==0.11.2
  • torch==1.10.2+cu102
  • transformers==4.17.0
  • xgboost==1.6.1

Execution order of the programs:

Please follow the sequence below to run our code.

  1. Run word2vec.py: This program loads the methylation dataset from the folder and trains a Word2Vec network. It generates the file "net_word2vec.pth".
  2. Run extract word.py: This program reads the vocabulary used in BERT pre-training and loads the pre-trained BERT framework. It filters out the English words, formats them as model input, and outputs a vector for each English word. The output is written to BERT_vec.xlsx.
  3. Run compare.py: This program reads BERT_vec.xlsx to obtain a vector for each English word, loads the trained Word2Vec model parameters, and computes the vector representation of each 5-mer. 'DNA_Eng.csv' is generated by computing the cosine similarity between each 5-mer vector and each English word vector.
  4. Run fine-tuning.py: This fine-tuning program reads a DNA methylation dataset. To reproduce our experimental results, we provide the 'DNA_Eng.csv' used in our experiments. The fine-tuned network parameters are saved to 'Bestmodel_.pth'.
  5. Run Tsne_show.py: This program loads the fine-tuned model and the test data, computes the new vector representations of the test set, and plots the distribution of the test set using t-SNE dimensionality reduction.
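
As an illustration of the similarity matching in step 3, here is a minimal sketch of mapping each 5-mer to its most similar English word by cosine similarity. The 5-mers, words, and 2-D vectors below are toy values for demonstration only; the actual code uses the trained Word2Vec embeddings and the BERT word vectors from BERT_vec.xlsx.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings (hypothetical): vectors for two 5-mers and two English words.
kmer_vecs = {"ACGTA": np.array([1.0, 0.0]), "TTTTT": np.array([0.0, 1.0])}
word_vecs = {"cat": np.array([0.9, 0.1]), "dog": np.array([0.1, 0.9])}

# For each 5-mer, pick the English word with the highest cosine similarity.
mapping = {
    kmer: max(word_vecs, key=lambda w: cosine_similarity(v, word_vecs[w]))
    for kmer, v in kmer_vecs.items()
}
print(mapping)  # {'ACGTA': 'cat', 'TTTTT': 'dog'}
```

In the real pipeline this mapping is computed over all 4^5 = 1024 possible 5-mers and the filtered BERT vocabulary, and the result is written to 'DNA_Eng.csv'.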
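
The t-SNE visualization in step 5 can be sketched as follows. Random vectors stand in for the fine-tuned test-set embeddings, and the array sizes are arbitrary; the actual script loads the embeddings from the fine-tuned model.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the fine-tuned test-set embeddings: 20 samples, 8 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))

# Project to 2-D for plotting; perplexity must be smaller than the sample count.
coords = TSNE(n_components=2, perplexity=5, init="random",
              random_state=0).fit_transform(embeddings)
print(coords.shape)  # (20, 2)
```

The resulting 2-D coordinates can then be scattered with matplotlib, coloring each point by its methylation label to inspect class separation.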
