This repo is the implementation of "". The following dependencies are required:
- python==3.6
- matplotlib==3.5.1
- numpy==1.23.2
- openpyxl==3.0.9
- pandas==1.4.3
- scikit_learn==1.1.2
- seaborn==0.11.2
- torch==1.10.2+cu102
- transformers==4.17.0
- xgboost==1.6.1
Please follow the steps below, in order, to run our code.
- Run word2vec.py: Loads the DNA methylation dataset from the folder and trains a Word2Vec model. The trained parameters are saved to "net_word2vec.pth".
- Run extract word.py: Reads the vocabulary used in BERT pre-training and loads the pre-trained BERT model. The program filters out the English words, formats them as BERT input, and writes the vector for each English word to "BERT_vec.xlsx".
- Run compare.py: Reads "BERT_vec.xlsx" to obtain the vector for each English word, loads the trained Word2Vec parameters, and computes the vector representation of each 5-mer. 'DNA_Eng.csv' is generated by computing the cosine similarity between each 5-mer vector and each English word vector.
- Run fine-tuning.py: Reads the DNA methylation dataset and fine-tunes the network. To reproduce our experimental results, we provide the 'DNA_Eng.csv' used in our experiments. The fine-tuned network parameters are saved to 'Bestmodel_.pth'.
- Run Tsne_show.py: Loads the fine-tuned model and the test data, computes the new vector representations of the test set, and plots their distribution using t-SNE dimensionality reduction.
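For reference, the first step's tokenization of a DNA sequence into overlapping 5-mers (the "words" a Word2Vec model is trained on) might look like the sketch below; the function name and window handling are illustrative, not taken from word2vec.py:

```python
# Illustrative sketch: split a DNA sequence into overlapping 5-mers,
# which serve as the "words" for Word2Vec training.
def to_kmers(sequence, k=5):
    """Return the list of overlapping k-mers in `sequence`."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

sentence = to_kmers("ACGTACGTAC")
print(sentence)  # ['ACGTA', 'CGTAC', 'GTACG', 'TACGT', 'ACGTA', 'CGTAC']
```

Each sequence becomes one "sentence" of 5-mers, so standard Word2Vec training can be applied unchanged.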
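The extraction step filters English words out of the BERT vocabulary; a minimal sketch of that filter follows, where the vocabulary list is a toy stand-in for BERT's real WordPiece vocab file:

```python
import re

# Toy stand-in for BERT's WordPiece vocabulary; the real one is
# loaded from the pre-trained tokenizer's vocab file.
vocab = ["[CLS]", "[SEP]", "the", "##ing", "protein", "dna2", "hello", "##s"]

# Keep only whole English words: purely alphabetic tokens that are
# neither special tokens ([CLS], [SEP], ...) nor subword pieces (##...).
english_words = [tok for tok in vocab if re.fullmatch(r"[a-zA-Z]+", tok)]
print(english_words)  # ['the', 'protein', 'hello']
```

In the real script, each surviving word would then be fed through the pre-trained BERT model and its embedding written to BERT_vec.xlsx.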
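The core computation of compare.py, pairing each 5-mer with its most similar English word by cosine similarity, can be sketched as below; the toy vectors, the demo output filename, and the exact CSV columns are assumptions:

```python
import csv
import numpy as np

# Toy embeddings; in the real pipeline these come from the trained
# Word2Vec model (5-mers) and from BERT_vec.xlsx (English words).
kmer_vecs = {"ACGTA": np.array([1.0, 0.0]), "CGTAC": np.array([0.0, 1.0])}
word_vecs = {"hello": np.array([0.9, 0.1]), "world": np.array([0.1, 0.9])}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assign each 5-mer the English word with the highest cosine similarity.
rows = []
for kmer, kv in kmer_vecs.items():
    best = max(word_vecs, key=lambda w: cosine(kv, word_vecs[w]))
    rows.append((kmer, best, cosine(kv, word_vecs[best])))

# Demo output; the provided 'DNA_Eng.csv' is the real artifact.
with open("DNA_Eng_demo.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["5mer", "word", "cosine_similarity"])
    writer.writerows(rows)

print(rows[0][:2])  # ('ACGTA', 'hello')
```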
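The fine-tuning script keeps the parameters that perform best during training ('Bestmodel_.pth'). The checkpointing pattern is shown here with a toy logistic-regression stand-in rather than the actual network, so everything in this block is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features/labels standing in for the DNA methylation dataset.
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(8)
best_acc, best_w = 0.0, w.copy()

for epoch in range(50):
    # One step of gradient descent on the logistic loss.
    p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
    w -= 0.1 * X_train.T @ (p - y_train) / len(y_train)

    # Keep the parameters with the best validation accuracy,
    # mirroring how the script saves 'Bestmodel_.pth'.
    val_acc = float(np.mean((X_val @ w > 0) == y_val))
    if val_acc > best_acc:
        best_acc, best_w = val_acc, w.copy()

print(best_acc >= 0.5)
```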
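Finally, the t-SNE visualization in Tsne_show.py follows the standard scikit-learn pattern; the random features below stand in for the fine-tuned model's representations of the test set:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the fine-tuned model's vectors for the test set.
features = rng.normal(size=(100, 32))
labels = rng.integers(0, 2, size=100)

# Reduce to 2-D; perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(features)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="coolwarm", s=10)
plt.savefig("tsne_demo.png")
print(emb.shape)  # (100, 2)
```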