This project was part of the course "Deep Learning with Python" where I had use a dataset of my own interest and apply a Language model. Here I tried to predict character based on the dialogue line.
The dataset I used are the dialogue lines from the 27 first seasons of "The Simpsons", accessible under the following link: https://www.kaggle.com/datasets/pierremegret/dialogue-lines-of-the-simpsons?resource=download
- Sentence embedding with BERT
- Different test for NN where I manipulate the depth and number of nodes with the aim of accuracy improvement (which was not very good)
- Clustering with hdbscan for later Topic Modeling with BERT