Domain-Identification

NLP Applications (Spring, 2019) Course Project.

NLA_Final_Report.pdf is the report for this project.

Domain_Identification_Presentation.pptx is the presentation for this project.

Data is in the data/ folder. (This is a smaller dataset of ~5k news articles.)

Larger dataset link: [https://drive.google.com/open?id=19XEP1zoZVIyhtglttHgg_Xz9uDB_uXz7]

Outputs can be seen in the notebooks described below, or in the report.

Create a directory pretrained_embeds/ in the same directory as this repo. Download the GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip, unzip the archive, and place the resulting glove.6B/ folder in the pretrained_embeds/ directory.
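
As a rough sketch of how the downloaded files could be read, the snippet below parses the GloVe text format (one token per line, followed by its space-separated vector components) into a dictionary. The function name load_glove, the expected_dim parameter, and the choice of glove.6B.100d.txt are assumptions made for illustration; the notebooks in this repo may load the embeddings differently.

```python
import os


def load_glove(path, expected_dim=None):
    """Parse a GloVe .txt file into a dict mapping token -> list of floats.

    Hypothetical helper, not taken from this repo. Lines whose vector length
    differs from expected_dim (when given) are skipped.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # blank or malformed line
            token, values = parts[0], parts[1:]
            if expected_dim is not None and len(values) != expected_dim:
                continue  # wrong dimensionality for this file
            embeddings[token] = [float(v) for v in values]
    return embeddings


if __name__ == "__main__":
    # Assumed location, following the directory layout described above.
    glove_path = os.path.join("pretrained_embeds", "glove.6B", "glove.6B.100d.txt")
    if os.path.exists(glove_path):
        vectors = load_glove(glove_path, expected_dim=100)
        print(len(vectors), "vectors loaded")
    else:
        print("GloVe file not found at", glove_path)
```

If the script reports that the file is missing, check that glove.6B/ sits directly inside pretrained_embeds/ and that the path is resolved from the directory you expect.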

Important Files

EDA.ipynb: Some analysis about the dataset.

Classifier-svm, lr.ipynb: Contains code for the SVM and Logistic Regression classifiers.

Domain-ClassificationV2.ipynb: Contains the code for pre-processing, tokenizing, and the Bi-LSTM model for both English and Hindi (translated). To predict, go to the Prediction section and run the make_prediction function on an article.

make_prediction( article='', true_category='', needTranslation=False, verbose=True)

Test_Input.ipynb: Contains code for the Bi-LSTM model with attention. Run the make_pred function for prediction and domain-based keyword extraction from an article.
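
This README does not spell out how keywords are derived from the attention model. One common approach, shown here purely as an assumption, is to normalize per-token attention scores with a softmax and keep the highest-weighted tokens. The helper names softmax and top_keywords, the example tokens, and the scores are all made up for illustration and are not taken from Test_Input.ipynb.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_keywords(tokens, scores, k=3):
    """Return the k tokens with the highest attention weight.

    Hypothetical helper: repeated tokens keep their largest weight.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return []
    weights = softmax(scores)
    best = {}
    for token, weight in zip(tokens, weights):
        best[token] = max(weight, best.get(token, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up attention scores for a short sports sentence.
tokens = ["the", "match", "ended", "with", "a", "late", "goal"]
scores = [0.1, 2.3, 0.4, 0.0, 0.1, 0.9, 2.9]
keywords = top_keywords(tokens, scores, k=3)
print([token for token, _ in keywords])  # ['goal', 'match', 'late']
```

In this sketch the function words receive low weights and drop out, which is the behavior one would hope for from attention-based keyword extraction; the actual notebook may rank or filter tokens differently.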

Attention Model

For all the code to run smoothly, make sure the pre-trained GloVe embeddings are placed in the location described above.

Run attention/train.py to retrain the model if required, for example with another dataset.

The trained model is saved in the same attention/ directory under the file name attention_model.pt.

model.py inside attention/ contains the definition of the attention model.

Parameters such as the embedding size and hidden state size can be changed in train.py.

viv1729/Domain-Identification-with-Keyword-Extraction

Folders and files

Latest commit

History

Repository files navigation

Domain-Identification

Important Files

Attention Model

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages