Skip to content

Latest commit

 

History

History
20 lines (11 loc) · 1.01 KB

README.md

File metadata and controls

20 lines (11 loc) · 1.01 KB

nlp-sentimentAnalysis

Logistic Regression model to predict sentiment on sentences in a corpus and displays top k features

Input: sentence, 0 or 1. per line in document.

We create tuples of (sentence, 0 or 1). 0 for negative 1 for positive sentiment.

Special case for words that begin, end, or are between a single quote '. We match them with a regex pattern and handle by adding 'EDIT_' token in front of word.

We also have to match negative words using regex and tag these tokens with 'NOT_'. We do this after we encounter a negation token and until we encounter an end negation token.

We then transform these features into vector format from scratch to build X train matrix, Y train label vector, and X test matrix.

We normalize the X_train matrix and X_test matrix seperately.

We chose a Logistic Regression Model to train and predict the test set.

We used the following evaluation scores for predictions: Precision, Recall, and Fmeasure.

There is also a function at the end to display top K features for a trained model.