Supervised classification using Multinomial Naive Bayes & Random Forest
classification_v1.py
Implements bag-of-words and TF-IDF weighting (with normalization), and removes stop words, for feature engineering.
classification_v2.py
Uses sklearn to process text.
Uses sklearn's cross-validation helper train_test_split() to generate a hold-out test set.
Prints performance metrics: accuracy and confusion matrix.
Plots the ROC curve with Matplotlib.
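
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in classification_v1.py or classification_v2.py: the toy corpus, the `evaluate` and `plot_roc` helpers, and all parameter values are assumptions made for the example. It imports `train_test_split` from `sklearn.model_selection`, which is where recent sklearn releases keep it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy corpus invented for illustration only (1 = positive, 0 = negative).
positive = [
    "great movie with a wonderful cast",
    "loved the acting and the story",
    "excellent plot and brilliant direction",
    "wonderful film, truly enjoyable",
    "brilliant performance, loved it",
    "great story and excellent acting",
    "enjoyable and wonderful experience",
    "excellent cast, great direction",
]
negative = [
    "terrible movie with a boring plot",
    "hated the acting and the script",
    "awful story and dull direction",
    "boring film, truly disappointing",
    "dull performance, hated it",
    "terrible script and awful acting",
    "disappointing and boring experience",
    "awful cast, terrible direction",
]
docs = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Hold out 25% of the documents as a test set, keeping the class balance.
train_txt, test_txt, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0
)

# Bag of words with TF-IDF weighting, L2 normalization, English stop words removed.
# The vectorizer is fitted on the training split only, then applied to the test split.
vectorizer = TfidfVectorizer(stop_words="english", norm="l2")
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)


def evaluate(model):
    """Fit a classifier and collect accuracy, confusion matrix, and ROC data."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    fpr, tpr, _ = roc_curve(y_test, scores)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "fpr": fpr,
        "tpr": tpr,
        "auc": auc(fpr, tpr),
    }


results = {
    "MultinomialNB": evaluate(MultinomialNB()),
    "RandomForest": evaluate(RandomForestClassifier(n_estimators=100, random_state=0)),
}

for name, r in results.items():
    print(name, "accuracy:", r["accuracy"])
    print(r["confusion"])


def plot_roc(results):
    """Draw one ROC curve per model; Matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    for name, r in results.items():
        plt.plot(r["fpr"], r["tpr"], label=f"{name} (AUC = {r['auc']:.2f})")
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```

Calling `plot_roc(results)` draws the curves. With a corpus this small the numbers mean little; the sketch is only meant to show how the pieces fit together.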
How to run?
$ python classification_v1.py