Amazon Fine Food Reviews

In this project, I apply several classification models (Logistic Regression, KNN, Naive Bayes, SVC, Decision Tree, Random Forest, Gradient Boosting Classifier, XGBClassifier) to the Amazon Fine Food Reviews dataset, using various vectorization methods, to classify reviews as positive or negative based on the customer's review text. t-SNE is used to visualize the vectorized reviews rather than to classify them.

Input Data Source:

https://www.kaggle.com/snap/amazon-fine-food-reviews

Context:

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.

Data includes:

Reviews from Oct 1999 - Oct 2012

568,454 reviews

256,059 users

74,258 products

260 users with > 50 reviews

Attribute Information:

	1.	Id - Unique Id

	2.	ProductId - unique identifier for the product

	3.	UserId - unique identifier for the user

	4.	ProfileName

	5.	HelpfulnessNumerator - number of users who found the review helpful

	6.	HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not

	7.	Score - rating between 1 and 5

	8.	Time - timestamp for the review

	9.	Summary - brief summary of the review

	10.	Text - text of the review

Objective:

Given a review, determine whether the review is positive or negative.

Determine if the review is positive or negative: we can use the rating (Score). A rating of 4 or 5 is treated as a positive review and a rating of 1 or 2 as a negative review; a rating of 3 is considered neutral and ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
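The labeling rule above can be sketched in pandas as follows. This is a minimal illustration, not the notebooks' exact code: the sample rows and the `Polarity` column name are assumptions; only `Score` and `Text` come from the dataset.

```python
import pandas as pd

# Tiny hypothetical sample standing in for the Kaggle data; only the columns used here.
reviews = pd.DataFrame({
    "Score": [5, 4, 3, 2, 1],
    "Text": ["Great taste", "Pretty good", "It is okay", "Too salty", "Arrived stale"],
})

# Drop neutral reviews (Score == 3), then map 4-5 to positive (1) and 1-2 to negative (0).
labeled = reviews[reviews["Score"] != 3].copy()
labeled["Polarity"] = (labeled["Score"] > 3).astype(int)

print(labeled[["Score", "Polarity"]])
```

Dropping the neutral rows before mapping keeps the problem strictly binary, which is what the classifiers listed in this project expect.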

MY APPROACH:

Importing the required libraries and reading the dataset.

a. Merging of the two datasets

b. Understanding the dataset

Exploratory Data Analysis (EDA)

a. Data Visualization

Feature Engineering

a. Data Cleaning: Deduplication

b. Removal of null values

c. Text Preprocessing

d. Stemming, Stop-word removal and Lemmatization

e. Bag of Words (BoW)

f. Word2Vec

g. TF-IDF Vectorizer

h. Avg W2V, TFIDF-W2V

i. T-SNE Analysis
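The cleaning and vectorization steps above can be sketched with pandas and scikit-learn. This is an illustrative sketch under assumptions: the sample rows, the deduplication key (`UserId`, `Time`, `Text`), the `clean_text` helper, and the `CleanText` column are hypothetical and may differ from what the notebooks do. Stemming, lemmatization, and the Word2Vec variants are not shown.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical sample rows; the real data comes from the Kaggle dataset.
df = pd.DataFrame({
    "UserId": ["u1", "u1", "u2", "u3"],
    "Time": [100, 100, 200, 300],
    "Text": [
        "Great <br /> coffee, would buy again!",
        "Great <br /> coffee, would buy again!",  # same user, time, and text
        "Stale chips. Not good.",
        None,  # missing review text
    ],
})

# Deduplicate on an assumed key (user, timestamp, text) and drop rows without text.
df = df.drop_duplicates(subset=["UserId", "Time", "Text"]).dropna(subset=["Text"])


def clean_text(text: str) -> str:
    """Strip HTML tags and non-letter characters, lowercase, and trim whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-zA-Z]+", " ", text)
    return text.lower().strip()


df["CleanText"] = df["Text"].apply(clean_text)

# Bag of Words counts and TF-IDF weights over the same cleaned corpus,
# with scikit-learn's built-in English stop-word list.
bow = CountVectorizer(stop_words="english")
tfidf = TfidfVectorizer(stop_words="english")
X_bow = bow.fit_transform(df["CleanText"])
X_tfidf = tfidf.fit_transform(df["CleanText"])

print(X_bow.shape, X_tfidf.shape)
```

Both vectorizers produce sparse matrices with one row per review and one column per vocabulary term, so they can be swapped in and out of the same downstream model.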

Model Building

a. Performing train test split - Time Series Split

b. Logistic Regression Model

c. KNN Classifier Model

d. Naive Bayes Model

e. Support Vector Machine Model

f. Decision Tree Classifier

g. Random Forest Classifier
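A time-based train/test split followed by one of the models above (Logistic Regression) might look like the sketch below. The sample data, the 75/25 cutoff, and the column names `CleanText` and `Polarity` are assumptions for illustration, not values taken from the notebooks.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled sample: Time is a timestamp, Polarity is 1 (positive) or 0 (negative).
data = pd.DataFrame({
    "Time": [5, 1, 3, 2, 4, 6, 8, 7],
    "CleanText": [
        "love this tea", "terrible stale taste", "great snack love it",
        "awful waste of money", "great flavor", "terrible awful product",
        "love the great taste", "stale and awful",
    ],
    "Polarity": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Sort chronologically so the model trains on older reviews and is tested on newer ones.
data = data.sort_values("Time").reset_index(drop=True)
cutoff = int(len(data) * 0.75)  # assumed split ratio
train, test = data.iloc[:cutoff], data.iloc[cutoff:]

# Fit the vectorizer on the training split only, so no test vocabulary leaks into training.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train["CleanText"])
X_test = vectorizer.transform(test["CleanText"])

model = LogisticRegression(max_iter=1000)
model.fit(X_train, train["Polarity"])
predictions = model.predict(X_test)
print(predictions)
```

Splitting by time rather than at random mimics how the model would be used in practice: it only ever sees reviews written before the ones it has to classify.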

Hyperparameter Tuning (GridSearchCV and RandomizedSearchCV)

a. For all classifier models across the different featurizations (BoW, TF-IDF, Avg W2V)
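As one possible illustration of this tuning step, the sketch below runs `GridSearchCV` over the smoothing parameter of a Naive Bayes model on BoW features, using `TimeSeriesSplit` for the folds. The toy reviews, the parameter grid, the number of splits, and the `roc_auc` scoring choice are all assumptions; the notebooks may use different settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical reviews, assumed to be already sorted from oldest to newest.
texts = [
    "love this tea", "terrible stale taste", "great snack love it",
    "awful waste of money", "great flavor", "terrible awful product",
    "love the great taste", "stale and awful", "great tea love it",
    "awful stale snack", "love this flavor", "terrible waste",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Putting the vectorizer inside the pipeline refits it on each training fold.
pipeline = Pipeline([("bow", CountVectorizer()), ("nb", MultinomialNB())])
param_grid = {"nb__alpha": [0.01, 0.1, 1.0, 10.0]}  # assumed search range

# TimeSeriesSplit always validates on data that comes after the training fold.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="roc_auc",
)
search.fit(texts, labels)

print(search.best_params_, search.best_score_)
```

`RandomizedSearchCV` accepts the same estimator and `cv` arguments and samples from the grid instead of trying every combination, which helps when the grid is large.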

Model Validation

a. Accuracy score

b. Confusion matrix

c. Area Under Curve (AUC)

d. F1-score
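The four validation metrics above are all available in `sklearn.metrics`. The sketch below computes them on made-up labels and predicted probabilities; the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

# Hypothetical ground truth and predicted probabilities of the positive class.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95]

# Turn probabilities into hard labels with an assumed 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)
# AUC is computed from the probabilities, not from the thresholded labels.
auc = roc_auc_score(y_true, y_prob)
f1 = f1_score(y_true, y_pred)

print(accuracy, auc, f1)
print(matrix)
```

Because the dataset is dominated by positive reviews, accuracy alone can look good for a model that rarely predicts the negative class; the confusion matrix, AUC, and F1-score make that failure visible.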

Creating the final model and making predictions

Conclusion
