Sentiment Detection Project

This is a joint project with a fellow student for the Textual Data Analysis course. Its goal is to build a sentiment detection system, a common task in Natural Language Processing. The project consists of three milestones.

Milestone I

Including:

training sentiment classifiers on two datasets from different domains
reporting classifier performance in terms of accuracy, precision, recall, and F-score on the respective test sets
performing a small qualitative assessment of the mistakes made by the classifiers
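
The steps of this milestone can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the TF-IDF plus logistic regression pipeline is an assumed model choice, the toy sentences are made up, and the helper names `train_classifier` and `evaluate` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for one domain; 1 = positive, 0 = negative.
train_texts = [
    "great phone, works perfectly",
    "excellent sound quality",
    "love this headset",
    "terrible battery life",
    "stopped working after a week",
    "poor quality, do not buy",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["great quality", "terrible headset", "works after a week"]
test_labels = [1, 0, 1]


def train_classifier(texts, labels):
    """Fit a TF-IDF + logistic regression pipeline (an assumed model choice)."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model


def evaluate(model, texts, labels):
    """Return the four metrics and the misclassified examples."""
    predicted = [int(p) for p in model.predict(texts)]
    precision, recall, f_score, _ = precision_recall_fscore_support(
        labels, predicted, average="binary"
    )
    metrics = {
        "accuracy": float(accuracy_score(labels, predicted)),
        "precision": float(precision),
        "recall": float(recall),
        "f_score": float(f_score),
    }
    # Keep (text, gold label, predicted label) for the qualitative error review.
    mistakes = [(t, g, p) for t, g, p in zip(texts, labels, predicted) if g != p]
    return metrics, mistakes


model = train_classifier(train_texts, train_labels)
metrics, mistakes = evaluate(model, test_texts, test_labels)
print(metrics)
for text, gold, predicted in mistakes:
    print("gold=%d predicted=%d: %s" % (gold, predicted, text))
```

Returning the misclassified sentences alongside the metrics keeps the quantitative report and the qualitative error review tied to the same predictions.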

Milestone II

Including:

testing each classifier on the dataset from the domain it was not trained on
reporting the results and drawing conclusions about the transferability of trained sentiment classifiers across different domains
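
A cross-domain evaluation of this kind might look like the sketch below. It is only an illustration under assumptions: the domain names, the toy sentences, the model choice, and the helper `cross_domain_accuracy` are all hypothetical and not taken from the project notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy splits for two domains; 1 = positive, 0 = negative.
domains = {
    "movies": {
        "train": (
            ["a wonderful film", "great acting and story",
             "boring plot", "awful acting, terrible film"],
            [1, 1, 0, 0],
        ),
        "test": (["wonderful story", "terrible plot"], [1, 0]),
    },
    "products": {
        "train": (
            ["great phone", "excellent battery and sound",
             "terrible headset", "poor quality, broke quickly"],
            [1, 1, 0, 0],
        ),
        "test": (["excellent phone", "poor battery"], [1, 0]),
    },
}


def cross_domain_accuracy(domains):
    """Train on each domain and score the model on every domain's test split."""
    scores = {}
    for source, source_splits in domains.items():
        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        model.fit(*source_splits["train"])
        for target, target_splits in domains.items():
            texts, labels = target_splits["test"]
            predicted = model.predict(texts)
            scores[(source, target)] = float(accuracy_score(labels, predicted))
    return scores


scores = cross_domain_accuracy(domains)
for (source, target), accuracy in sorted(scores.items()):
    kind = "in-domain" if source == target else "cross-domain"
    print("train=%s test=%s (%s): accuracy=%.2f" % (source, target, kind, accuracy))
```

Comparing each cross-domain score with the in-domain score of the same model gives a simple view of how much performance is lost when the domain changes.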

Milestone III

Including:

translating one of the datasets used for training the sentiment classifier into Russian
training a new sentiment classifier on the translated dataset
evaluating the results both qualitatively and quantitatively

Data used

Sentiment Labelled Sentences Data Set

The dataset contains sentences extracted from reviews of products, movies, and restaurants, each labelled with positive or negative sentiment. The sentences come from three websites:

imdb.com (movie reviews)
amazon.com (product reviews, mostly mobile phones, headsets, and some other phone accessories)
yelp.com (restaurant reviews)

For each website there are 500 positive and 500 negative sentences, so the dataset is balanced. Altogether it consists of 3,000 instances, a medium size that makes it possible to browse the whole dataset as needed.
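
Loading one of these files and checking the label balance could look like the sketch below. It assumes each line holds a sentence and a 0/1 label separated by a tab; the helper name `load_labelled_sentences` and the in-memory sample are hypothetical, and a file path can be passed in place of the sample.

```python
import csv
import io

import pandas as pd


def load_labelled_sentences(source):
    """Read tab-separated 'sentence<TAB>label' lines into a DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["sentence", "label"],
        quoting=csv.QUOTE_NONE,  # treat quote characters in reviews as plain text
    )


# Made-up sample in the assumed format; replace with a path to a real file.
sample = io.StringIO(
    "Great phone, works perfectly.\t1\n"
    "The battery died after a week.\t0\n"
    "Loved the food and the service.\t1\n"
    "Waited an hour for a cold meal.\t0\n"
)
df = load_labelled_sentences(sample)

# Count sentences per label to confirm the classes are balanced.
counts = df["label"].value_counts().to_dict()
print(counts)
```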

IMDB reviews (Large Movie Review Dataset)

The dataset provides sufficient data for both positive and negative sentiment, and the two labels are evenly distributed (label-balanced). Individual reviews are longer than the single sentences in the Sentiment Labelled Sentences Data Set. The dataset is well known in the NLP and ML community and is suitable for binary sentiment classification.
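
Reading the reviews into memory might be done as in the sketch below. It assumes the layout of the original distribution, with one review per text file under `<split>/pos` and `<split>/neg`; the copy in this project's data folder may be organised differently, and the helper `load_reviews` is hypothetical.

```python
import tempfile
from pathlib import Path


def load_reviews(root, split):
    """Collect review texts and labels (1 = positive, 0 = negative) for one split."""
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        for path in sorted((Path(root) / split / label_name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


# Build a tiny stand-in directory tree so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    for label_name, review in (("pos", "A wonderful film."), ("neg", "A dull film.")):
        folder = Path(tmp) / "train" / label_name
        folder.mkdir(parents=True)
        (folder / "0_1.txt").write_text(review, encoding="utf-8")
    texts, labels = load_reviews(tmp, "train")

print(len(texts), labels)
```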

Please see the data folder for the datasets used in the project.

Packages used

python 3.7.3
re 2.2.1
numpy 1.16.3
pandas 0.24.2
matplotlib 3.0.3
seaborn 0.9.0
scikit-learn 0.20.3
eli5 0.8.2

Repository: aleksandr-krylov/sentiment-detection-project