WeRateDogs: A Data Wrangling Project

by Nonso Udechukwu

Introduction

This data wrangling project is the second project of the ALX-T Data Analyst Nanodegree programme on Udacity. It focuses on wrangling and analysing data from the @WeRateDogs Twitter account.

This involved:

gathering data from multiple sources (downloading the @WeRateDogs Twitter archive dataset and the image predictions data with the Requests library, and querying the Twitter API with tweepy for additional data),
assessing data (visually and programmatically),
cleaning and merging the datasets, and then
performing analysis on the tweets to extract insights.
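
The gathering step for the API data can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes the tweets fetched with tweepy were saved to a text file with one JSON object per line, and the helper name `parse_tweet_lines` and the sample records are hypothetical. The field names `id`, `retweet_count`, and `favorite_count` are those found on tweet objects returned by the Twitter API.

```python
import json


def parse_tweet_lines(lines):
    """Turn an iterable of JSON strings (one tweet per line) into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        tweet = json.loads(line)
        # Keep only the fields needed for the later analysis.
        records.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return records


# Hypothetical sample standing in for the saved API responses.
sample_lines = [
    '{"id": 1001, "retweet_count": 10, "favorite_count": 40}',
    "",
    '{"id": 1002, "retweet_count": 3, "favorite_count": 12}',
]
records = parse_tweet_lines(sample_lines)
print(records)
```

With a real file, passing an open file handle instead of `sample_lines` should work the same way, and the resulting list of dictionaries can be handed to `pandas.DataFrame` for merging with the other datasets.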

Libraries Used

Pandas: For storing and manipulating structured data.
NumPy: For multi-dimensional array and matrix data structures, and for performing mathematical operations.
Matplotlib: For all visualizations (including maps and graphs).
Seaborn: A Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
tweepy: An open-source, easy-to-use library that gives a Python application an interface to the Twitter API.
requests: For sending HTTP requests from Python.
os: For interacting with the operating system.
json: For parsing JSON into Python dictionaries or lists, and for converting Python dictionaries or lists into JSON strings.
re: For regular expressions, which let one quickly check whether a string matches a pattern at its start (using the match function) or contains the pattern anywhere (using the search function).
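
The difference between the match and search functions of re can be seen in a short sketch. The sample string below is a hypothetical tweet source value written as an HTML anchor tag; it is an assumption for illustration, not data taken from the project.

```python
import re

# Hypothetical value of a tweet's source field: an HTML anchor tag.
source = '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'

# Pattern for the text that sits between ">" and "<".
anchor_text = re.compile(r">([^<]+)<")

# match only succeeds if the pattern fits at the very start of the string.
at_start = anchor_text.match(source)

# search scans the whole string for the first place the pattern fits.
anywhere = anchor_text.search(source)
label = anywhere.group(1) if anywhere else None

print(at_start)
print(label)
```

Here match finds nothing because the string begins with "<", while search locates the anchor text further along.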

Project Methodology

The main steps for this project are as follows:

Data Wrangling:
- Data Gathering
- Data Assessment
- Data Cleaning
Analysis and Visualisation
Conclusions/Results

Key Insights

Based on the data and analysis carried out, I found that:

There's a very strong positive correlation (r = 0.93) between the retweet count and the favourite count of a tweet.
Tweets in the doggo-puppo category overwhelmingly outperformed the rest in retweets and likes. @WeRateDogs might focus on this category for future tweets, given its apparent popularity with followers.
Tweets sent from a web browser seem to have gathered more retweets on average. Note that other factors (such as tweet content) are likely contributing here.
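
The two kinds of calculation behind these insights, a Pearson correlation and a per-group average, can be sketched in plain Python. The numbers below are made up for illustration and do not reproduce the r = 0.93 figure; the helper names `pearson_r` and `mean_by_group` are hypothetical. In pandas, `Series.corr` and a `groupby` followed by `mean` would do the same work.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_by_group(rows, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    totals = defaultdict(lambda: [0, 0])  # group -> [running sum, count]
    for row in rows:
        acc = totals[row[group_key]]
        acc[0] += row[value_key]
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up tweets; "web" and "phone" are placeholder source labels.
tweets = [
    {"source": "web", "retweet_count": 40, "favorite_count": 120},
    {"source": "web", "retweet_count": 30, "favorite_count": 95},
    {"source": "phone", "retweet_count": 20, "favorite_count": 50},
    {"source": "phone", "retweet_count": 10, "favorite_count": 35},
]

r = pearson_r(
    [t["retweet_count"] for t in tweets],
    [t["favorite_count"] for t in tweets],
)
means = mean_by_group(tweets, "source", "retweet_count")
print(round(r, 2), means)
```

A group average like this says nothing about why one group does better, which is why the caveat about other contributing factors matters.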

Limitations

84% (1561/1851) of the dogs didn't have a category. This means conclusions about dog stage were drawn from a small portion (about 16%) of the observations.
The tweet source "Vine - Make a Scene" wasn't part of the final analysis. I suspect it was lost when I dropped retweet rows and tweets with nonstandard dog names.
I couldn't retrieve additional data for 29 tweets: 28 of them raised a "No status found with that ID" error, while one raised a "Sorry, you are not authorized to see this status" error.
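
A quick way to check figures and suspicions like these is sketched below. The percentage uses the counts stated above; the rows and the `is_retweet` flag are hypothetical stand-ins, and the sample marks the Vine row as a retweet purely to show how the check works, not as a claim about the real data. The helper names `share` and `values_lost` are also hypothetical.

```python
def share(part, whole):
    """Percentage that part makes up of whole."""
    return 100 * part / whole


def values_lost(before, after, key):
    """Values of key that appear before a cleaning step but not after it."""
    return {row[key] for row in before} - {row[key] for row in after}


# Share of dogs without a category, from the counts given above.
missing_pct = share(1561, 1851)

# Hypothetical rows before a cleaning step that drops retweets.
before = [
    {"source": "Twitter for iPhone", "is_retweet": False},
    {"source": "Vine - Make a Scene", "is_retweet": True},
    {"source": "Twitter Web Client", "is_retweet": False},
]
after = [row for row in before if not row["is_retweet"]]

lost = values_lost(before, after, "source")
print(round(missing_pct), lost)
```

Running a check like `values_lost` after each cleaning step would show which step, if any, removed a given tweet source.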

References

Below are some of the websites I consulted for this project:

Pandas Combine Two Columns of Text in DataFrame, SparkByExamples
How to Convert Text Data from Requests object to DataFrame, StackOverflow
BeautifulSoup: Extract Text from Anchor Tag, StackOverflow
Authenticate the Twitter API with Python (Tweepy), JC Chouinard
How to Make a Table in Jupyter Notebook, CodeGrepper
