Twitter Hot Topics Detection

A small project demonstrating the use of the Twitter API and NLP techniques. The idea is to download tweets from specified accounts (news companies), cluster the tweets into topics, detect the hottest topic, and output the most relevant news tweet from that topic.

This is a proof of concept, not production-ready code.

Concepts

The project was done using the following tools and techniques:

Twitter API (python-twitter implementation)
Google Word2Vec feature generation (pre-trained vectors trained on part of Google News dataset)
k-means clustering (sklearn.cluster.KMeans)
Silhouette values to estimate the number of clusters (sklearn.metrics.silhouette_score)
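
As a rough illustration of how the last two items fit together, the sketch below runs k-means for several candidate cluster counts and keeps the count with the highest mean silhouette score. It is not the project's actual code: the helper name pick_num_clusters, the candidate range, and the synthetic vectors (standing in for Word2Vec-based tweet vectors) are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_num_clusters(vectors, candidate_ks, random_state=0):
    """Return (best_k, scores), where best_k has the highest mean silhouette score.

    Hypothetical helper for illustration; not taken from the repository.
    """
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(vectors)
        # silhouette_score is in [-1, 1]; higher means tighter, better separated clusters.
        scores[k] = silhouette_score(vectors, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for tweet vectors: three well-separated blobs in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
vectors = np.vstack([c + rng.normal(scale=0.5, size=(30, 5)) for c in centers])

best_k, scores = pick_num_clusters(vectors, candidate_ks=range(2, 7))
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

On real tweet vectors the silhouette curve is usually much flatter than on this toy data, so the chosen count is best treated as an estimate.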

Files

The code is split into separate files to make debugging and testing easier.

get_tweets.py - downloads news tweets.
create_hist_dataset.py - cleans and saves the dataset.
save_vectors.py - converts sentences to vectors and saves the result for further modelling.
detect_hot.py - prints out 'hot' tweets.
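
The last two steps can be pictured with a minimal, standard-library-only sketch. It rests on assumptions that this README does not confirm: that a sentence vector is the average of its word vectors, that the "hottest" topic is simply the largest cluster, and that the most relevant tweet is the one nearest to that cluster's centroid. The function names and the tiny two-dimensional word_vectors table are hypothetical.

```python
import math
from collections import Counter


def sentence_vector(tokens, word_vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def hottest_tweet(tweets, vectors, labels):
    """Pick the largest cluster, then the tweet nearest to that cluster's centroid."""
    hot_label, _ = Counter(labels).most_common(1)[0]
    members = [i for i, label in enumerate(labels) if label == hot_label]
    dim = len(vectors[0])
    centroid = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
    best = min(members, key=lambda i: math.dist(vectors[i], centroid))
    return hot_label, tweets[best]


# Toy 2-dimensional "embeddings" (assumed values, not real Word2Vec output).
word_vectors = {
    "storm": [1.0, 0.0],
    "flood": [0.8, 0.2],
    "election": [0.0, 1.0],
    "vote": [0.2, 0.8],
}
tweets = ["storm flood", "flood storm storm", "election vote", "storm"]
vectors = [sentence_vector(t.split(), word_vectors) for t in tweets]
labels = [0, 0, 1, 0]  # cluster labels as a clustering step might assign them

hot_label, tweet = hottest_tweet(tweets, vectors, labels)
print(hot_label, tweet)
```

Cluster size is only one possible measure of "hotness"; the tuning notebook suggests the real heuristic has additional parameters.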

Some (dirty) exploration

Some of the thought process is recorded in Jupyter notebooks.

NLP explore.ipynb - some exploration on clustering.
Tune parameters.ipynb - some exploration on tuning heuristic parameters.
