After Session 2, you should have the following data files:
- cleaned_data.csv
- context_data.csv
- annotations_data.csv
- hashtags_data.csv
- mentions_data.csv
Session 3 covers:
- Named Entity Recognition and Descriptive Analysis
- Sentiment Analysis
- Visualization
What is NER?
- Named entity recognition (NER) is a natural language processing technique that scans a text, extracts its named entities (people, places, organizations, and so on), and classifies them into predefined categories.
Other NER tools:
What are context annotations and entities?
Tweet context annotations offer a way to understand contextual information about the Tweet itself. Although 100% of Tweets are reviewed, only a portion are annotated, depending on the content of the Tweet text.
Context annotations are derived from the analysis of a Tweet’s text and include a domain and entity pairing, which can be used to discover Tweets on topics that may previously have been difficult to surface. At present, there is a list of 50+ domains used to categorize Tweets.
Entity annotations: Entities comprise the types listed below. They are delivered as part of the entity payload section and are programmatically assigned based on what is explicitly mentioned in the Tweet text.
- Person - Barack Obama, Daniel, or George W. Bush
- Place - Detroit, Cali, or "San Francisco, California"
- Product - Mountain Dew, Mozilla Firefox
- Organization - Chicago White Sox, IBM
- Other - Diabetes, Super Bowl 50
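To make the two kinds of annotation concrete, here is a minimal sketch that flattens one Tweet object into rows. The sample Tweet is hand-written to resemble a Twitter API v2 payload (`context_annotations` with `domain`/`entity` pairs, and `entities.annotations` with a `type` and `normalized_text`); the ids, probabilities, and the helper names `context_rows` and `annotation_rows` are illustrative assumptions, not real API output.

```python
# Hand-written sample shaped like a Twitter API v2 Tweet object.
# All ids and values below are illustrative assumptions.
sample_tweet = {
    "id": "1",
    "text": "Barack Obama visited Detroit to talk about the Super Bowl.",
    "context_annotations": [
        {"domain": {"id": "d1", "name": "Politics"},
         "entity": {"id": "e1", "name": "Barack Obama"}},
        {"domain": {"id": "d2", "name": "Sports"},
         "entity": {"id": "e2", "name": "Super Bowl"}},
    ],
    "entities": {
        "annotations": [
            {"type": "Person", "normalized_text": "Barack Obama", "probability": 0.98},
            {"type": "Place", "normalized_text": "Detroit", "probability": 0.95},
        ]
    },
}


def context_rows(tweet):
    """One row per (domain, entity) pairing; empty if the Tweet has none."""
    return [
        {"tweet_id": tweet["id"],
         "domain": pair["domain"]["name"],
         "entity": pair["entity"]["name"]}
        for pair in tweet.get("context_annotations", [])
    ]


def annotation_rows(tweet):
    """One row per entity annotation found in the Tweet text."""
    annotations = tweet.get("entities", {}).get("annotations", [])
    return [
        {"tweet_id": tweet["id"],
         "type": ann["type"],
         "entity": ann["normalized_text"]}
        for ann in annotations
    ]


if __name__ == "__main__":
    for row in context_rows(sample_tweet) + annotation_rows(sample_tweet):
        print(row)
```

Because only a portion of Tweets are annotated, both helpers use `.get(...)` with an empty default so that an unannotated Tweet simply yields no rows.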
What can we extract from the context annotations and entities data?
- The `context_data` data set tells which domain(s) the associated tweet belongs to, such as Politics or Sports.
- The `entities` data set tells which entities appear in the tweet(s).
- Use `value_counts` to find the most frequently mentioned entities.
- Join multiple entities by `tweet_id` to piece together which entities each tweet has.
- Join the hashtags or mentions for each tweet.
- Count the frequency of hashtags and mentions.
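The counting and joining steps can be sketched as follows. This version uses only the standard library so that it runs without extra dependencies. The rows and the column names `tweet_id` and `entity` are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up rows shaped like the entities data; column names are assumptions.
entity_rows = [
    {"tweet_id": "1", "entity": "IBM"},
    {"tweet_id": "1", "entity": "Detroit"},
    {"tweet_id": "2", "entity": "IBM"},
    {"tweet_id": "3", "entity": "Super Bowl 50"},
    {"tweet_id": "3", "entity": "IBM"},
]


def entity_counts(rows):
    """Count how often each entity appears, most frequent first."""
    return Counter(row["entity"] for row in rows).most_common()


def entities_per_tweet(rows, sep=", "):
    """Join all entities that share a tweet_id into a single string."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["tweet_id"]].append(row["entity"])
    return {tweet_id: sep.join(names) for tweet_id, names in grouped.items()}


if __name__ == "__main__":
    print(entity_counts(entity_rows))
    print(entities_per_tweet(entity_rows))
```

With the data loaded into a pandas DataFrame `df`, the equivalent calls would be `df["entity"].value_counts()` and `df.groupby("tweet_id")["entity"].agg(", ".join)`. The same pattern applies to the hashtags and mentions data.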
- Install `wordcloud` for plotting word clouds: `pip install wordcloud`
- Install `networkx`: `pip install networkx`. If a bug is reported, try `pip install networkx==2.6.3`.
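As one possible way to prepare data for these two plotting packages, the sketch below turns per-tweet hashtag lists into a frequency table and a weighted co-occurrence edge list. Treating the network as a hashtag co-occurrence graph is an assumption of this example, and the sample hashtags and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Made-up hashtags per tweet_id, for illustration only.
hashtags_per_tweet = {
    "1": ["nlp", "python"],
    "2": ["NLP", "python", "data"],
    "3": ["data"],
}


def hashtag_frequencies(tags_by_tweet):
    """Count each hashtag once per tweet, ignoring case."""
    counts = Counter()
    for tags in tags_by_tweet.values():
        counts.update({tag.lower() for tag in tags})
    return counts


def cooccurrence_edges(tags_by_tweet):
    """Count how many tweets contain each unordered pair of hashtags."""
    edges = Counter()
    for tags in tags_by_tweet.values():
        unique = sorted({tag.lower() for tag in tags})
        for first, second in combinations(unique, 2):
            edges[(first, second)] += 1
    return edges


if __name__ == "__main__":
    print(hashtag_frequencies(hashtags_per_tweet))
    print(cooccurrence_edges(hashtags_per_tweet))
```

The frequency table can be passed to `WordCloud().generate_from_frequencies(...)`, and each `(first, second)` pair with its count can be added to a `networkx.Graph` with `add_edge(first, second, weight=count)`.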
What is sentiment analysis?
- Identify, extract, quantify, and study affective states and subjective information.
Rule-based Model
- A set of rules is used to label the text as positive, negative, or neutral.
- Packages:
  - nltk `SentimentIntensityAnalyzer` (VADER)
  - TextBlob
- VADER compound score range: [-1, 1]
- TextBlob score ranges: polarity [-1.0, 1.0], subjectivity [0.0, 1.0]
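To illustrate what "rule-based" means, here is a toy scorer written from scratch. It is not VADER or TextBlob: the lexicon is a tiny hand-made assumption, the only rule beyond lookup is a one-word negation flip, and the normalization into (-1, 1) is only similar in spirit to a compound score. The ±0.05 cut-off follows a convention commonly used with VADER compound scores.

```python
import math

# Tiny hand-made lexicon for illustration; real tools ship far larger ones.
TOY_LEXICON = {
    "good": 2.0, "great": 3.0, "love": 3.0,
    "bad": -2.0, "terrible": -3.0, "hate": -3.0,
}
NEGATIONS = {"not", "never", "no"}


def toy_compound(text, alpha=15.0):
    """Sum word valences and squash the total into the open interval (-1, 1)."""
    total = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True  # flips only the word that immediately follows
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        total += -valence if negate else valence
        negate = False
    return total / math.sqrt(total * total + alpha)


def label(compound, threshold=0.05):
    """Map a compound score to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["I love this great phone",
                     "This is not good",
                     "The phone arrived today"]:
        print(sentence, "->", label(toy_compound(sentence)))
```

In practice, the packages listed above replace `toy_compound`; only the final thresholding step usually needs to be written by hand.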
Pretrained Neural Network
- The model is trained with real data.
- flair
Commercialized Model
Train the Model in Specific Domain
Project Breakdown
- Session 1: Data Collection
- Session 2: Data Cleaning and Preparation
- Session 3: Data Analysis and Modeling
flowchart TD
A[Data Collection with Twitter API] --> B[Data Cleaning and Preparation];
B --> C[Data Analysis and Modeling];