Session 3

Preparation

After Session 2, you should have the following data files:

cleaned_data.csv

context_data.csv

annotations_data.csv

hashtags_data.csv

mentions_data.csv

Plan for Today

Named Entity Recognition and Descriptive Analysis

Sentiment analysis

Visualization

Named Entity Recognition (NER)

What is NER?

Named entity recognition is a natural language processing technique that scans a text, extracts the named entities it mentions, and classifies them into predefined categories such as person, place, or organization.

Other NER tools:

nltk

spaCy
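As a minimal sketch of the idea, the snippet below uses spaCy to pull entities out of a sentence and then groups them by category. It assumes spaCy and its small English model `en_core_web_sm` are installed; the helper names `extract_entities` and `group_by_label` are made up for this illustration.

```python
from collections import defaultdict


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, label) pairs found by a spaCy pipeline."""
    import spacy  # imported lazily so group_by_label works without spaCy

    nlp = spacy.load(model_name)  # assumes the model is already downloaded
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def group_by_label(pairs):
    """Group (entity text, label) pairs into {label: [entity text, ...]}."""
    grouped = defaultdict(list)
    for entity_text, label in pairs:
        grouped[label].append(entity_text)
    return dict(grouped)


# Example usage (requires spaCy and the model):
# pairs = extract_entities("Barack Obama visited IBM in Detroit.")
# print(group_by_label(pairs))
```

The labels returned depend on the model; spaCy's English models use tags such as PERSON, ORG, and GPE rather than the Twitter entity types described in the next section.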

Context Annotations and Entities

What are context annotations and entities?

Tweet context annotations offer a way to understand contextual information about the Tweet itself. Although 100% of Tweets are reviewed, only a portion are annotated, depending on the content of the Tweet text.

Context annotations are derived from an analysis of a Tweet’s text and include a domain and entity pairing, which can be used to discover Tweets on topics that may previously have been difficult to surface. At present, there are 50+ domains used to categorize Tweets.

Entity annotations: Entities comprise the types listed below. They are delivered as part of the entity payload and are programmatically assigned based on what is explicitly mentioned in the Tweet text.

Person - Barack Obama, Daniel, or George W. Bush

Place - Detroit, Cali, or "San Francisco, California"

Product - Mountain Dew, Mozilla Firefox

Organization - Chicago White Sox, IBM

Other - Diabetes, Super Bowl 50

More detail
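To make the two kinds of annotation concrete, here is a small sketch that flattens one tweet object into rows, roughly the shape of context_data.csv and annotations_data.csv. The tweet below is invented for illustration (ids, names, and probabilities are not real data), only the fields used here are shown, and the helper names are hypothetical.

```python
# Illustrative tweet object in the style of the Twitter API v2 response.
# All values are made up for this sketch.
tweet = {
    "id": "1",
    "text": "Barack Obama spoke in Detroit today.",
    "context_annotations": [
        {
            "domain": {"id": "10", "name": "Person"},
            "entity": {"id": "100", "name": "Barack Obama"},
        },
    ],
    "entities": {
        "annotations": [
            {"type": "Person", "normalized_text": "Barack Obama", "probability": 0.98},
            {"type": "Place", "normalized_text": "Detroit", "probability": 0.95},
        ]
    },
}


def context_rows(tweet):
    """One row per (domain, entity) pairing; empty if the tweet has none."""
    return [
        {
            "tweet_id": tweet["id"],
            "domain": ann["domain"]["name"],
            "entity": ann["entity"]["name"],
        }
        for ann in tweet.get("context_annotations", [])
    ]


def annotation_rows(tweet):
    """One row per entity annotation; empty if the tweet has none."""
    annotations = tweet.get("entities", {}).get("annotations", [])
    return [
        {"tweet_id": tweet["id"], "type": ann["type"], "entity": ann["normalized_text"]}
        for ann in annotations
    ]


print(context_rows(tweet))
print(annotation_rows(tweet))
```

Because only a portion of Tweets are annotated, both helpers fall back to an empty list when the fields are missing.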

Descriptive Analysis

What can we extract from the context annotations and entities data?

The context_data data set tells us which domain(s) each tweet belongs to, such as Politics or Sports.

The entities data set tells us which entities appear in each tweet.

Use value_counts to find the most frequently mentioned entities

Join multiple entities by tweet_id to piece together which entities each tweet has

Join the hashtags or mentions for each tweet

Count the frequency of hashtags and mentions
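The steps above can be sketched with pandas as follows. The toy data frames stand in for annotations_data.csv and hashtags_data.csv, and the column names `entity` and `hashtag` are assumptions; adjust them to match the files produced in Session 2.

```python
import pandas as pd

# Toy stand-ins for annotations_data.csv and hashtags_data.csv.
# Column names other than tweet_id are assumptions for this sketch.
annotations = pd.DataFrame({
    "tweet_id": [1, 1, 2, 3, 3],
    "entity": ["IBM", "Detroit", "IBM", "IBM", "Super Bowl 50"],
})
hashtags = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "hashtag": ["tech", "tech", "ai", "sports"],
})

# Frequency of each entity and hashtag, most frequent first.
entity_counts = annotations["entity"].value_counts()
hashtag_counts = hashtags["hashtag"].value_counts()

# One row per tweet, with all of its entities or hashtags joined into a string.
entities_per_tweet = (
    annotations.groupby("tweet_id")["entity"].agg(", ".join).rename("entities")
)
hashtags_per_tweet = (
    hashtags.groupby("tweet_id")["hashtag"].agg(" ".join).rename("hashtags")
)

# Combine both views on tweet_id.
per_tweet = (
    entities_per_tweet.to_frame()
    .join(hashtags_per_tweet, how="outer")
    .reset_index()
)

print(entity_counts.head())
print(per_tweet)
```

The same pattern should apply to mentions_data.csv by swapping in the mentions column.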

Visualization

Install wordcloud for plotting word clouds: pip install wordcloud

Install networkx: pip install networkx. If a bug is reported, try pip install networkx==2.6.3
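One possible use of these two packages, sketched below, is a word cloud of hashtag frequencies and a network of hashtags that appear together in the same tweet. The co-occurrence network is an assumption about what the visualization covers, and the input dictionary and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the hashtags of each tweet, keyed by tweet_id.
hashtags_by_tweet = {
    1: ["ai", "tech"],
    2: ["ai", "tech", "data"],
    3: ["sports"],
}


def hashtag_frequencies(hashtags_by_tweet):
    """Count how often each hashtag appears across all tweets."""
    return Counter(tag for tags in hashtags_by_tweet.values() for tag in tags)


def cooccurrence_weights(hashtags_by_tweet):
    """Count how many tweets contain each unordered pair of hashtags."""
    weights = Counter()
    for tags in hashtags_by_tweet.values():
        for pair in combinations(sorted(set(tags)), 2):
            weights[pair] += 1
    return weights


def save_wordcloud(frequencies, path="hashtags_wordcloud.png"):
    """Render a word cloud image sized by frequency (requires wordcloud)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


def build_graph(weights):
    """Build an undirected weighted graph of hashtag pairs (requires networkx)."""
    import networkx as nx

    graph = nx.Graph()
    for (left, right), weight in weights.items():
        graph.add_edge(left, right, weight=weight)
    return graph


# Example usage (requires wordcloud and networkx):
# save_wordcloud(hashtag_frequencies(hashtags_by_tweet))
# graph = build_graph(cooccurrence_weights(hashtags_by_tweet))
```

Hashtags that never share a tweet with another hashtag, such as "sports" here, produce no edges and therefore do not appear in the graph unless added as isolated nodes.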

Sentiment Analysis

What is sentiment analysis?

Sentiment analysis identifies, extracts, quantifies, and studies affective states and subjective information.

Rule-based Model

A set of rules is used to label the text as positive, negative, or neutral.

Packages:

nltk SentimentIntensityAnalyzer (VADER)

TextBlob

VADER compound score range: [-1, 1]

TextBlob score ranges: polarity [-1.0, 1.0], subjectivity [0.0, 1.0]
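The sketch below shows how these scores might be obtained and turned into a label. It assumes nltk (with the `vader_lexicon` resource) and textblob are installed. The ±0.05 cut-off is the commonly used convention for the VADER compound score, not a fixed rule, and the function names are made up for this example.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_compound(text):
    """Return the VADER compound score for a text (requires nltk)."""
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def textblob_scores(text):
    """Return (polarity, subjectivity) for a text (requires textblob)."""
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


# Example usage (requires nltk and textblob):
# print(label_from_compound(vader_compound("I love this workshop!")))
# print(textblob_scores("I love this workshop!"))
```

In practice, create the analyzer once and reuse it for every tweet rather than rebuilding it on each call.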

Pretrained Neural Network

The model has already been trained on real data, so it can be applied without further training.

flair
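A minimal sketch of using flair's pretrained English sentiment classifier is shown below. It assumes flair is installed and that the `en-sentiment` model can be downloaded on first use. The `signed_score` helper is a hypothetical convenience for comparing flair's output with the [-1, 1] scores of the rule-based packages.

```python
def load_classifier():
    """Load flair's pretrained English sentiment model (requires flair)."""
    from flair.models import TextClassifier

    return TextClassifier.load("en-sentiment")


def flair_sentiment(classifier, text):
    """Return (label, confidence) for a text, e.g. ("POSITIVE", 0.99)."""
    from flair.data import Sentence

    sentence = Sentence(text)
    classifier.predict(sentence)
    label = sentence.labels[0]
    return label.value, label.score


def signed_score(label, confidence):
    """Convert a (label, confidence) pair into a score in [-1, 1]."""
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    raise ValueError(f"unexpected label: {label!r}")


# Example usage (requires flair; the model is downloaded on first load):
# classifier = load_classifier()
# label, confidence = flair_sentiment(classifier, "I love this workshop!")
# print(signed_score(label, confidence))
```

Loading the classifier is slow, so it is passed in as an argument and can be reused across all tweets.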

Commercialized Model

OpenAI

Train the Model in a Specific Domain

BERT

BERTweet

Universal Sentence Encoder

Coding Example

Wrap Up

Project Breakdown

Session 1: Data Collection

Session 2: Data Cleaning and Preparation

Session 3: Data Analysis and Modeling

flowchart TD
    A[Data Collection with Twitter API] --> B[Data Cleaning and Preparation];
    B --> C[Data Analysis and Modeling];

End

Submit questions and issues here