sai80082/lang-analysis

Hindi Text Analysis: NLP-Powered Insights & Sentiment Detection

This project preprocesses Hindi text using the IndicNLP library for normalization and tokenization. A custom tokenizer enhances this process by cleaning text, removing stop words, and handling language-specific nuances.
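As a rough illustration of the preprocessing described above, the sketch below normalizes and tokenizes a sentence with IndicNLP and then filters the tokens. The `preprocess` function, the `STOP_WORDS` set, and the regex fallback used when the library is not installed are assumptions made for this example, not the repository's actual code.

```python
import re

try:
    # IndicNLP entry points; requires `pip install indic-nlp-library`.
    from indicnlp.normalize.indic_normalize import IndicNormalizerFactory
    from indicnlp.tokenize import indic_tokenize
    _normalizer = IndicNormalizerFactory().get_normalizer("hi")
except ImportError:
    # Assumed fallback so the sketch still runs without the library.
    indic_tokenize = None
    _normalizer = None

# Devanagari letters, signs and digits, excluding the danda marks (U+0964, U+0965).
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"मैं", "बहुत", "है", "और", "का"}


def preprocess(text, stop_words=STOP_WORDS):
    """Normalize, tokenize, then drop punctuation and stop words."""
    if _normalizer is not None:
        text = _normalizer.normalize(text)
        tokens = indic_tokenize.trivial_tokenize(text, lang="hi")
    else:
        tokens = DEVANAGARI_WORD.findall(text)
    # Keep only pure Devanagari tokens that are not stop words.
    return [t for t in tokens if DEVANAGARI_WORD.fullmatch(t) and t not in stop_words]


tokens = preprocess("मैं आज बहुत खुश हूँ।")
print(tokens)
```

Keeping the cleaning step separate from tokenization makes it easy to swap in a different stop word list without touching the IndicNLP calls.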

Key Steps:

  1. Text Preprocessing:

    • Normalize and tokenize Hindi text with IndicNLP.
    • Clean the text and remove stop words using a custom tokenizer.
  2. Feature Extraction:

    • Apply TF-IDF vectorization with bigrams to extract key terms and phrases.
    • Use the weighted unigrams and bigrams as a lightweight representation of what each dialogue is about.
  3. Sentiment Analysis:

    • Use a labeled Hindi word list (a sentiment lexicon) to compute sentiment scores.
    • Analyze emotional tones for individual speakers and the overall conversation.
  4. Conversation Insights:

    • Summarize key themes and interaction dynamics using extracted terms and sentiment analysis.
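The feature extraction step can be sketched without dependencies as follows. This is a minimal TF-IDF over unigrams and bigrams with a smoothed IDF term, written out by hand for illustration; the function names and sample utterances are hypothetical. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 2)` would be a common choice, and its exact weights would differ because of normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return unigrams and bigrams (space-joined) for one token list."""
    terms = []
    for n in range(1, n_max + 1):
        terms += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return terms


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    term_lists = [ngrams(doc) for doc in docs]
    n_docs = len(term_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_map, k=3):
    """Return the k highest-weighted terms of one document."""
    return sorted(weight_map, key=weight_map.get, reverse=True)[:k]


# Hypothetical, already-preprocessed utterances.
docs = [
    ["खाना", "अच्छा", "था"],
    ["खाना", "ठंडा", "था"],
    ["फिल्म", "अच्छा", "लगा"],
]
weights = tfidf(docs)
print(top_terms(weights[0]))
```

Bigrams that occur in only one utterance receive a higher weight than unigrams shared across utterances, which is why they tend to surface as the distinctive phrases of a dialogue.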

This pipeline provides a structured approach to analyzing Hindi conversations, making it useful for linguistic research, sentiment analysis, and dialogue summarization. 🚀
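The lexicon-based sentiment step from the list above could look roughly like this. The word list, its +1/-1 labels, the averaging rule, and the `(speaker, tokens)` input shape are all assumptions for illustration; the repository's labeled word list and scoring may differ.

```python
from collections import defaultdict

# Hypothetical labeled word list: +1 positive, -1 negative.
SENTIMENT_LEXICON = {"खुश": 1, "अच्छा": 1, "दुखी": -1, "बुरा": -1, "खराब": -1}


def score_tokens(tokens, lexicon=SENTIMENT_LEXICON):
    """Mean polarity of the lexicon words found in tokens; 0.0 if none match."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def analyze_conversation(turns):
    """turns: list of (speaker, tokens). Returns (per-speaker scores, overall score)."""
    by_speaker = defaultdict(list)
    all_tokens = []
    for speaker, tokens in turns:
        by_speaker[speaker].extend(tokens)
        all_tokens.extend(tokens)
    per_speaker = {s: score_tokens(toks) for s, toks in by_speaker.items()}
    return per_speaker, score_tokens(all_tokens)


# Hypothetical preprocessed conversation.
turns = [
    ("A", ["आज", "खुश"]),
    ("B", ["खाना", "खराब"]),
    ("A", ["फिल्म", "अच्छा"]),
]
per_speaker, overall = analyze_conversation(turns)
print(per_speaker, overall)
```

Averaging over matched words keeps scores in the range -1 to 1 regardless of utterance length, at the cost of ignoring negation and intensity.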
