Hindi Text Analysis: NLP-Powered Insights & Sentiment Detection

This project preprocesses Hindi text using the IndicNLP library for normalization and tokenization. A custom tokenizer enhances this process by cleaning text, removing stop words, and handling language-specific nuances.

Key Steps:

Text Preprocessing:
- Normalize and tokenize Hindi text with IndicNLP.
- Clean the text and remove stop words using a custom tokenizer.
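
A minimal sketch of the preprocessing step, under stated assumptions. It uses the Indic NLP Library's `IndicNormalizerFactory` and `indic_tokenize.trivial_tokenize` when the package is installed, and falls back to a rough standard-library approximation otherwise. The `preprocess` function, the `STOP_WORDS` set, and the Devanagari-only cleaning rule are illustrative assumptions, not the project's actual tokenizer or stop word list.

```python
import re
import unicodedata

try:
    # Indic NLP Library (pip install indic-nlp-library), if it is available.
    from indicnlp.normalize.indic_normalize import IndicNormalizerFactory
    from indicnlp.tokenize import indic_tokenize
    _normalizer = IndicNormalizerFactory().get_normalizer("hi")
except ImportError:
    # Fallback: a rough stdlib approximation, not equivalent to IndicNLP.
    _normalizer = None

# Tiny illustrative stop word list (assumption; the project's list is not shown).
STOP_WORDS = {"है", "हैं", "का", "की", "के", "और", "यह", "में", "से", "को", "तो", "भी"}

# Devanagari letters and signs, excluding danda (U+0964, U+0965) and digits.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0970-\u097F]+")


def preprocess(text):
    """Normalize, tokenize, clean, and drop stop words from Hindi text."""
    if _normalizer is not None:
        text = _normalizer.normalize(text)
        tokens = indic_tokenize.trivial_tokenize(text, lang="hi")
    else:
        text = unicodedata.normalize("NFC", text)
        tokens = text.split()

    # Custom cleaning: keep only runs of Devanagari characters in each token,
    # which strips punctuation, digits, and Latin-script fragments.
    cleaned = []
    for token in tokens:
        cleaned.extend(DEVANAGARI_WORD.findall(token))

    return [t for t in cleaned if t not in STOP_WORDS]


tokens = preprocess("यह खाना बहुत अच्छा है।")
```

Here `tokens` holds the content words of the sentence with the stop words and the danda removed. A real stop word list would be much longer and is usually tuned to the corpus.
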
Feature Extraction:
- Apply TF-IDF vectorization with bigrams to extract key terms and phrases.
- Use the weighted unigrams and bigrams to represent the salient vocabulary and phrasing of each dialogue.
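
To make the feature extraction step concrete, here is a small standard-library sketch of TF-IDF over unigrams and bigrams. The project most likely relies on a library vectorizer (for example scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 2)`), which is an assumption. The helper names `ngrams`, `tfidf`, and `top_terms` are hypothetical, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` is one common choice rather than the project's confirmed weighting. No length normalization is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs, n_max=2):
    """Compute raw-count TF times smoothed IDF for each tokenized document."""
    term_lists = [ngrams(doc, n_max) for doc in docs]
    n_docs = len(term_lists)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in term_lists for term in set(terms))
    scores = []
    for terms in term_lists:
        tf = Counter(terms)
        scores.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(score, key=lambda t: (-score[t], t))[:k]


docs = [["खाना", "बहुत", "अच्छा"], ["खाना", "ठंडा", "था"]]
scores = tfidf(docs)
```

Terms that appear in every document (such as "खाना" here) receive the lowest weight, while terms and bigrams unique to one turn rank higher.
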
Sentiment Analysis:
- Utilize a labeled Hindi word list to determine sentiment scores.
- Analyze emotional tones for individual speakers and the overall conversation.
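
The lexicon-based scoring described above can be sketched as follows. The `LEXICON` entries, the +1/-1 labels, and the averaging rule are illustrative assumptions; the project's actual word list and scoring scheme are not shown in this README.

```python
from collections import defaultdict

# Hypothetical labeled word list: +1 for positive, -1 for negative.
LEXICON = {"अच्छा": 1, "खुश": 1, "सुंदर": 1, "खराब": -1, "दुखी": -1, "बुरा": -1}


def sentiment_score(tokens, lexicon=LEXICON):
    """Average the labels of lexicon words found in tokens (0.0 if none)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def conversation_sentiment(turns, lexicon=LEXICON):
    """Score each speaker and the conversation as a whole.

    turns is a list of (speaker, tokens) pairs in conversation order.
    """
    by_speaker = defaultdict(list)
    all_tokens = []
    for speaker, tokens in turns:
        by_speaker[speaker].extend(tokens)
        all_tokens.extend(tokens)
    per_speaker = {s: sentiment_score(t, lexicon) for s, t in by_speaker.items()}
    return per_speaker, sentiment_score(all_tokens, lexicon)


turns = [
    ("A", ["खाना", "बहुत", "अच्छा"]),
    ("B", ["सेवा", "खराब", "थी"]),
    ("A", ["मैं", "खुश", "हूँ"]),
]
per_speaker, overall = conversation_sentiment(turns)
```

A plain word-list lookup like this ignores negation ("अच्छा नहीं") and intensity, so scores are best read as a coarse signal.
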
Conversation Insights:
- Summarize key themes and interaction dynamics using extracted terms and sentiment analysis.
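
One possible way to combine the extracted terms and sentiment scores into a conversation summary is sketched below. The `summarize` and `tone` helpers, the 0.2 threshold, and the use of turn share as a proxy for interaction dynamics are all assumptions made for illustration.

```python
from collections import Counter


def tone(score, threshold=0.2):
    """Map a sentiment score in [-1, 1] to a coarse label (threshold is arbitrary)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def summarize(turns, term_scores, speaker_sentiment, k=3):
    """Build a small summary dict from precomputed features.

    turns: list of (speaker, tokens) pairs
    term_scores: dict mapping term -> weight (for example TF-IDF)
    speaker_sentiment: dict mapping speaker -> score in [-1, 1]
    """
    turn_counts = Counter(speaker for speaker, _ in turns)
    total = sum(turn_counts.values())
    themes = sorted(term_scores, key=lambda t: (-term_scores[t], t))[:k]
    return {
        "themes": themes,
        # Share of turns per speaker, a rough view of who drives the dialogue.
        "turn_share": {s: c / total for s, c in turn_counts.items()},
        "tone": {s: tone(v) for s, v in speaker_sentiment.items()},
    }


summary = summarize(
    turns=[("A", ["खाना", "अच्छा"]), ("B", ["सेवा", "खराब"]), ("A", ["खुश"])],
    term_scores={"खाना": 2.0, "सेवा": 1.4, "बहुत अच्छा": 1.4, "थी": 1.0},
    speaker_sentiment={"A": 1.0, "B": -1.0},
    k=2,
)
```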

This pipeline provides a structured approach to analyzing Hindi conversations, making it useful for linguistic research, sentiment analysis, and dialogue summarization. 🚀
