Hindi Text Analysis: NLP-Powered Insights & Sentiment Detection

This project analyzes Hindi conversations. Text is first normalized and tokenized with the IndicNLP library; a custom tokenizer then cleans the text, removes stop words, and handles language-specific nuances.
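As a rough illustration of the cleaning and stop-word step, here is a minimal sketch that runs on the standard library alone. It is not the project's tokenizer: Unicode NFC normalization and a Devanagari regex stand in for IndicNLP's normalizer and tokenizer, and `STOP_WORDS`, `TOKEN_RE`, and `preprocess` are hypothetical names with a deliberately tiny stop-word list.

```python
import re
import unicodedata

# Hypothetical stop-word list for illustration; a real list would be much longer.
STOP_WORDS = {"है", "हैं", "का", "की", "के", "में", "और", "यह", "से", "को"}

# Runs of Devanagari characters, skipping the danda (U+0964) and
# double danda (U+0965) so sentence punctuation never sticks to a word.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")

def preprocess(text, stop_words=STOP_WORDS):
    """Normalize, tokenize, and drop stop words from a Hindi string."""
    # NFC is only a rough stand-in for a script-aware normalizer.
    text = unicodedata.normalize("NFC", text)
    # Keeping only Devanagari runs also discards Latin text, ASCII digits, and symbols.
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in stop_words]

tokens = preprocess("यह फिल्म बहुत अच्छी है।")
```

In the actual pipeline the normalization and tokenization calls would come from IndicNLP, with a filter like the last line of `preprocess` applied on top.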

Key Steps:

  1. Text Preprocessing:

    • Normalize and tokenize Hindi text with IndicNLP.
    • Clean the text and remove stop words using a custom tokenizer.
  2. Feature Extraction:

    • Apply TF-IDF vectorization with bigrams to extract key terms and phrases.
    • Use the weighted terms and phrases to represent the content of each dialogue.
  3. Sentiment Analysis:

    • Utilize a labeled Hindi word list to determine sentiment scores.
    • Analyze emotional tones for individual speakers and the overall conversation.
  4. Conversation Insights:

    • Summarize key themes and interaction dynamics using extracted terms and sentiment analysis.
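The feature extraction step above can be sketched as follows. This is a hand-rolled TF-IDF over unigrams and bigrams using only the standard library, not the vectorizer the project itself uses; the smoothed IDF formula and the names `ngrams`, `tfidf`, and `top_terms` are assumptions made for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all 1-grams up to n_max-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs, n_max=2):
    """docs is a list of token lists; returns one {term: weight} dict per document."""
    term_lists = [ngrams(doc, n_max) for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid dividing by zero on empty documents
        weights.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total)
                  * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights

def top_terms(weight_map, k=3):
    """Highest-weighted terms, ties broken alphabetically."""
    return sorted(weight_map, key=lambda t: (-weight_map[t], t))[:k]

# Already-preprocessed utterances (token lists), one per document.
docs = [
    ["फिल्म", "बहुत", "अच्छी"],
    ["फिल्म", "बहुत", "लंबी"],
]
weights = tfidf(docs)
```

Terms and phrases that occur in only one utterance, such as "बहुत अच्छी", end up weighted above terms shared by both, which is what makes them candidates for key themes.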

This pipeline provides a structured approach to analyzing Hindi conversations, making it useful for linguistic research, sentiment analysis, and dialogue summarization. 🚀
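For the sentiment step, a word-list scorer might look like the sketch below. The lexicon entries, the +1/-1 labeling scheme, and the function names are all hypothetical; the project's actual word list and scoring rule are not shown in this README.

```python
from collections import defaultdict

# Hypothetical labeled word list: +1 for positive words, -1 for negative ones.
SENTIMENT_LEXICON = {
    "अच्छी": 1, "खुश": 1, "सुंदर": 1,
    "बुरी": -1, "दुखी": -1, "खराब": -1,
}

def score_tokens(tokens, lexicon=SENTIMENT_LEXICON):
    """Sum the labels of tokens found in the lexicon; unknown words count as 0."""
    return sum(lexicon.get(t, 0) for t in tokens)

def score_conversation(turns, lexicon=SENTIMENT_LEXICON):
    """turns is a list of (speaker, tokens) pairs.

    Returns a dict of per-speaker totals and the overall total.
    """
    per_speaker = defaultdict(int)
    for speaker, tokens in turns:
        per_speaker[speaker] += score_tokens(tokens, lexicon)
    return dict(per_speaker), sum(per_speaker.values())

# Preprocessed turns from a made-up two-speaker exchange.
turns = [
    ("A", ["फिल्म", "बहुत", "अच्छी"]),
    ("B", ["कहानी", "खराब", "अंत", "बुरी"]),
    ("A", ["मैं", "खुश"]),
]
per_speaker, overall = score_conversation(turns)
```

A plain sum like this ignores negation (for example "नहीं" before a positive word) and intensity, so the scores are best read as a coarse signal per speaker.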