anuragjain0610/Poems-Genre-Predictor-NLP-

Poems-Genre-Predictor-NLP-

Natural Language Processing: Predicting Genres of Poems

Poems Classifier

Data Source: https://www.kaggle.com/ultrajack/modern-renaissance-poetry

The dataset consists of the following data:

  1. Total no. of data points: 573
  2. Number of Attributes/Columns in data: 5
  3. Number of Authors: 67
  4. Number of different poems: 506
  5. Number of different poem names: 509
  6. Types (genres): 3

Attribute Information:

  1. author
  2. content: Poems
  3. poem name
  4. age: Modern or Renaissance
  5. type: Genre (Mythology & Folklore, Nature, or Love)

Loading the data: The dataset is available as a .csv file.
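
As a minimal sketch of the loading step, the snippet below reads a CSV with pandas and recomputes the kind of summary counts listed above. The inline rows are made up for illustration, and the column headers are an assumption based on the attribute list. With the real Kaggle file, `pd.read_csv` would be pointed at the downloaded .csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Made-up stand-in rows; headers are assumed from the attribute list above.
SAMPLE_CSV = io.StringIO(
    "author,content,poem name,age,type\n"
    'A. Poet,"Roses bloom, and so do we.",First Poem,Renaissance,Love\n'
    'B. Bard,"The river runs to sea.",Second Poem,Modern,Nature\n'
    'B. Bard,"Old gods walk the hills.",Third Poem,Modern,Mythology & Folklore\n'
)

# For the real dataset, pass the path of the downloaded .csv file instead.
poems = pd.read_csv(SAMPLE_CSV)

# Summary counts comparable to the dataset description.
summary = {
    "rows": len(poems),
    "columns": poems.shape[1],
    "authors": poems["author"].nunique(),
    "genres": poems["type"].nunique(),
}
print(summary)
```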

Approach, in steps:

  1. Removing duplicate and null rows.
  2. Converting each poem to lower case.
  3. Removing punctuation such as , . ‘ : etc.
  4. Removing unimportant words (stop words) such as "A", "An", "The", "Aboard", "About", "Above", "Absent", "Across", "After", etc.
  5. Vectorizing the text with TfidfVectorizer.
  6. Splitting the data into training and test sets in an 80/20 ratio.
  7. First classifier: XGBoost.
  8. Second classifier: SVM (LinearSVC).
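
The steps above can be sketched roughly as follows. This is an illustrative outline, not the repository's actual code: the toy rows, the abbreviated `STOP_WORDS` set, the `clean_poem` helper, and the choice to keep only ASCII letters are all assumptions. To keep the example light on dependencies, only the LinearSVC classifier is fitted; an `xgboost.XGBClassifier` could be trained on the same matrices in the same way.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# Assumed, abbreviated stop-word list built from the examples named in step 4.
STOP_WORDS = {"a", "an", "the", "aboard", "about", "above",
              "absent", "across", "after"}


def clean_poem(text):
    """Lower-case the text, strip punctuation, and drop stop words (steps 2-4)."""
    text = text.lower()
    # Assumption: keep only ASCII letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


# Made-up toy rows standing in for the Kaggle data (one duplicate, one null).
toy = pd.DataFrame({
    "content": [
        "The river sings above the stones.",
        "The river sings above the stones.",
        "A forest wakes after the rain!",
        "Green hills sleep under snow.",
        "Wind, leaves, and a quiet lake.",
        "My heart is yours: forever.",
        "Her kiss was an endless summer.",
        "Love letters burn across the night.",
        "Zeus hurled thunder at the giants.",
        "The dragon guarded a cursed hoard.",
        "A witch sang to the drowned king.",
        None,
    ],
    "type": [
        "Nature", "Nature", "Nature", "Nature", "Nature",
        "Love", "Love", "Love",
        "Mythology & Folklore", "Mythology & Folklore", "Mythology & Folklore",
        "Love",
    ],
})

# Step 1: remove duplicate and null rows.
toy = toy.drop_duplicates().dropna().reset_index(drop=True)

# Steps 2-4: normalise the poem text.
toy["clean"] = toy["content"].apply(clean_poem)

# Step 5: TF-IDF features, plus integer labels for the genres.
features = TfidfVectorizer().fit_transform(toy["clean"])
labels = LabelEncoder().fit_transform(toy["type"])

# Step 6: 80/20 split (random_state fixed only for repeatability).
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Step 8: linear SVM; accuracy on toy data this small is not meaningful.
model = LinearSVC()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```

The sketch follows the order listed above and vectorizes before splitting. Fitting the vectorizer on the training split only is a common alternative that keeps test-set vocabulary out of the features.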

Packages used:

  1. NumPy
  2. Pandas
  3. Matplotlib
  4. Seaborn
  5. Regular expressions (re)
  6. xgboost

From Scikit-Learn:

  7. TfidfVectorizer
  8. train_test_split
  9. LabelEncoder
  10. confusion_matrix
  11. accuracy_score
  12. LinearSVC
  13. average_precision_score
