SvarnimN/nepalbhasa-corpus

 
 

nepalbhasa-corpus

Nepal Bhasa text corpus for NLP.

Text is in JSON Lines format; each line is one post.

File key

  • _raw.jsonl = original scraped text in Devanagari script
  • _clean.jsonl = cleaned-up version in Devanagari script
  • _newa.jsonl = cleaned-up version converted to Prachalit (Newa) script
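
A minimal sketch of reading these files in Python, assuming each line holds one JSON value per post and the files are UTF-8 encoded. The field names inside a post are not documented here, so the code does not access any. The helper names (`variant_of`, `parse_jsonl`, `read_posts`) and the file name in the usage note are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional

# Suffixes taken from the file key above.
SUFFIXES = ("_raw.jsonl", "_clean.jsonl", "_newa.jsonl")


def variant_of(filename: str) -> Optional[str]:
    """Return 'raw', 'clean', or 'newa' based on the filename suffix, else None."""
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            # Strip the leading underscore and the trailing ".jsonl".
            return suffix[1:-len(".jsonl")]
    return None


def parse_jsonl(lines: Iterable[str]) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def read_posts(path: str) -> Iterator[object]:
    """Stream posts from a .jsonl file (assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as f:
        yield from parse_jsonl(f)
```

For example, `for post in read_posts("example_clean.jsonl"): ...` would iterate over the posts of a cleaned file one at a time without loading the whole file into memory.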

Source

Scraped from these Nepal Bhasa news portals:

  • nepalbhasatimes.com
  • nepalmandal.com

Scraped without explicit permission. To be used for the betterment of Nepal Bhasa linguistic models and tools.
