Sanskrit Segmentation Using Transformers

The initial scope is limited to compound splitting and sandhi removal.

Status

A very simple character-level seq2seq transformer model has been tested for Sanskrit segmentation (sandhi removal only). More work is needed.
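To make "character-level seq2seq" concrete, here is a minimal sketch of how a sandhi-joined string and its split form could be turned into integer sequences for such a model. It is only an illustration: the class name `CharTokenizer`, the special tokens, and the example pair are assumptions made here, and the repository's own tokenizer.py may work differently.

```python
from typing import Dict, List

# Hypothetical special tokens; the repository may use different ones.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


class CharTokenizer:
    """Maps each character to an integer id and back."""

    def __init__(self, texts: List[str]) -> None:
        # Build the vocabulary from every distinct character in the corpus.
        chars = sorted({ch for text in texts for ch in text})
        self.itos: List[str] = [PAD, SOS, EOS, UNK] + chars
        self.stoi: Dict[str, int] = {tok: i for i, tok in enumerate(self.itos)}

    def encode(self, text: str) -> List[int]:
        # Wrap the sequence in start/end markers; unseen characters map to UNK.
        unk = self.stoi[UNK]
        ids = [self.stoi.get(ch, unk) for ch in text]
        return [self.stoi[SOS]] + ids + [self.stoi[EOS]]

    def decode(self, ids: List[int]) -> str:
        # Drop padding and start/end markers when converting back to text.
        skip = {self.stoi[PAD], self.stoi[SOS], self.stoi[EOS]}
        return "".join(self.itos[i] for i in ids if i not in skip)


# Illustrative pair: the joined form is the model input,
# the sandhi-split form is the target (tathā + eva -> tathaiva).
source = "tathaiva"
target = "tathā eva"

tok = CharTokenizer([source, target])
src_ids = tok.encode(source)
tgt_ids = tok.encode(target)

print(src_ids)
print(tok.decode(tgt_ids))
```

Working at the character level keeps the vocabulary tiny and lets the model learn sandhi rules directly as character rewrites, at the cost of longer sequences than a word-level or subword scheme would produce.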

Install Required Packages

If you have a GPU:

virtualenv venv
source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install tqdm wandb pandas BeautifulSoup4 lxml

CPU Only:

virtualenv venv
source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip3 install tqdm wandb pandas BeautifulSoup4 lxml
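
After either install path, a short check like the one below can confirm that the packages are importable and whether PyTorch sees a GPU. This is a sketch rather than part of the repository: the helper names `missing_packages` and `cuda_available` are made up for illustration. Note that the BeautifulSoup4 package is imported as `bs4`.

```python
import importlib.util
from typing import Dict, List, Optional

# pip package name -> importable module name
REQUIRED: Dict[str, str] = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "tqdm": "tqdm",
    "wandb": "wandb",
    "pandas": "pandas",
    "BeautifulSoup4": "bs4",
    "lxml": "lxml",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def cuda_available() -> Optional[bool]:
    """Return True/False if torch is installed, or None if it is not."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check works without torch

    return torch.cuda.is_available()


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
print("CUDA available:", cuda_available())
```

With the CPU-only install, the last line is expected to report `False`; with the GPU install it reports `True` only when a compatible driver is present.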

Prepare Dataset

chmod +x fetch_data.sh
./fetch_data.sh
python3 prepare_dataset.py

Train Model

python3 train.py

Note:

Some code was taken from the following repository, which is released under the MIT License. See the License/ directory.

https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/more_advanced/seq2seq_transformer
