Starting with compound splitting and sandhi removal only
A very simple character-level seq2seq Transformer model has been tested for Sanskrit segmentation (sandhi removal only). More work is still needed.
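Because the model works on characters, each input string (with sandhi applied) and each target string (segmented) must be turned into a sequence of integer ids before training. The sketch below shows one common way to do this. The special tokens, the function names `build_vocab`, `encode` and `decode`, and the IAST example pair are assumptions for illustration; they are not taken from prepare_dataset.py.

```python
from typing import Dict, Iterable, List

# Hypothetical special tokens; the repository's actual vocabulary may differ.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    """Map every character seen in `texts` to an integer id, after the specials."""
    vocab = {tok: i for i, tok in enumerate([PAD, SOS, EOS, UNK])}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a string as <sos> chars <eos>, truncated and padded to max_len."""
    ids = [vocab[SOS]] + [vocab.get(ch, vocab[UNK]) for ch in text] + [vocab[EOS]]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode(): skip <pad>/<sos> and stop at the first <eos>."""
    inv = {i: ch for ch, i in vocab.items()}
    out = []
    for i in ids:
        tok = inv.get(i, UNK)
        if tok == EOS:
            break
        if tok in (PAD, SOS):
            continue
        out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical IAST pair: "rāmo'pi" is "rāmaḥ api" with visarga sandhi applied.
    src, tgt = "rāmo'pi", "rāmaḥ api"
    vocab = build_vocab([src, tgt])
    src_ids = encode(src, vocab, max_len=16)
    tgt_ids = encode(tgt, vocab, max_len=16)
    print(src_ids)
    print(decode(tgt_ids, vocab))
```

Source and target share one vocabulary here, which keeps the sketch short; separate vocabularies would work equally well since sandhi removal mostly reuses the same character set.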
If you have a GPU (PyTorch built for CUDA 11.8):
virtualenv venv
source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install tqdm wandb pandas BeautifulSoup4 lxml
CPU only:
virtualenv venv
source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip3 install tqdm wandb pandas BeautifulSoup4 lxml
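After either setup, one way to confirm which PyTorch build is active is to ask it whether CUDA is usable. The helper name `detect_device` is hypothetical and not part of this repository; `torch.cuda.is_available()` is the standard PyTorch call. As an assumption of this sketch, it also reports "cpu" when PyTorch is not installed yet.

```python
def detect_device() -> str:
    """Return 'cuda' if a CUDA-enabled PyTorch build can see a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment; treat it as CPU-only.
        return "cpu"
    # True only for a CUDA build (e.g. the cu118 wheel) with a working GPU driver.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(detect_device())
```

Running it inside the activated venv should print "cuda" after the GPU setup on a machine with a working driver, and "cpu" after the CPU-only setup.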
Then fetch the data, prepare the dataset, and train:
chmod +x fetch_data.sh
./fetch_data.sh
python3 prepare_dataset.py
python3 train.py
Some code was taken from the following repository, which is released under the MIT License; see License/.