A web-deployable annotation tool for distributed multi-task natural language annotation.
Features
- Sequential annotation method
- Eight categories of annotation tasks with unique interfaces
- Comprehensive task coverage
This work has been accepted at the 3rd Workshop on NLP Open Source Software (NLP-OSS) at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).
The paper is available on arXiv; see the citation entry below for details.
- Clone repository
- Install dependencies: pip install -r requirements.txt
- Copy settings.sample.py to settings.py and make appropriate changes.
- Run application server: python server.py
Note: Sample corpus and table data are included in data/corpus/sample and data/tables/sample respectively.
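The settings copy step can also be scripted so that an existing settings.py is never overwritten. The following is a minimal sketch, assuming it is called with the path to the repository root; the helper name `ensure_settings` is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def ensure_settings(repo_dir="."):
    """Copy settings.sample.py to settings.py unless settings.py already exists.

    Hypothetical helper for illustration only; not part of the tool.
    Returns True if a new settings.py was created, False if one was already there.
    """
    repo = Path(repo_dir)
    sample = repo / "settings.sample.py"
    target = repo / "settings.py"

    if target.exists():
        # Keep any local changes that were already made to settings.py.
        return False
    if not sample.exists():
        raise FileNotFoundError(f"{sample} not found; is this the repository root?")

    shutil.copyfile(sample, target)
    return True
```

After calling `ensure_settings()`, settings.py still needs to be edited by hand before running python server.py.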
- Task 1: Sentence Boundary
- Task 2: Canonical Token Order (a.k.a. Anvaya)
- Task 3: Token Classification (e.g. Named Entity Recognition)
- Task 4: Token Graph (e.g. Dependency Graph, Action Graph)
- Task 5: Token Connection (e.g. Co-reference Resolution)
- Task 6: Sentence Classification
- Task 7: Sentence Graph (e.g. Discourse Graph)
- Task 8: Token Text Annotation (e.g. Lemmatization)
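For scripts that post-process annotations, the eight categories above could be represented as an enumeration. This is only an illustrative sketch: the class name `AnnotationTask`, the member names, and the helper `tasks_by_unit` are assumptions made here and are not identifiers used by the tool itself. The numbering follows the task list above.

```python
from enum import Enum


class AnnotationTask(Enum):
    """The eight task categories, numbered as in the list above (hypothetical names)."""

    SENTENCE_BOUNDARY = 1
    CANONICAL_TOKEN_ORDER = 2  # a.k.a. Anvaya
    TOKEN_CLASSIFICATION = 3   # e.g. Named Entity Recognition
    TOKEN_GRAPH = 4            # e.g. Dependency Graph, Action Graph
    TOKEN_CONNECTION = 5       # e.g. Co-reference Resolution
    SENTENCE_CLASSIFICATION = 6
    SENTENCE_GRAPH = 7         # e.g. Discourse Graph
    TOKEN_TEXT_ANNOTATION = 8  # e.g. Lemmatization


def tasks_by_unit():
    """Group tasks by the unit named in their title: "token" or "sentence"."""
    groups = {"token": [], "sentence": []}
    for task in AnnotationTask:
        unit = "token" if "TOKEN" in task.name else "sentence"
        groups[unit].append(task)
    return groups
```

Grouping by the unit in the task title is just one convenient view; it reflects the wording of the list, not any internal structure of the tool.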
- Download Paper: arXiv (https://arxiv.org/abs/2310.07826)
@inproceedings{terdalkar2023antarlekhaka,
title = {{Antarlekhaka}: A Comprehensive Tool for Multi-task Natural Language Annotation},
author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
year = {2023},
eprint = {2310.07826},
url = {https://arxiv.org/abs/2310.07826},
publisher = {Association for Computational Linguistics},
archiveprefix = {arXiv},
keywords = {Annotation Tool, Sequential Annotation, Natural Language Processing},
booktitle = {Proceedings of the 3rd Workshop on NLP Open Source Software at the 2023 Conference on Empirical Methods in Natural Language Processing},
primaryclass = {cs.CL},
numpages = {8},
location = {Singapore},
series = {NLP-OSS @ EMNLP}
}