fast-aug is a library for fast text augmentation, available for both Rust and Python as fast-aug.
It is designed with a focus on performance and real-time usage (e.g. during training), while providing a wide range of text augmentation methods.
Note: 25 times faster than nlpaug!
fast-aug is available on PyPI.
pip install fast-aug
from fast_aug.text import CharsRandomSwapAugmenter
text_data = "Some text!"
augmenter = CharsRandomSwapAugmenter(
    0.5,  # probability of words selection
    0.5,  # probability of characters selection
    None,  # stopwords
)
assert augmenter.augment(text_data) != text_data
assert augmenter.augment_batch([text_data]) != [text_data]
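To make the three arguments above more concrete, here is a minimal pure-Python sketch of what a character-swap augmentation could look like. It is not the library's Rust implementation: the function name `chars_random_swap`, the whitespace tokenization, and the adjacent-swap strategy are all assumptions made for illustration.

```python
import random


def chars_random_swap(text, word_prob, char_prob, stopwords=None, seed=None):
    """Illustrative only: swap adjacent characters inside randomly chosen words.

    Hypothetical helper, not part of fast-aug.
    """
    rng = random.Random(seed)
    stopwords = stopwords or set()
    out = []
    for word in text.split(" "):
        # Keep stopwords, one-character words, and words not selected by word_prob.
        if word in stopwords or len(word) < 2 or rng.random() >= word_prob:
            out.append(word)
            continue
        chars = list(word)
        for i in range(len(chars) - 1):
            # Swap this position with its right neighbour with probability char_prob.
            if rng.random() < char_prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        out.append("".join(chars))
    return " ".join(out)


augmented = chars_random_swap("Some text!", 0.5, 0.5, seed=0)
# Swapping never adds or removes characters, it only reorders them.
assert sorted(augmented) == sorted("Some text!")
```

In this sketch, a word probability of 0 leaves the text untouched, and words listed in `stopwords` are never modified.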
TBA
Comparison of the fast-aug library with other NLP augmentation libraries.
- fast-aug - this library, a fast augmentation library written in Rust, with Python bindings
- nlpaug - the most popular NLP augmentation library
- fasttextaug - a re-write of some of nlpaug's augmenters in Rust with Python bindings
- augly - not included as "Our text augmentations use nlpaug as their backbone"
- augmenty - not included as it is too slow (2-8 times slower than nlpaug)
This is an end-to-end comparison, including dataset loading, class initialization, and augmentation of all samples (one-by-one or provided as a list).
See ./benchmarks/compare_text.py for details of the comparison.
All libraries are compared on the tweeteval dataset (sentiment test set, 12k samples).
Note: the dataset text file is 1.1 MB, and it is included in the memory usage.
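As a rough illustration of what "end-to-end" means here, the sketch below times dataset loading, augmenter construction, and augmentation together, in both one-by-one and batch modes. It is not the code in ./benchmarks/compare_text.py: `benchmark`, `load_dataset`, and `make_augmenter` are hypothetical stand-ins, and the augmenter is a trivial pure-Python function rather than any of the compared libraries.

```python
import time
import tracemalloc


def benchmark(load_dataset, make_augmenter, batch=False):
    """Measure loading, initialization and augmentation as one end-to-end run."""
    tracemalloc.start()
    start = time.perf_counter()

    samples = load_dataset()     # dataset loading is part of the measurement
    augment = make_augmenter()   # so is augmenter initialization
    if batch:
        results = augment(samples)                    # all samples as one list
    else:
        results = [augment([s])[0] for s in samples]  # one sample at a time

    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": elapsed, "peak_mib": peak_bytes / 2**20, "n": len(results)}


# Hypothetical stand-ins for a real dataset loader and a real augmenter.
def load_dataset():
    return ["Some text!"] * 1000


def make_augmenter():
    return lambda texts: [t[::-1] for t in texts]


stats_single = benchmark(load_dataset, make_augmenter)
stats_batch = benchmark(load_dataset, make_augmenter, batch=True)
print(stats_single, stats_batch)
```

Because the loaded samples stay in memory during the run, the dataset itself contributes to the measured peak. `tracemalloc` only sees allocations made through Python's memory allocator, so a comparison involving native extensions would likely need a process-level memory measurement instead.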
Any contribution is warmly welcomed!
Please see the GitHub repository README at fast-aug.