fast-aug - python bindings

fast-aug is a library for fast text augmentation, available for both Rust and Python under the name fast-aug.
It is designed with a focus on performance and real-time usage (e.g. during training), while providing a wide range of text augmentation methods.

Note: 25 times faster than nlpaug!


Installation

fast-aug is available on PyPI.

pip install fast-aug

Usage

from fast_aug.text import CharsRandomSwapAugmenter

text_data = "Some text!"
augmenter = CharsRandomSwapAugmenter(
    0.5,  # probability of words selection
    0.5,  # probability of characters selection
    None,  # stopwords
)
assert augmenter.augment(text_data) != text_data
assert augmenter.augment_batch([text_data]) != [text_data]
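
To make the three parameters above more concrete, here is a minimal plain-Python sketch of what a character random-swap augmentation could look like. It is an illustration only, not the library's Rust implementation: the function name `chars_random_swap`, the whitespace tokenization, and the adjacent-character swap rule are all assumptions made for this example.

```python
import random


def chars_random_swap(text, word_prob, char_prob, stopwords=None, rng=None):
    """Illustrative only (hypothetical helper, not part of fast-aug).

    Each word is selected with probability word_prob; inside a selected
    word, each adjacent character pair is swapped with probability char_prob.
    """
    rng = rng or random.Random()
    stopwords = stopwords or set()
    out = []
    for word in text.split(" "):
        # Leave stopwords, one-character words, and unselected words untouched.
        if word in stopwords or len(word) < 2 or rng.random() >= word_prob:
            out.append(word)
            continue
        chars = list(word)
        i = 0
        while i < len(chars) - 1:
            if rng.random() < char_prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip ahead so the same character is not moved twice
            else:
                i += 1
        out.append("".join(chars))
    return " ".join(out)


if __name__ == "__main__":
    print(chars_random_swap("Some text!", 0.5, 0.5, rng=random.Random(0)))
```

Because the selection is random, a single call may return the input unchanged when no word or character pair happens to be selected.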

TBA

Performance Comparison

Comparison of the fast-aug library with other NLP augmentation libraries.

  • fast-aug - this library, written in Rust with Python bindings
  • nlpaug - the most popular NLP augmentation library
  • fasttextaug - a rewrite of some of nlpaug's augmenters in Rust with Python bindings
  • augly - not included, as "Our text augmentations use nlpaug as their backbone"
  • augmenty - not included, as it is too slow (2-8 times slower than nlpaug)

This is an end-to-end comparison, including dataset loading, class initialization, and augmentation of all samples (one by one or provided as a list).
See ./benchmarks/compare_text.py for details of the comparison.
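
For readers who want a feel for what "end-to-end" means here, the following is a rough sketch of such a measurement. It is not the contents of ./benchmarks/compare_text.py; the `benchmark` function and the `UppercaseAugmenter` stand-in are hypothetical names used only so the example runs without any augmentation library installed. Note also that `tracemalloc` only tracks allocations made through Python's allocator, so memory used inside native extensions would not be captured this way.

```python
import time
import tracemalloc


class UppercaseAugmenter:
    """Hypothetical stand-in exposing augment / augment_batch methods."""

    def augment(self, text):
        return text.upper()

    def augment_batch(self, texts):
        return [t.upper() for t in texts]


def benchmark(load_samples, make_augmenter, batched=False):
    """Measure one end-to-end run: loading, initialization, augmentation."""
    tracemalloc.start()
    start = time.perf_counter()

    samples = load_samples()      # dataset loading is part of the measurement
    augmenter = make_augmenter()  # so is class initialization

    if batched:
        results = augmenter.augment_batch(samples)  # samples provided as a list
    else:
        results = [augmenter.augment(s) for s in samples]  # one by one

    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": elapsed, "peak_bytes": peak, "n": len(results)}


if __name__ == "__main__":
    data = lambda: ["Some text!"] * 1000
    print(benchmark(data, UppercaseAugmenter, batched=False))
    print(benchmark(data, UppercaseAugmenter, batched=True))
```

Measuring loading and initialization together with augmentation reflects the total cost a training script would pay, rather than only the per-sample augmentation time.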

[Figures: time comparison, memory comparison]

All libraries were compared on the tweeteval dataset (sentiment test set, 12k samples).
Note: the dataset text file is 1.1 MB in size and is included in the memory usage.

Contributing and Development

Contributions are warmly welcome!
Please see the README of the fast-aug GitHub repository.