A TextFlows package containing the core classes for representing an annotated document corpus, as well as text mining widgets (UI components) based on NLTK. The package can also be used with ClowdFlows 2.0.
Currently, the project contains several components for text preprocessing: tokenization, stop word removal, lemmatization, part-of-speech tagging, etc.
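Preprocessing steps like these are commonly chained, with each step consuming the previous step's output. The sketch below illustrates that idea in plain Python. It is not the TextFlows API: the function names (`tokenize`, `remove_stop_words`, `preprocess`) and the tiny stop-word list are hypothetical, and the regex tokenizer stands in for NLTK's tokenizers. Lemmatization and part-of-speech tagging are omitted because they need NLTK models.

```python
import re

# Hypothetical, tiny stop-word list for illustration only;
# NLTK provides much larger lists via nltk.corpus.stopwords.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "are", "to", "in"}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens with a simple regex."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text, steps):
    """Apply each step in order, passing each step's output to the next."""
    result = text
    for step in steps:
        result = step(result)
    return result


if __name__ == "__main__":
    print(preprocess("The cat and the dog are in the garden.",
                     [tokenize, remove_stop_words]))
    # prints ['cat', 'dog', 'garden']
```

Keeping each step as a plain function over the previous output makes it easy to reorder steps or swap one implementation for another.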
Installation instructions, examples, and the API reference are available on Read the Docs.
Please note that this is a research project and that drastic changes can be (and are) made fairly regularly. Changes are documented in the CHANGELOG.
Pull requests and issues are welcome.
Matic Perovšek (@mperice), Matej Martinc (@matejMartinc), Roman Orač (@romanorac)
- Knowledge Technologies Department, Jožef Stefan Institute, Ljubljana