Dictionary-based text classification. Real fast, real simple.

samhardyhey/clear-bow


Clear BOW

Overview

A cheap model that takes a formatted dictionary (label names mapped to lists of terms) as input and pushes word frequencies through either a softmax (multi-class) or a sigmoid (multi-label) function to produce label "probabilities". Useful for bootstrapping classifications from raw terminology lists.
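
As a rough illustration of the idea, and not the library's actual implementation, the scoring step can be sketched as counting dictionary-term occurrences per label and then normalizing those counts. The helper names `count_terms`, `softmax` and `sigmoid`, the toy dictionary, and the use of lowercased substring matching are all assumptions made for this sketch.

```python
import math


def count_terms(text, label_dictionary):
    """Count occurrences of each label's terms in the text.

    Assumption: simple lowercased substring matching.
    """
    lowered = text.lower()
    return {
        label: sum(lowered.count(term.lower()) for term in terms)
        for label, terms in label_dictionary.items()
    }


def softmax(counts):
    """Multi-class: labels compete, and the scores sum to 1."""
    exps = {label: math.exp(c) for label, c in counts.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid(counts):
    """Multi-label: each label is scored independently in (0, 1)."""
    return {label: 1 / (1 + math.exp(-c)) for label, c in counts.items()}


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"pets": ["cat", "dog"], "food": ["pizza", "pasta"]}

counts = count_terms("My dog chased the cat and ignored the pizza.", toy_dict)
multi_class = softmax(counts)  # scores sum to 1 across labels
multi_label = sigmoid(counts)  # scores are independent per label
```

Softmax suits the case where exactly one label should win, while sigmoid lets several labels score highly at once.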

Install

Via pip:

pip install clear_bow

Or clone directly:

git clone https://github.com/samhardyhey/clear-bow
cd clear-bow
pip install .

Usage

from clear_bow.classifier import DictionaryClassifier

# define, instantiate, call
super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax", "10%", "10.5%"],
    "covid": ["covid", "lockdown", "downturn", "effect"],
    "retirement": ["retire", "house", "annuity", "age"],
    "fund": ["unisuper", "aus super", "australian super", "sun super", "qsuper", "rest", "cbus"],
}

# multi-class/label options available
dc = DictionaryClassifier(label_dictionary=super_dict)
dc.predict_single("A 10% contribution is not enough for a well balanced super fund!")

# {'regulation': 0.0878,
#  'contribution': 0.6488,
#  'covid': 0.0878,
#  'retirement': 0.0878,
#  'fund': 0.0878}
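
The returned value is a plain dictionary of label scores, so standard Python is enough to work with it. The sketch below copies the output above into a literal (so it runs without the package installed), picks the top label, and checks that the numbers are consistent with a softmax over match counts of 2 for "contribution" and 0 elsewhere. Those counts are an inference from the printed scores, not something read from the library's source.

```python
import math

# Scores copied from the example output above.
probs = {
    "regulation": 0.0878,
    "contribution": 0.6488,
    "covid": 0.0878,
    "retirement": 0.0878,
    "fund": 0.0878,
}

# Multi-class reading: take the highest-scoring label.
top_label = max(probs, key=probs.get)

# Assumed match counts: "10%" and "contribution" both hit the
# "contribution" label; no other label's terms appear in the sentence.
assumed_counts = [0, 2, 0, 0, 0]
denominator = sum(math.exp(c) for c in assumed_counts)
recomputed = [round(math.exp(c) / denominator, 4) for c in assumed_counts]

print(top_label)
print(recomputed)
```

Note that a label with no matching terms still receives a non-zero score under softmax, which is why the four unmatched labels share the same small value.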

See tests for additional usage.

Tests

Run the test suite with pytest:

pytest

Run the tests across multiple virtual environments with tox:

tox

Dist

  • Update the version in setup.py
  • Build the source (.tar.gz) and wheel (.whl) archives via:
python setup.py sdist bdist_wheel

Upload to the main PyPI repository via:

twine upload dist/*
