A Python library for working with weighted context-free grammars (WCFGs), weighted finite-state automata (WFSAs), and weighted finite-state transducers (WFSTs). The library provides efficient implementations of grammar operations, parsing algorithms, and language model functionality.
- Support for weighted context-free grammars with various semirings (Boolean, Float, Real, MaxPlus, MaxTimes, etc.)
- Grammar transformations:
  - Local normalization
  - Removal of nullary rules and unary cycles
  - Grammar binarization
  - Length truncation
  - Renaming/renumbering of nonterminals
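Local normalization can be done without changing the distribution over derivations: first compute the partition function Z(A) of every nonterminal (the total weight of all derivations rooted at A), then reweight each rule as p(A → α) = w(A → α) · Π Z(αᵢ) / Z(A), with Z = 1 for terminals. The Z factors telescope along any tree, so every tree t ends up with probability w(t)/Z(S). The sketch below illustrates this under stated assumptions and is not the library's API: the dict-of-rules representation and both function names are hypothetical, and Z is found by plain fixed-point iteration, which only converges for convergent grammars (faster solvers such as Newton's method exist).

```python
from collections import defaultdict
from math import prod


def partition_function(rules, tol=1e-12, max_iter=10_000):
    """Solve Z(A) = sum over A -> alpha of w * prod_i Z(alpha_i) by fixed-point iteration.

    rules: {(lhs, rhs_tuple): weight} (hypothetical encoding). Any symbol that
    never appears as a left-hand side is treated as a terminal with Z = 1.
    """
    nonterminals = {lhs for (lhs, _) in rules}
    Z = {a: 0.0 for a in nonterminals}

    def z(symbol):
        return Z[symbol] if symbol in nonterminals else 1.0

    for _ in range(max_iter):
        new = defaultdict(float)
        for (lhs, rhs), w in rules.items():
            new[lhs] += w * prod(z(s) for s in rhs)  # empty rhs contributes w * 1
        delta = max(abs(new[a] - Z[a]) for a in nonterminals)
        Z = dict(new)
        if delta < tol:
            return Z
    raise ValueError("fixed-point iteration did not converge; the grammar may be divergent")


def locally_normalize(rules):
    """Reweight rules so each left-hand side sums to 1, keeping the tree distribution."""
    Z = partition_function(rules)
    return {
        (lhs, rhs): w * prod(Z.get(s, 1.0) for s in rhs) / Z[lhs]
        for (lhs, rhs), w in rules.items()
    }


# Recursive toy grammar: Z(S) = 0.5 * Z(S) + 1, so Z(S) is approximately 2.
rules = {("S", ("S", "a")): 0.5, ("S", ("b",)): 1.0}
print(locally_normalize(rules))  # both rules end up with probability close to 0.5
```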
- Earley parsing (O(n³|G|) complexity)
  - Standard implementation
  - Rescaled version for numerical stability
- CKY parsing
  - Incremental CKY with chart caching
  - Support for prefix computations
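To make the semiring parameterization and the incremental chart concrete, here is a hedged sketch of weighted CKY for a grammar in Chomsky normal form (lexical rules A → a and binary rules A → B C). Filling the chart one column at a time means that appending a word only computes the cells ending at the new position, while the cells for the prefix are reused from the cache. The class name, rule encoding, and semiring instances are hypothetical illustrations, not the library's classes; swapping the semiring turns the same loop into a total-weight (REAL), best-parse (MAX_TIMES), or recognition (BOOLEAN) computation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A minimal semiring: an additive and a multiplicative operation with identities."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Illustrative instances (hypothetical names, not the library's classes).
REAL = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda x, y: x * y, 0.0, 1.0)
BOOLEAN = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)


class IncrementalCKY:
    """Weighted CKY for a CNF grammar; each new word fills exactly one chart column."""

    def __init__(self, lexical, binary, start, semiring):
        self.lexical = lexical  # {(A, terminal): weight} for rules A -> terminal
        self.binary = binary    # {(A, B, C): weight} for rules A -> B C
        self.start = start
        self.sr = semiring
        self.chart = {}         # (i, k) -> {nonterminal: total weight of span i..k}
        self.n = 0              # number of words consumed so far

    def _add(self, cell, symbol, value):
        cell[symbol] = self.sr.plus(cell.get(symbol, self.sr.zero), value)

    def extend(self, word):
        """Consume one word; cells for spans ending earlier are reused from the cache."""
        sr, k = self.sr, self.n + 1
        cell = {}
        for (a, terminal), w in self.lexical.items():
            if terminal == word:
                self._add(cell, a, w)
        self.chart[k - 1, k] = cell
        # Wider spans ending at k: chart[i, j] comes from earlier columns, and
        # chart[j, k] (j > i) was filled earlier in this descending loop.
        for i in range(k - 2, -1, -1):
            cell = {}
            for j in range(i + 1, k):
                left, right = self.chart[i, j], self.chart[j, k]
                for (a, b, c), w in self.binary.items():
                    if b in left and c in right:
                        self._add(cell, a, sr.times(w, sr.times(left[b], right[c])))
            self.chart[i, k] = cell
        self.n = k
        return self

    def weight(self):
        """Total weight of the words consumed so far, read as a complete sentence."""
        return self.chart.get((0, self.n), {}).get(self.start, self.sr.zero)


# Ambiguous toy grammar: S -> S S (0.5), S -> a (0.5); "a a a" has two parses.
lexical, binary = {("S", "a"): 0.5}, {("S", "S", "S"): 0.5}
inside = IncrementalCKY(lexical, binary, "S", REAL)
print([inside.extend(w).weight() for w in "aaa"])  # [0.5, 0.125, 0.0625]
viterbi = IncrementalCKY(lexical, binary, "S", MAX_TIMES)
for w in "aaa":
    viterbi.extend(w)
print(viterbi.weight())  # 0.03125
```

Here weight() scores the words read so far as a complete sentence; that is not the same quantity as a prefix probability, which also sums over all possible continuations.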
- Language models:
  - BoolCFGLM: Boolean-weighted CFG language model
  - CKYLM: Probabilistic CFG language model using CKY
  - EarleyLM: Language model using Earley parsing
- Weighted FSA implementation
- Operations:
  - Epsilon removal
  - Minimization (Brzozowski's algorithm)
  - Determinization
  - Composition
  - Reversal
  - Kleene star/plus
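Brzozowski's algorithm minimizes an automaton by reversing and determinizing twice: min(A) = det(rev(det(rev(A)))), which works because determinizing the reversal of an accessible deterministic automaton yields a minimal one. The sketch below covers only the unweighted (Boolean) case and uses a hypothetical tuple representation; it is not the library's weighted implementation, where determinization needs a suitable semiring and is not guaranteed to terminate for every automaton.

```python
from collections import deque

# Hypothetical representation: (start_states, final_states, arcs),
# where arcs is a set of (source, symbol, target) triples.


def reverse(fsa):
    """Flip every arc and swap the roles of start and final states."""
    starts, finals, arcs = fsa
    return set(finals), set(starts), {(q, a, p) for (p, a, q) in arcs}


def determinize(fsa):
    """Subset construction; only reachable, non-empty subsets are created."""
    starts, finals, arcs = fsa
    symbols = {a for (_, a, _) in arcs}
    start = frozenset(starts)
    new_finals, new_arcs = set(), set()
    seen, agenda = {start}, deque([start])
    while agenda:
        subset = agenda.popleft()
        if subset & finals:
            new_finals.add(subset)
        for a in symbols:
            target = frozenset(q for (p, b, q) in arcs if p in subset and b == a)
            if not target:
                continue  # omit the dead state, so the result is a partial DFA
            new_arcs.add((subset, a, target))
            if target not in seen:
                seen.add(target)
                agenda.append(target)
    return {start}, new_finals, new_arcs


def brzozowski_minimize(fsa):
    return determinize(reverse(determinize(reverse(fsa))))


def num_states(fsa):
    starts, finals, arcs = fsa
    return len(set(starts) | set(finals) | {p for p, _, _ in arcs} | {q for _, _, q in arcs})


def accepts(dfa, word):
    (state,), finals, arcs = dfa  # a DFA from determinize has exactly one start state
    delta = {(p, a): q for (p, a, q) in arcs}
    for a in word:
        state = delta.get((state, a))
        if state is None:
            return False
    return state in finals


# Even-length strings over {"a"}, encoded with four states where two would do.
redundant = ({0}, {0, 2}, {(0, "a", 1), (1, "a", 2), (2, "a", 3), (3, "a", 0)})
minimal = brzozowski_minimize(redundant)
print(num_states(redundant), num_states(minimal))  # 4 2
```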
- Semiring abstractions (Boolean, Float, Log, Entropy, etc.)
- Efficient chart and agenda-based algorithms
- Grammar-FST composition
- Visualization support via Graphviz
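A classic example of an agenda-based algorithm over an abstract semiring is Mohri's generic single-source shortest-distance algorithm: each state keeps a running total d[q] and a residual r[q] of weight not yet pushed to its successors, and a state goes back on the agenda only when its total changes. The sketch below is a minimal version under stated assumptions (adjacency dicts, a FIFO agenda, hypothetical semiring names) and is not the library's implementation. It terminates on acyclic graphs and on k-closed semirings such as max-times with weights in [0, 1].

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Illustrative instances (hypothetical names).
REAL = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda x, y: x * y, 0.0, 1.0)


def shortest_distance(arcs, source, sr):
    """Semiring sum over all paths from source to each reachable state.

    arcs: {state: [(next_state, weight), ...]}
    """
    d = {source: sr.one}  # total weight found so far for each state
    r = {source: sr.one}  # weight that reached a state but is not yet propagated
    agenda, queued = deque([source]), {source}
    while agenda:
        q = agenda.popleft()
        queued.discard(q)
        residual, r[q] = r[q], sr.zero
        for nxt, w in arcs.get(q, []):
            update = sr.times(residual, w)
            old = d.get(nxt, sr.zero)
            new = sr.plus(old, update)
            if new != old:  # only states whose total changed are revisited
                d[nxt] = new
                r[nxt] = sr.plus(r.get(nxt, sr.zero), update)
                if nxt not in queued:
                    agenda.append(nxt)
                    queued.add(nxt)
    return d


# A diamond-shaped automaton: 0 -> {1, 2} -> 3.
arcs = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.25)], 2: [(3, 0.75)]}
print(shortest_distance(arcs, 0, REAL)[3])       # 0.5   (sum over both paths)
print(shortest_distance(arcs, 0, MAX_TIMES)[3])  # 0.375 (best single path)
```

The residual bookkeeping is what lets a state be processed more than once without double-counting: only the weight that arrived since its last visit is pushed forward.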
Clone the repository:
git clone git@github.com:chisym/genlm-grammar.git
cd genlm-grammar
and install with pip:
pip install .
This installs the package without development dependencies. For development, install in editable mode with:
pip install -e ".[test,docs]"
which also installs the dependencies needed for testing (test) and documentation (docs).
- Python >= 3.10
- The core dependencies listed in the setup.py file of the repository.
When test dependencies are installed, the test suite can be run via:
pytest tests
Documentation is generated using mkdocs and hosted on GitHub Pages. To build the documentation, run:
mkdocs build
To serve the documentation locally, run:
mkdocs serve