Package for constructing paths of embeddings obtained from transformers.

datasig-ac-uk/nlpsig

nlpsig

NLPSig (nlpsig) is a Python package for constructing streams/paths of embeddings obtained from transformers. The key contributions are:

  • A simple API for taking streams of textual data and constructing streams of embeddings from transformers
  • A simple API for dimensionality reduction of transformer embeddings with nlpsig.DimReduce, which provides lightweight wrappers over popular algorithms such as PCA, UMAP and t-SNE.
    • This is particularly useful if we wish to use path signatures in a downstream model: transformer embeddings are usually very high-dimensional, and the signature of a d-dimensional path truncated at depth N has d + d^2 + ... + d^N terms.
    • We present Signature Network models for longitudinal NLP tasks in the sig-networks library, which uses the paths constructed with this library as inputs to neural networks built on path signature methodology.
  • We also provide simple classes for constructing train/test splits and K-fold cross-validation folds. These are general-purpose and are used in the Signature Network examples in the sig-networks library.
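To make "a path of embeddings" concrete, here is a minimal sketch in plain NumPy. It is not nlpsig's API: the function `signature_level2`, the timestamps and the random stand-in embeddings are all hypothetical. It orders a stream's embeddings chronologically, prepends time as an extra channel (a common but optional choice, assumed here), and computes the depth-2 truncated signature of the resulting piecewise-linear path using Chen's identity.

```python
import numpy as np


def signature_level2(path: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Depth-2 truncated signature of a piecewise-linear path (hypothetical helper).

    path: array of shape (length, d), one row per point of the stream.
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending a linear segment with increment delta:
        # level2 gains (current level1) ⊗ delta + (delta ⊗ delta) / 2.
        # Update level2 first so it uses level1 *before* this segment.
        level2 += np.outer(level1, delta) + np.outer(delta, delta) / 2.0
        level1 += delta
    return level1, level2


# Hypothetical stream: four posts with timestamps and 3-dim stand-in embeddings.
rng = np.random.default_rng(0)
timestamps = np.array([3.0, 1.0, 2.0, 5.0])
embeddings = rng.normal(size=(4, 3))
order = np.argsort(timestamps)  # put the stream in chronological order
# Prepend time as an extra channel; each row is one point of the path.
path = np.column_stack([timestamps[order], embeddings[order]])
level1, level2 = signature_level2(path)
print(path.shape, level1.shape, level2.shape)  # (4, 4) (4,) (4, 4)
```

Level 1 is just the total increment of the path, while level 2 captures order-dependent interactions between channels (its antisymmetric part is the Lévy area), which is what makes signatures sensitive to the order of posts in a stream. For real workloads a dedicated signature library would normally be used instead of this loop.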
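Why dimensionality reduction matters before computing signatures can be illustrated with a short NumPy sketch. nlpsig.DimReduce wraps existing algorithms and its interface is not shown here; instead, `pca_reduce` and `signature_size` are hypothetical helpers that implement PCA directly via the SVD and count signature terms. The 768-dimensional random matrix is an assumed stand-in for transformer embeddings.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows of `embeddings` onto their top principal components."""
    centred = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def signature_size(d: int, depth: int) -> int:
    """Number of signature terms at levels 1..depth for a d-dimensional path."""
    return sum(d**k for k in range(1, depth + 1))


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))  # stand-in for 768-dim embeddings
reduced = pca_reduce(embeddings, n_components=5)
print(reduced.shape)                 # (200, 5)
print(signature_size(768, depth=3))  # 453575424
print(signature_size(5, depth=3))    # 155
```

With 5 components a depth-3 signature has 155 terms instead of roughly 4.5 × 10^8, which is the practical reason for reducing dimension before computing signatures.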
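The README does not describe how nlpsig builds its splits, so the following is only a sketch under an assumption: for longitudinal data, each stream (for example, all posts from one user) should sit entirely inside a single fold, so that no stream contributes to both training and testing. The function `kfold_by_stream` and the `user_ids` data are hypothetical.

```python
import numpy as np


def kfold_by_stream(stream_ids, n_splits: int, seed: int = 0):
    """Yield (train_idx, test_idx) so each stream falls wholly in one test fold.

    n_splits should not exceed the number of distinct streams, otherwise
    some test folds will be empty.
    """
    stream_ids = np.asarray(stream_ids)
    unique = np.unique(stream_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)  # randomise which streams go to which fold
    for held_out in np.array_split(unique, n_splits):
        test_mask = np.isin(stream_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical data: 10 posts from 4 users (each user's posts form one stream).
user_ids = ["a", "a", "b", "b", "b", "c", "d", "d", "c", "a"]
for train_idx, test_idx in kfold_by_stream(user_ids, n_splits=2):
    held_out_users = sorted(set(np.array(user_ids)[test_idx]))
    print(held_out_users, len(train_idx), len(test_idx))
```

scikit-learn's GroupKFold provides a similar group-aware split if you prefer an off-the-shelf implementation.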

NLPSig is used by the sig-networks library, as detailed in our EACL demo paper Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling.

Installation

NLPSig is available on PyPI and can be installed with pip:

pip install nlpsig

Contributing

To take advantage of pre-commit, which will automatically format your code and run some basic checks before you commit:

pip install pre-commit  # or brew install pre-commit on macOS
pre-commit install  # will install a pre-commit hook into the git repo

After doing this, each time you commit, some linters will be applied to format the codebase. You can also run pre-commit run --all-files at any time to run the checks on every file in the repository.

See CONTRIBUTING.md for more information on running the test suite using nox.