Escavador
NLP
Repository for the paper "Named Entity Recognition for Entity Linking: What Works and What's Next" (EMNLP 2021).
Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptatio…
A codebase that makes differentially private training of transformers easy.
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
Unofficial implementation of https://arxiv.org/abs/2112.05682 to get linear memory cost on attention in PyTorch
Repository containing code for the paper "How to Train BERT with an Academic Budget"
Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch
A Serverless Text Annotation Tool for Corpus Development
Automatically create Faiss kNN indices with optimal similarity search parameters.
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
skweak: A software toolkit for weak supervision applied to NLP tasks
Represent, send, store and search multimodal data
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021"
Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
A machine learning tool for fishing entities
SpanNER: Named Entity Re-/Recognition as Span Prediction
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.
For optimization algorithm research and development.
Fast and memory-efficient exact attention
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Large-scale pretrained models for goal-directed dialog
A Python implementation of SimString, a simple and efficient algorithm for approximate string matching.
Download and load spaCy models on-the-fly