Open LLM Models

Want to know which one is "the best"? Have a look at the 🏆 Leaderboards in the Benchmarking section.
llm.extractum.io The LLM Explorer, a Large Language Model Directory with filters for trending, downloads and latest showing details like quantizations, model types and sizes
can-it-run-llm Check most Huggingface LLMs and quants for hardware requirements like vram, ram and memory requirements

Tools

Native GUIs

chatgptui/desktop
chatbox is a Windows, Mac & Linux native ChatGPT Client
BingGPT Desktop application of new Bing's AI-powered chat
cheetah Speech to text for remote coding interviews, giving you hints from GTP3/4
Chat2DB++ general-purpose SQL & multi DBMS client and reporting tool which uses ChatGPT capabilities to write and optimize Queries
ChatGPT-Next-Web Web, Windows, Linux, Mac GUI. Supports: Local LLMs, Markdown, LaTex, mermaid, code, history compression, prompt templates
ChatGPT Native Application for Windows, Mac, Android, iOS, Linux
koboldcpp llama.cpp with a fancy UI, persistent stories, editing tools, memory etc. Supporting ggmlv3 and old ggml, CLBlast and llama, RWKV, GPT-NeoX, Pythia models
Serge chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed
faraday.dev using llama.cpp under the hood to run most llama based models, made for character based chat and role play
gpt4all terminal and gui version to run local gpt-j models, compiled binaries for win/osx/linux
gpt4all.zig terminal version of GPT4All
gpt4all-chat Cross platform desktop GUI for GPT4All models (gpt-j)
ollama Run, create, and share llms on macOS, win/linux with a simple cli interface and portable modelfile package
LM Studio closed-source but very easy to use Native Mac, Windows, Linux GUI, supporting ggml, MPT, StarCoder, Falcon, Replit, GPT-Neu-X, gguf
- lms CLI version of LMStudio
pinokio Template based 1 Click installer for ML inference (LLMs, Audio, Text, Video)
Lit-llama training, fine tuning and inference of llama
Dalai LLaMA-based ChatGPT for single GPUs
ChatLLaMA LLaMA-based ChatGPT for single GPUs
mlc-llm, run any LLM on any hardware (iPhones, Android, Win, Linux, Mac, WebGPU, Metal. NVidia, AMD)
webllm Web LLM running LLMs with WebGPU natively in the browser using local GPU acceleration, without any backend, demo
faraday.dev Run open-source LLMs on your Win/Mac. Completely offline. Zero configuration.
ChatALL concurrently sends prompts to multiple LLM-based AI bots both local and APIs and displays the results
pyllama hacked version of LLaMA based on Meta's implementation, optimized for Single GPUs
gmessage visually pleasing chatbot that uses a locally running LLM server and supports multiple themes, chat history search, text to speech, JSON file export, and OpenAI API compatible Python code
selfhostedAI one-click deployment of RWKV, ChatGLM, llama.cpp models for substituting the openAI API to a locally hosted API
Lit-GPT run SOTA LLMs, supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed
text-generation-inference Rust, Python and gRPC server for text generation inference. Used in production at HuggingFace to power LLMs api-inference widgets
minigpt4.cpp to run minigpt4 using 4-bit quantization with using the ggml library in pure C/C++
Windows AI Studio Visual Studio Code extension for Fine-tuning, RAG development and inference of local models
jan an open source alternative to ChatGPT that runs 100% offline on Windows, Intel/Apple Silicon Mac, Linux and Mobile
open-interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal
ClipboardConqueror a novel omnipresent copilot alternative designed to bring your very own LLM AI assistant to any text field
Chat With RTX by NVIDIA using Tensore Cores locally to run LLMs fast with a local RAG workflow.
msty offline-first closed source (but free) GUI with support for llama, mixtral, qwen, llava, gemma and online APIs like openai, gemini, groq, claude etc with advanced features like split chat, in chat editing, prompt templates, sticky prompt
singulatron simple interface to download and run LLMs, similar to LM Studio
torchchat CLI interaction with LLMs such as llama, mistral and more using pytorch execution on linux, android, osx and ios supporting multiple quantization types, rest API, gat and generate
MaxsAistudio Maxime Labonne's Windows native C# based LLM UI for chatting with ollama, OpenAI, Anthropic, Groq and Gemini models with many features including conversation management, templating, embedding retrieval, diagramming etc
Screen Pipe library providing screen, audio and microphone capture stored in an embedding DB and used during query time via a web and desktop frontend as a rewind ai or windows copilot alternative
gollama command line tool to manage ollama models and linking them to LMStudio
gpt_mobile android mobile app to chat with multiple LLMs at once supporting BYOK for OpenAI, Anthropic and Gemini API with local chat history
llm is a CLI utility and Python library that facilitates interaction with LLMs, both remotely and locally, offering functionalities such as running prompts, storing results, generating embeddings and more

Web GUIs

enricoros/nextjs-chatgpt-app
no8081/chatgpt-demo
IPython-gpt use chatGPT directly inside jupyter notebooks
Chatbot UI An open source ChatGPT UI
freegpt-webui provides a user friendly web-interface connecting to free (reverse-engineered) public GPT3.5/GPT4 endpoints using gpt4free
Flux Graph-based LLM power tool for exploring many prompts and completions in parallel.
Text Generation Webui An all purpose UI to run LLMs of all sorts with optimizations (running LLaMA-13b on 6GB VRAM, HN Thread)
Text Generation Webui Ph0rk0z fork supporting all GPTQ versions and max context of 8192 instead of 4096 (because some models support longer context now)
dockerLLM TheBloke's docker variant of text-generation-webui
lollms-webui former GPT4ALL-UI by ParisNeo, user friendly all-in-one interface, with bindings for c_transformers, gptq, gpt-j, llama_cpp, py_llama_cpp, ggml
Alpaca-LoRa-Serve
chat petals web app + HTTP and Websocket endpoints for BLOOM-176B inference with the Petals client
Alpaca-Turbo Web UI to run alpaca model locally on Win/Mac/Linux
FreedomGPT Web app that executes the FreedomGPT LLM locally
HuggingChat open source chat interface for transformer based LLMs by Huggingface
openplayground enables running LLM models on a laptop using a full UI, supporting various APIs and local HuggingFace cached models
RWKV-Runner Easy installation and running of RWKV Models, providing a local OpenAI API, GUI and custom CUDA kernel acceleration. Supports 2gb up to 32gb VRAM
BrainChulo Chat App with vector based Long-Term Memory supporting one-shot, few-shot and Tool capable agents
biniou a self-hosted webui for 30+ generative ai models for text generation, image generation, audio generation, video generation etc.
ExUI simple, lightweight web UI for running local inference using ExLlamaV2
ava Air-gapped Virtual Assistant / Personal Language Server with support for local models using llama.cpp as a backend, demo
llamafile Distribute and run LLMs with a single file on Windows, macOS, Linux
OpenChat web ui that currently supports openAI but will implement local LLM support, RAG with PDF, websites, confluence, office 365
lobe-chat docker image based chat bot framework with plugin and agent support, roles, UI etc
LibreChat OpenAI, Assistants API, Vision, Mistral, Bing, Anthropic, OpenRouter, Google Gemini, model switching, langchain, DALL-E, Plugins, OpenAI Functions, Multi-User, Presets
open-webui formerly ollama webui, docker and kubernetes setup, code, MD, LaTeX formatting, local RAG feature, web browsing, RLHF annotation, prompt presets, model download and switching, multi modal support
ollama-ui Simple HTML UI for Ollama
ollama-ui ChatGPT-Style Responsive Chat Web UI Client (GUI) for Ollama
big-AGI Web Browse, Search, Sharing, Tracking, supporting LocalAI, Ollama, LM Studio, Azure, Gemini, OpenAI, Groq, Mistral, OpenRouter etc.
slickgpt light-weight BYOK web client for the OpenAI API written in Svelte offering a userless share feature, chat history in localStorage, message editing, cost calculation

Backends

ExLlama a more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. By ReturningTarzan
ExLlamaV2 faster ExLlama
transformers huggingface transformers
bitsandbytes 8 bit inference
AutoGPTQ 4bit inference
llama.cpp C/C++ implementation providing inference for a wide range of LLM architectures like llama, mistral, dbrx, qwen, mamba, gemma and more, supporting a wide range of hardware, with optimizations for ARM, Apple Metal, x86. Offers various quantization techniques, CUDA kernels, Vulkan and SYCL backend support, and CPU+GPU hybrid inference for models larger than the total VRAM capacity
TensorRT-LLM Python API for running LLMs on GPU with support for MHA, MQA, GQA, Tensor Parallelism, INT4/8 Quantization, GPTQ, AWQ, FP8, RoPE to run Baichuan, BLOOM, ChatGLM, Falcon, GPT-J/NeoX, LLaMA/2,MPT, OPT, SantaCoder, StarCoder etc.
tensorrtllm_backend Triton TensorRT-LLM Backend
RWKV.cpp CPU only port of BlinkDL/RWKV-LM to ggerganov/ggml. Supports FP32, FP16 and quantized INT4.
sherpa llama.cpp on android
chatglm.cpp C++ implementation of ChatGLM-6B & ChatGLM2-6B
MLX Apple's ML Toolkit supporting Transformers in the MLX format for faster inference

Voice Assistants

datafilik/GPT-Voice-Assistant
Abdallah-Ragab/VoiceGPT
LlmKira/Openaibot
BarkingGPT Audio2Audio by using Whisper+chatGPT+Bark
gpt_chatbot Windows / elevenlabs TTS + pinecone long term memory
gpt-voice-conversation-chatbot using GPT3.5/4 API, elevenlab voices, google tts, session long term memory
JARVIS-ChatGPT conversational assistant that uses OpenAI Whisper, OpenAI ChatGPT, and IBM Watson to provide quasi-real-time tips and opinions.
ALFRED LangChain Voice Assistant, powered by GPT-3.5-turbo, whisper, Bark, pyttsx3 and more
bullerbot uses GPT and ElevenLabs to join your online meetings, listen for your name and answers questions with your voice
RealChar Create, Customize and Talk to your AI Character/Companion in Realtime using GPT3.5/4, Claude2, Chroma Vector DB, Whisper Speech2Text, ElevenLabs Text2Speech
gdansk-ai full stack AI voice chatbot (speech-to-text, LLM, text-to-speech) with integrations to Auth0, OpenAI, Google Cloud API and Stripe - Web App, API
bark TTS for oobabooga/text-generation-webui make your local LLM talk
bark TTS for oobabooga/text-generation-webui another implementation
iris-llm local voice chat agent
Kobold-Assistant Fully conversational local OpenAI Whisper + Local LLMS + Local Coqui
WhisperFusion ultra low latency conversations built with WhisperLive, WhisperSpeech and Mistral
Linguflex voice assistant with smart home devices control, music control, internet search, email manipulation
GLaDOS project dedicated to building a real-life version of GLaDOS a fictional AI from the game Portal with a quirky personality
AlwaysReddy LLM voice assistant with TTS, STT, reading/writing to clipboard with OpenAI, Anthropic and Local LLM support
LocalAIVoiceChat Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis
june Local voice chatbot powered by Ollama, Hugging Face Transformers, and Coqui TTS
VERBI modular voice assistant application allowing experimenting with SOTA models for transcription, response generation and TTS with a focus on flexibility and extensibility supporting APIs and local models

Retrieval Augmented Generation (RAG)

sqlchat Use OpenAI GPT3/4 to chat with your database
chat-with-github-repo which uses streamlit, gpt3.5-turbo and deep lake to answer questions about a git repo
mpoon/gpt-repository-loader uses Git and GPT-4 to convert a repository into a text format for various tasks, such as code review or documentation generation.
chat-your-data Create a ChatGPT like experience over your custom docs using LangChain
embedchain python based RAG Framework
dataherald a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English
databerry create proprietary data stores that can be accessed by GPT
Llama-lab home of llama_agi and auto_llama using LlamaIndex
PrivateGPT a standalone question-answering system using LangChain, GPT4All, LlamaCpp and embeddings models to enable offline querying of documents
Spyglass tests an Alpaca integration for a self-hosted personal search app. Select the llama-rama feature branch. Discussion on reddit
local_llama chatting with your PDFs offline. gpt_chatwithPDF alternative with the ultimate goal of using llama instead of chatGPT
Sidekick Information retrieval for LLMs
DB-GPT SQL generation, private domain Q&A, data processing, unified vector storage/indexing, and support for various plugins and LLMs
localGPT a privateGPT inspired document question-answering solution using GPU instead of CPU acceleration and InstructorEmbeddings, which perform better according to leaderboards instead of LlamaEmbeddings
LocalDocs plugin for GPT4All
annoy_ltm extension to add long term memory to chatbots using a nearest neighbor vector DB for memory retrieval
ChatDocs PrivateGPT + Web UI + GPU Support + ggml, transformers, webui
PAutoBot document question-answering engine developed with LangChain, GPT4All, LlamaCpp, ChromaDB, PrivateGPT, CPU only
AIDE CLI based privateGPT fork, improved, refactored, multiline support, model switch support, non question command support
khoj Chat offline with your second brain, supporting multiple data sources, web search, models etc.
secondbrain Multi-platform desktop app to download and run LLMs locally in your computer
local-rag Ingest files for RAG with open LLMs, without 3rd parties or data leaving your network
Paper QA LLM Chain for answering questions from documents with citations, using OpenAI Embeddings or local llama.cpp, langchain and FAISS Vector DB
BriefGPT document summarization and querying using OpenAI' and locally run LLM's using LlamaCpp or GPT4ALL, and embeddings stored as a FAISS index, built using Langchain.
anything-llm end to end production ready RAG supports multiple vector DBs, remote and local LLMs and supports chat and query mode with Chat Web UI, agents, code execution, web browsing, multi user, citations, multi user, docker
factool factuality Detection in Generative AI
opencopilot LLM agnostic, open source Microsoft Copilot alternative to easily built copilot functionality with RAG, Knowledgebase, Conversional History, Eval, UX into your product
DocsGPT chat with your project documentation using RAG, supports OpenAI and local LLMs, and also provides a RAG-fine-tuned docsgpt-14b model
Swiss Army Llama FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract
Quivr Dump all your files and thoughts into your private GenerativeAI Second Brain and chat with it
danswer Model agnostic RAG QA with many advanced features like Hybrid search + Reranking, time extraction, user intent identification, User access level management, document update and connectors for many SaaS tools
SecureAI-Tools Chat with local documents through various local or commercial models, supporting user authentication
OpenCopilot implement RAG principles with your own LLM supporting API calling of multiple endpoints
RAGatouille Retrievel with ColBERT and other implementations of SOTA research for your RAG pipeline
QAnything two stage retrieval based on retrieve-and-rerank approach with SOTA performance for EN/CN and planned support for structured and unstructured data and DBs
opengpts open source GPTs and Assistants with LangChain, LangServe and LangSmith. LLM agnostic, Prompt Engineering, Tool support, Vector DB agnostic, Various Retrieval Algorithms, Chat History support
cognee Memory management for RAG and AI Applications and Agents
bionic-gpt LLM deployment with authentication, team and RBAC functionality, RAG pipeline, tenants etc.
rawdog CLI assistant that responds by generating and auto-executing a Python script. Recursive Augmentation With Deterministic Output Generations (RAWDOG) is a novel alternative to RAG
ADeus RAG Chatbot for everything you say, by using an always on audio recorder and a Web App
llm-answer-engine a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper
R2R open-source framework for building, deploying and optimizing enterprise RAG systems using FastAPI, LiteLLM, DeepEval and flexible components
RAGFlow open-source RAG engine with two step retrieaval and re-ranking and deepdoc vision document parsing, supporting RAPTOR, FlagEmbeddings BCE and BGE rerankers
FreeAskInternet Perplexity inspired, private and local search aggregator using LLMs
dify open-source LLM RAG development with visual graph based workflow editor, observability and model management
morphic slick RAG / Perplexity inspired stack built on next.js and tailwind
jina-reader web app that scrapes/crawsl and parses websites then converts the content to an LLM-friendly input to use in RAG and Tool/Agent workflows
supermemory second brain with memory for your browser bookmarks and tweets
storm Stanford created LLM-powered knowledge curation system that researches a topic and generates a full-length reports with citations from the web
Firecrawl scrapes/crawls and parses websites and turns them into LLM-ready markdown
llm-scraper scrape and turn any webpage into structured data using LLMs
reor LLM assisted note taking with RAG capabilities
cognita LangChain & LlamaIndex Wrapper organizing all RAG components in a modular, API driven and extensible way
Perplexica AI-powered search engine alternative to Perplexity AI
scrapegraph-ai web scraper for python using llm and graph logic to create scraping pipelines
griptape a modular Python framework for building AI-powered applications for enterprise data and APIs. Agents, Pipelines, Workflows, Tools, Memory
adaptive-rag cut LLM costs without sacrificing accuracy by dynamically change the number of docs
AFFiNE knowledge base as a Notion, Miro and Airtable alternative with multimodal AI generation
data-to-paper AI driven research from data with human-verifiability
ragapp Easy Agentic RAG for Enterprise based on LlamaIndex
Argilla human expert rating platform to improve AI output quality based to be used for RLHF and other techniques
Mem0 provides a smart, self-improving memory layer for LLMs, enabling adaptive personalization by retaining user, session, and AI agent memories using a developer-friendly API
FlashRank allows users to add ultra-light and super-fast re-ranking capabilities to existing search and retrieval pipelines using SoTA LLMs and cross-encoders without needing Torch or Transformers, making it highly efficient for CPU usage and cost-effective for serverless deployments RAG with Query Expansion, Colbert v2 & FlashRank
GraphRAG enhance LLM outputs by utilizing knowledge graph memory structures, leveraging Azure resources for structured data extraction from unstructured text.
GraphRAG-Local-UI UI for GraphRAG supporting local LLMs with an interactive Gradio-based UI, offering real-time graph visualization and flexible querying without reliance on cloud models.
vanna generates vector embeddings of your DB schema, documentation and example queries in order to do generate matching Queries based on a user input for RAG
indexify building fast data pipelines for unstructured data (video, audio, images and documents) using extractors for embedding, transformation and feature extraction allowing real time and incremental extraction for RAG workflows
MindSearch AI Search Engine and question answering framework with Perplexity.ai Pro performance using a graph based detailed solution path (multi turn), web search, providing a sleek Web UI
llm-graph-builder turning unstructured data from pdfs, docs, txt, videos, websites into a knowledge graph in neo4j using LLMs to extract entities, nodes, relationships and properties. Built on Langchain.
FlashRAG a python framework for research focused RAG development testedwith various RAG datasets against currently 13 SOTA RAG methodologies and techniques like IRCoT, SuRe, REPLUG, SelfRag, FLARE
Neurite Fractal Graph-of-Thought is an experimental Mind-Mapping framework for Ai-Agents, Web-Links, Notes, and Code including a fractal based web UI where you navigate indexed knowledge in an interactive network
llm-app Dynamic RAG for enterprise. Ready to run with Docker supporting sources from Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more
RAG Techniques Comprehensive collection of advanced RAG techniques like RAPTOR, Reranking etc.
marker PDF to markdown conversion for all languages removing headers footers and other artifacts, supports tables, code and images and works on GPU CPU or MPS using tesseract, heuristics and surya
MinerU high-quality data extraction tool, supports PDF/webpage/e-book extraction cpu and gpu compatible running on windows linux and mac os
openperplex Perplexity inspired AI search using Cohere semantic chunking, Jina Rerankers and serper web search results API
OpenSearch searchGTP / perplexity clone but personalized for you
rag NeuML RAG supporting Vector and Graph retrieval backed by txtai and supporting docker and pip deployment
RAGBuilder automatically optimize hyperparameters of your RAG like chunking strategy details and other configurations and test against a test dataset to identify the best performing parameters for your data
fastRAG build and explore efficient RAG methods and techniques with a focus on research using haystack compatible components
kotaemon is an open-source tool providing a user-friendly UI for document-based QA using a hybrid retrieval augmented generation (RAG) pipeline, supporting both local and API-based LLMs, making it ideal for creating custom RAG-based solutions.
Unstract for automated extraction and transformation from unstructured documents like PDFs into structured formats, leveraging LLMs in conjunction with retrieval-augmented generation to enhance data processing and retrieval tasks. Alternative to naive OCR or Azure Document Intelligence Cracking with Layout awareness
rerankers lightweight, low-dependency, unified python library to use all common reranking and cross-encoder models like ColBERT, BGE, Gemma, MiniCPM and all SentenceTransformers, RankGPTs, T5 based, FlashRank, Cohere, Jina, Voyage and MixedBread APIs and RankLLM support
rank_llm python library supporting reranking with pointwise and listwise rerankers like monoT5 and RankGPT variants such as RankZephyr, RankGPT4-o
crawl4ai tool for RAG solutions for simultaneous multi-URL crawling, media tags, links and metadata extraction strategies, while ensuring privacy with proxy support and session management for complex multi-page scenarios and provides LLM-friendly output formats

Browser Extensions

sider chrome side-bar for chatGPT and OpenAI API supporting custom prompts and text highlighting
chathub-dev/chathub
Glarity open-source chrome extension to write summaries for various websites including custom ones and YouTube videos. Extensible
superpower-chatgpt chrome extension / firefox addon to add missing features like Folders, Search, and Community Prompts to ChatGPT
Lumos Chrome Extension with OLlama Backend as a RAG LLM co-pilot for browsing the web
chatGPTBox add useful LLM chat-boxes to github and other websites, supporting self-hosted model (RWKV, llama.cpp, ChatGLM)

Agents / Automatic GPT

Auto GPT
AgentGPT Deploy autonomous AI agents, using vectorDB memory, web browsing via LangChain, website interaction and more including a GUI
microGPT Autonomous GPT-3.5/4 agent, can analyze stocks, create art, order pizza, and perform network security tests
Auto GPT Plugins
AutoGPT-Next-Web An AgentGPT fork as a Web GUI
AutoGPT Web
AutoGPT.js
LoopGPT a re-implementation of AutoGPT as a proper python package, modular and extensible
Camel-AutoGPT Communicaton between Agents like BabyAGI and AutoGPT
BabyAGIChatGPT is a fork of BabyAGI to work with OpenAI's GPT, pinecone and google search
GPT Assistant An autonomous agent that can access and control a chrome browser via Puppeteer
gptchat a client which uses GPT-4, adding long term memory, can write its own plugins and can fulfill tasks
Chrome-GPT AutoGPT agent employing Langchain and Selenium to interact with a Chrome browser session, enabling Google search, webpage description, element interaction, and form input
autolang Another take on BabyAGI, focused on workflows that complete. Powered by langchain.
ai-legion A framework for autonomous agents who can work together to accomplish tasks.
generativeAgent_LLM Generative Agents with Guidance, Langchain, and local LLMs, implementation of the "Generative Agents: Interactive Simulacra of Human Behavior" paper, blogpost
gpt-engineer generates a customizable codebase based on prompts using GPT4, and is easy to adapt and extend; runs on any hardware that can run Python.
gpt-migrate takes your existing code base and migrates to another framework or language
MetaGPT multi agent meta programming framework. takes requirements as input and outputs user stories, analysis, data structures, etc. MetaGPT includes product managers, architects, PMs, engineers and uses SOPs to run, paper
aider command-line chat tool that allows you to write and edit code with OpenAI's GPT models
AutoChain Build lightweight, extensible, and testable LLM Agents
chatdev Develop Custom Software using Natural Language, while an LLM-powered Multi-Agent Team develops the software for you, paper
AutoAgents Generate different roles for GPTs to form a collaborative entity for complex tasks, paper
RestGPT LLM-based autonomous agent controlling real-world applications via RESTful APIs
MemGPT intelligently manages different memory tiers in LLMs to provide extended context, supporting vector DBs, SQL, Documents etc
XAgent Autonomous LLM Agent for Complex Task Solving
HAAS Hierarchical Autonomous Agent Swarm create a self-organizing and ethically governed ecosystem of AI agents, inspired by ACE Framework
agency-swarm agent orchestration framework enabling the creation of a collaborative swarm of agents (Agencies), each with distinct roles and capabilities
Auto Vicuna Butler Baby-AGI fork / AutoGPT alternative to run with local LLMs
BabyAGI AI-Powered Task Management for OpenAI + Pinecone or Llama.cpp
Agent-LLM Webapp to control an agent-based Auto-GPT alternative, supporting GPT4, Kobold, llama.cpp, FastChat, Bard, Oobabooga textgen
auto-llama-cpp fork of Auto-GPT with added support for locally running llama models through llama.cpp
AgentOoba autonomous AI agent extension for Oobabooga's web ui
RecurrentGPT Interactive Generation of (Arbitrarily) Long Text. Uses LSTM, prompt-engineered recurrence, maintains short and long-term memories, and updates these using semantic search and paragraph generation.
SuperAGI open-source framework that enables developers to build, manage, and run autonomous agents. Supports tools extensions, concurrent agents, GUI, console, vector DBs, multi modal, telemetry and long term memory
GPT-Pilot writes scalable apps from scratch while the developer oversees the implementation
DevOpsGPT Multi agent system for AI-driven software development. Combine LLM with DevOps tools to convert natural language requirements into working software
ToRA Tool-integrated Reasoning Agents designed to solve challenging mathematical reasoning problems by interacting with tools, e.g., computation libraries and symbolic solvers, paper
ACE Autonomous Cognitive Entities Framework to automatically create autonomous agents and sub agents depending on the tasks at hand
SuperAgent Build, deploy, and manage LLM-powered agents
aiwaves-cn/agents Open-source Framework for Autonomous Language Agents with LSTM, Tool Usage, Web Navigation, Multi Agent Communication and Human-Agent interaction, paper
autogen framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks, paper
openagents an Open Platform for Language Agents in the Wild, paper
TaskWeaver code-first agent framework for planning and executing data analytics tasks interpreting user requests and coordinating plugins
crewAI framework for orchestrating role-playing, autonomous AI agents
phidata toolkit for building AI Assistants using function calling enabling RAG and other workflows
FRIDAY Framework for Computer Agents with Self-Improvement on OSX and Linux
agentkit Starter-kit to build constrained agents with Nextjs, FastAPI and Langchain
LaVague control a web browser through natural language instructions using visual language models, a Large Action Model framework for AI Web Agents
Skyvern control a web browser through natural language instructions using visual language models
AIOS SDK that embeds large language model into Operating Systems providing Agent workflows, OS Kernel integration and System Calls via LLM Kernel (Agent, Context, Memory, Storage, Tools, Access)
WebLlama code and Llama-3-8B-web model to build agents that browse the web by following chat style instructions
memary Longterm Memory for Autonomous Agents with Routing Agent (ReAct) Knowledge Graph creation and retrieval with Neo4j, Memory Stream and Entity Knowledge Store
maestro Subagent orchestration that breaks down tasks into subtasks and orchestrates its execution and allows for refinement supporting Claude, Groq, GPT-4o and local Ollama/LMStudio
AutoGroq Create AutoGen compatible teams with assistants and workflows from a simple text prompt
FinRobot AI Agent Platform for Financial Applications using LLMs
llama-fs organizes a folder and renames files on your system by looking at each file and creating a useful structure based on metadata and common conventions
Leon personal assistant with multiple features like TTS, ASR, LLM usage
Vision Agent allows users to describe vision problems in text and utilizes agent frameworks to generate code solutions, leveraging existing vision models to expedite task completion.
Lagent allows users to efficiently build LLM-based agents with a unified interfacing design, supporting various models and customizable actions using frameworks like OpenAI API, Transformers, and LMDeploy
IoA a framework where diverse AI agents, using an internet-inspired architecture, can autonomously form teams and execute tasks asynchronously, leveraging heterogeneous agent integration and adaptive conversation flows
Atomic Agents a framework designed to be modular, extensible, and easy to use to build agents on top of Instructor and Pydantic
PraisonAI framework providing UI, Chat, Code and Train modules to orchestrate Multi Agent solutions like CrewAI or AutoGen helping with automated agent creation, interchangeable LLM APIs, YAML based configuration and Tool Use integration
bolna end-to-end framework for LLM based voice assistants handling phone calls with TTS, Speech Recognition, text generation supporting local and cloud APIs for LLM and voice generation
gpt-computer-assistant native operating system assistant using vision, voice and text for windows mac and linux with support for custom tool use
LAMBDA large Model Based Data Agent is a multi agent data analysis system using LLMs to perform complex data analysis tasks through human instructions automatically planning and writing code and providing a UI to generate reports automatically
PR-Agent is an open-source tool that uses AI to automatically analyze, review, and provide suggestions for improving pull requests, enhancing code quality and development efficiency across multiple git platforms by automating common PR management tasks.
ScholArxiv is a GUI based assistant to search, read, bookmark, share, and download academic papers from the arXiv repository, featuring LLM-driven capabilities to provide summaries and in-depth exploration of research content
Composio provides AI agents with a library of over 100 function calling tools, enabling seamless interaction across multiple platforms and tools to enhance AI agent functionality, tool calling capability and automation
Agent-Zero is an open ended agent framework without predefined tasks for open ended usage that dynamically grows from usage, with multi-agent cooperation, operating system and coding capabilities, with added features for real-time intervention and logging of activities
screenpipe open source Microsoft Recall and rewind.ai and second brain alternative with continuous screen and audio recording on your device to do RAG and question answering on everythin you do digitally

Multi Modal

huggingGPT / JARVIS Connects LLMs with huggingface specialized models
Langchain-huggingGPT reimplementation of HuggingGPT using langchain
OpenAGI AGI research platform, solves multi step tasks with RLTF and supports complex model chains
ViperGPT implementation for visual inference and reasoning with openAPI
TaskMatrix former visual-chatgpt connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
PandaGPT combines ImageBind and Vicuna to understand and combine multimodal inputs from text, image, audio, depth, thermal, and IMU.
AGiXT agents with memory, model agnostic, docker deployment, plugin extendable, chat feature, speech to text and text to speech, REST api and more
SelfTalker Talk with your virtual self using voice cloning, LLMs and computer vision models
CoDi Any to any generation via composable diffusion
AutoMix Mixing Language Models with Self-Verification and Meta-Verification, paper
NExT-GPT Any-to-Any Multimodal LLM for arbitary input-output combinations (any-to-any) for text, image, video, audio and beyond, paper, weights
SpeechGPT Empowering LLMs with Intrinsic Cross-Modal Conversational Abilities for speech audio input and output
OpenFLamingo-v2 MPT and RedPajama fine tuned on the OpenFLamingo data set for training Autoregressive Vision-Language Models, models
Obsidian 3B open source multimodal visual LLM
ml-ferret Refer and Ground Anything Anywhere at Any Granularity
CogVLM SOTA open visual language model and Agent
Video-LLaVA Image and Video dense LLM and MoE-LLaVA 3B sparse Mixture of Expert model outperforming the original dense 7B model
MobileAgent Autonomous Multi-Modal Mobile Device Agent with Visual Perception that can execute tasks
MiniCPM-V MiniCPM-V and OmniLMM multimodal vision & language models with OCR and text-vision reasoning capabilities
AppAgent Multimodal Agents as Smartphone Users
InternVL InternVL-Chat Model and surrounding technology to rebuild a Visual Language Model
PyWinAssistant Large Action Model for Windows 10/11 win32api controlling User Interfaces via Visualization-of-Thought (VoT)
Phi-3-vision-128k-instruct a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites
llama3v vision model that is powered by Llama3 8B and siglip-so400m matching gpt4-v on some benchmarks
Zerox OCR allows users to convert PDFs into Markdown using a vision-based OCR process with GPT-4o-mini, optimizing for complex layouts like tables and charts
Facebook Chameleon multimodal text and image model for any combination of modality in one transformer, releasing 7b and 30b models with a research license
InternVL-2.0 Multimodal Large Language Model ranging from 1B to a 108B based on llama3 with support for text and image modality in any combination and output formats such as text, bounding boxes and masks
LLaVA-NeXT several models for video, image and text multi modality based on different base models like llama, qwen etc
LLaMA-Omni is a speech-language model built on Llama-3.1-8B-Instruct and trained using just 4 GPUs, offering low-latency, high-quality speech interactions and simultaneous generation of text and speech responses
moshi a speech-text foundation model that supports low-latency high-quality speech interactions and simultaneous generation of text responses, using Mimi, a SOTA streaming neural audio codec
Mini-Omni a multimodal LLM based on Qwen2 offering real-time end-to-end speech input and streaming audio output conversational capabilities

Code generation

FauxPilot open source Copilot alternative using Triton Inference Server
Turbopilot open source LLM code completion engine and Copilot alternative
Tabby Self hosted Github Copilot alternative with RAG-based code completion which utilizes repo-level context
starcoder.cpp
GPTQ-for-SantaCoder 4bit quantization for SantaCoder
supercharger Write Software + unit tests for you, based on Baize-30B 8bit, using model parallelism
Autodoc toolkit that auto-generates codebase documentation using GPT-4 or Alpaca, and can be installed in a git repository in about 5 minutes.
smol-ai developer a personal junior developer that scaffolds an entire codebase with a human-centric and coherent whole program synthesis approach using <200 lines of Python and Prompts.
locai kobold/oobabooga -compatible api for vscode
oasis local LLaMA models in VSCode
aider cli tool for writing and modifying code with GPT-3.5 and GPT-4
continue open-source copilot alternative for software development as a VS Code plugin, can use gpt-4 API or local codellama and other models
chatgpt-vscode vscode extension to use unofficial chatGPT API for a code context based chat side bar within the editor
codeshell-vscode vscode extension to use the CodeShell-7b models
localpilot vscode copilot alternative using local llama.cpp/ggml models on Mac
sweep AI-powered Junior Developer for small features and bug fixes.
devika Open Source Devin clone. Software Engineer that takes high level human instructions, breaks them down, plans ahead and creates a software product out of it
OpenHands formerly OpenDevin is a Devin clone imitating an autonomous AI software engineer who is capable of executing complex engineering tasks and collaborating actively with users on software development projects
OpenCodeInterpreter Interface, framework and models for code generation, execution and improvement
gptscript natural language scripting language to achieve tasks by writing and executing code using an LLM
tlm Local CLI Copilot, powered by CodeLLaMa
llm-cmd Use LLM to generate and execute commands in your terminal/shell/cli
gorilla-cli use natural language in the terminal to assist with command writing, gorilla writes the commands based on a user prompt, while the user just approves them
SWE-agent system for autonomously solving issues in GitHub repos. Gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg
openui v0.dev alternative for text guided UI creation for HTML/React,Svelte, Web Components, etc.
codel autonomus agent performing tasks and projects using terminal, browser and editor
AutoCodeRover automated GitHub issue resolver for bug fixes and feature addition
plandex terminal-based AI coding agent for complex tasks with planning and execution capabilities
AutoCoder Agentic code generation, execution and verification allowing external packages and using a fine tuned deepseeker-coder model AutoCoder-33B and 6.7B model
Amplication allows users to automate backend application development for .NET and Node.js using an AI-powered platform, ensuring scalable and secure code with a user-friendly, plugin-based architecture.
twinny VS Code Extension for Github Copilot like code completion and chat assistance, leveraging customizable API endpoints and supporting multiple backends like Ollama and llama.cpp
Mutahunter generate unit tests automatically and perform LLM-based mutation testing, enhancing fault detection with context-aware mutations across various programming languages
cover-agent automated test creation for maximum test coverage
llamacoder Claude Artifacts inspired app generator based on llama3 405B on together.ai, sandpack code sandbox, next.js app router with tailwind and helicone observability
MLE-Agent is a coding agent tailored for ML and AI engineers and researchers, which uses arXiv and Papers with Code as a RAG source to automate coding tasks, debugging support via extensive tool integration and a command-line interface
RepoGraph is a plug-in module that enhances the software engineering capabilities of LLMs by providing context at the repository-level, using a graph-based approach for RAG or agents on a codebase, x.com thread with similar projects
o1-engineer is a command-line tool that uses OpenAI's API to automate developer tasks such as code generation, file editing, project planning and code review, enhancing project management efficiency and workflow.

Libraries and Wrappers

acheong08/ChatGPT Python reverse engineerded chatGPT API
gpt4free Use reverse engineered GPT3.5/4 APIs of other website's APIs
GPTCache, serve cached results based on embeddings in a vector DB, before querying the OpenAI API.
kitt TTS + GPT4 + STT to create a conference call audio bot
Marvin simplifies AI integration in software development with easy creation of AI functions and bots managed through a conversational interface
chatgpt.js client-side JavaScript library for ChatGPT
ChatGPT-Bridge use chatGPT plus' GPT-4 as a local API
Powerpointer connects to openAPI GPT3.5 and creates a powerpoint out of your content
EdgeGPT Reverse engineered API of Microsoft's Bing Chat using Edge browser
simpleaichat python package for simple and easy interfacing with chat AI APIs
Dotnet SDK for openai chatGPT, Whisper, GPT-4 and Dall-E SDK for .NET
node-llama-cpp TS library to locally run many models supported by llama.cpp, enhanced with many convenient features, like forcing a JSON schema on the model output on the generation level
FastLLaMA Python wrapper for llama.cpp
WebGPT Inference in pure javascript
TokenHawk performs hand-written LLaMA inference using WebGPU, utilizing th.cpp, th-llama.cpp, and th-llama-loader.cpp, with minimal dependencies
WasmGPT ChatGPT-like chatbot in browser using ggml and emscripten
AutoGPTQ easy-to-use model GPTQ quantization package with user-friendly CLI
gpt-llama.cpp Replace OpenAi's GPT APIs with llama.cpp's supported models locally
llama-node JS client library for llama (or llama based) LLMs built on top of llama-rs and llama.cpp.
TALIS serves a LLaMA-65b API, optimized for speed utilizing dual RTX 3090/4090 GPUs on Linux
Powerpointer-For-Local-LLMs connects to oobabooga's API and creates a powerpoint out of your content
OpenChatKit open-source project that provides a base to create both specialized and general purpose chatbots and extensible retrieval system, using GPT-NeoXT-Chat-Base-20B as a base model
webgpu-torch Tensor computation with WebGPU acceleration
llama-api-server that uses llama.cpp and emulates an openAI API
CTransformers python bindings for transformer models in C/C++ using GGML library, supporting GPT-2/J/NeoX, StableLM, LLaMA, MPT, Dollyv2, StarCoder
basaran GUI and API as a drop-in replacement of the OpenAI text completion API. Broad HF eco system support (not only llama)
CodeTF one-stop Python transformer-based library for code LLMs and code intelligence, training and inferencing on code summarization, translation, code generation
CTranslate2 provides fast Transformer (llama, falcon and more) inference for CPU and GPU, featuring compression, parallel execution, framework support
auto-gptq easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ for GPU inference
exllama Memory-Efficient Llama Rewrite in Python/C++/CUDA for 4bit quantized GPTQ weights, running on GPU, faster than llama.cpp (2023-06-13), autoGPTQ and GPTQ-for-llama
SimpleAI Self-Hosted Alternative to openAI API
rustformer llm Rust-based ecosystem for llms like BLOOM, GPT-2/J/NeoX, LLaMA and MPT offering a CLI for easy interaction and powered by ggml
Haven Fine-Tune and Deploy LLMs On Your Own Infrastructure
llama-cpp-python Python Bindings for llama.cpp with low level C API interface, python API, openai like API and LangChain compatibility
candle a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use
LangChain Framework for LLM Application Development (example, paolorechia/learn-langchain with vicuna and GPTQ 4 bit support)
Langstream a lighter alternative to LangChain
LangFlow GUI for Langchain using graphs/flows
Toolformer implementation Allows LLMs to use Tools
megabots to create LLM bots by providing Q&A, document retrieval, vector DBs, FastAPI, Gradio UI, GPTCache, guardrails, whisper, supports OpenAI API (local LLMs planned)
gorilla Enables LLMs to use tools by semantically and syntactically correctly invoking APIs. Reduces hallucination, custom trained model weights based on llama-7b
agency A fast and minimal actor model framework allows humans, AIs, and other computing systems to communicate with each other through shared environments called "spaces".
Vercel AI SDK a library for building edge-ready AI-powered streaming text and chat UIs in React, Svelte and Vue supporting LangChain, OpenAI, Anthropic and HF
tinygrad Geohot's implementation for a PyTorch killer with the target to be 2x faster
Xorbits Inference (Xinference) versatile library designed to deploy and serve language, speech recognition, and multimodal models
data-juicer zero code, low code and off the shelf data processing for LLMs
Microsoft semantic-kernel a lightweight SDK enabling integration of AI Large Language Models (LLMs) with conventional programming languages
LlamaIndex provides a central interface to connect your LLM's with external data
haystack LLM orchestration framework to connect models, vector DBs, file converters to pipelines or agents that can interact with your data to build RAG, Q&A, semantic search or conversational agent chatbots
rivet Visual graph/flow/node based IDE for creating AI agents and prompt chaining for your applications
promptflow visual graph/flow/node based IDE for creating AI agents
Flowise Drag & drop UI with visual graph/flow/nodes to build your customized LLM app
ChainForge visual graph/flow/node based prompt engineering UI for analyzing and evaluating LLM responses
LangStream Event-Driven Developer Platform for Building and Running LLM AI Apps, also providing a visual graph/flow/node based UI. Powered by Kubernetes and Kafka
activepieces Automation with SaaS tools and GPT using a visual graph/flow/node based workflow
kernel-memory Index and query any data using LLM and natural language, tracking sources and showing citations, ideal for RAG pipelines
LocalAI Drop-in OpenAI API replacement with local LLMs, Audio To Text (whisper), Image generation (Stable Diffusion), OpenAI functions and Embeddings with single exe deployment
dify LLM app development platform combines AI workflow, RAG pipeline, agent capabilities, model management, observability in a visual graph/flow/node editor
CopilotKit build fully custom AI Copilots with support for chat, textbox assist, agents and context built on LangChain
Bisheng LLM Application Develoment environment mainlzyin Chinese with some English documents
Typebot allows users to visually create advanced chatbots that can be embedded in web/mobile apps, featuring customizable themes, real-time analytics, and various integration options with services like OpenAI, Google Sheets, and Zapier.
DOM to Semantic Markdown convert HTML DOM to semantic Markdown, preserving the semantic structure and metadata for efficient LLM processing, using HTML-to-Markdown AST conversion and customizable options
embed python embedding, rerank and clip models inference library for stable, fast and easy to use local embedding serving with a focus on sync to async API
fastembed fast, Accurate, Lightweight Python library to serve State of the Art Embeddings locally supporting GPU and CPUs, dense and sparse models, colbert, clip and more
LangGraph Studio visual graph/flow/node based LLM app development IDE from LangChain using LangGraph and LangSmith

Prompt templating / Grammar

Jsonformer Generate Structured JSON from Language Models by handling JSON synthax, and letting LLM just output the values
Microsoft guidance templating / grammar for LLMs, Demo project by paolorechia for local text-generation-webui. reddit thread. guidance fork and llama-cpp-python fork how-to on reddit
outlines Guidance alternative templating / grammar for LLM generation to follow JSON Schemas, RegEx, Caching supporting multiple models, model APIs, and HF transformers
lmql LMQL templating / grammar language for LLMs based on a superset of Python going beyond constrain-based templating
TypeChat templating / grammar for LLMs to enforce constraints for text generation
GBNF templating / grammar implementation using Bakus-Naur Form (BNF) in llama.cpp to guide output, BNF Playground
sglang structured generation language designed for LLMs with multiple chained generation calls, advanced prompting techniques, control flow, multiple modalities, parallelism, and external interaction
DSPy a framework for algorithmically optimizing LM prompts and weights
AlphaCodium Automatic Code Generation improvements with Prompt Engineering and Flow Engineering
aici lets you build Controllers that constrain and direct output of aLLM in real time
instructor structured outputs for LLMs. Pydantic, simple and transparent
Every Way To Get Structured Output From LLMs explores various methods for obtaining structured output from LLMs, including techniques beyond simple JSON response formatting and regex stacking.
AgentInstruct Instructs Agents to be better at Zero Shot reasoning tasks
TextGrad optimize coding solutions and problem-solving tasks by implementing automatic differentiation via text feedback from LLMs
formatron control the format of language models' output with minimal overhead supporting exllama2, vllm, rwkv using a mix of regex and context-free grammars
optillm is an OpenAI API compatible optimizing inference proxy that implements state-of-the-art techniques to improve the accuracy and performance of LLMs, especially for reasoning over coding, logical and mathematical queries, using methods such as CoT with Reflection, Plan Search, and more
ell is a lightweight programming library for developers and researchers using language models, treating prompts as functions and providing tools for prompt engineering optimization, multimodal input and output processing, and capturing various uses of language model invocations, to systematize prompt engineering and seamlessly fit into existing workflows
g1 early prototype to replicate OpenAI o1 step by step reasoning and reflection (system 2 thinking) capabilities without using a fine tuned model

Fine Tuning & Training

simple llama finetuner
LLaMA-LoRA Tuner
alpaca-lora
StackLLaMA Fine-Tuning Guide by huggingface
xTuring LLM finetuning pipeline supporting LoRa & 4bit
Microsoft DeepSpeed Chat
How to train your LLMs
H2O LLM Studio Framework and no-code GUI for fine tuning SOTA LLMs
Implementation of LLaMA-Adapter, to fine tune instructions within hours
Hivemind Training at home
Axolotl a llama, pythia, cerebras training environment optimized for Runpod supporting qlora, 4bit, flash attention, xformers
LMFlow toolbox for finetuning, designed to be user-friendly, speedy, and reliable
qlora uses bitsandbytes quantization and PEFT and transformers for efficient finetuning of quantized LLMs
GPTQlora Efficient Finetuning of Quantized LLMs with GPTQ QLoRA and AutoGPTQ for quantization
Landmark Attention QLoRA for landmark attention with 50x context compression and efficient token selection
ChatGLM Efficient Finetuning fine tuning ChatGLM models with PEFT
AutoTrain Advanced by Huggingface, faster and easier training and deployments of state-of-the-art machine learning models
Pearl Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta
LLaMA-Factory Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
LLaMa2lang convenience scripts to finetune any foundation model for chat towards any language
fsdp_qlora Answer.AI's training script enabling 70B training on 48GB vram utilizing QLoRA + FSDP, also available in Axolotl
unsloth 2-5x faster and 60% less memory local QLoRA finetuning supporting Llama, CodeLlama, Mistral, TinyLlama etc. using Triton
transformerlab Download, interact, and finetune models locally in a convenient GUI
llm.c train GPT and other LLM architectures with a native C based CUDA accelerated libary
LLaMA-Factory Easy and efficient fine-tuning supporting various model architectures like Llama, Mixtral, Phi etc. for pre-training, supervised fine tuning, PPO, DPO, quantized fine tuning etc
torchtune native pytorch LLM fine tuning fur llama architectures with QLoRA support
xtuner fine tuning supporting llm, vlm pre training and fine tuning. deepspeed, ZeRO optimization, various architectures, QLoRA and LoRA support
Mergoo merge multiple LLM experts and fine-tune them. Support for MoE, MoA for Llama1-3, Mistral, Phi3 and BERT models
augmentoolkit help automatically creating structured instruction or classifier data sets from unstructured text
abliteration altering the refusal direction between harmless and harmful prompts to change an existing model alignment without fine-tuning, based in parts on blogpost refusal in llms is mediated by a single direction and FailSpy's abliterator script

Merging & Quantization

mergekit Tools for merging pretrained large language models.
MergeLM LLMs are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
SLERP Spherical Linear Interpolation Model Merging
AutoAWQ
AQLM Extreme Compression of LLMs to 2bit via Additive Quantization to work with models of LLaMA, Mistral and Mixtral families paper
EfficientQAT efficient Quantization-Aware Training and support for model transfer through gptqmodel to support GPTQ v2 and possibly GGUF llama.cpp and EXL2 in the future
GPTQModel fork of AutoGPTQ for an easy to use LLM quantization and inference toolkit based on GPTQ algorithm for weight-only quantization with more model support, faster speed, better quants supporting gptq, Intel optimized quants, vLLM and SGLang optimization and more
AutoGGUF automated GGUF model quantization with imatrix and LoRA support

Resources

Data sets

Alpaca-lora instruction finetuned using Low Rank Adaption
codealpaca Instruction training data set for code generation
LAION AI / Open-Assistant Dataset (https://github.com/LAION-AI/Open-Assistant / https://projects.laion.ai/Open-Assistant/ / https://open-assistant.io)
ShareGPT pre-cleaned, English only "unfiltered," and 2048 token split version of the ShareGPT dataset ready for finetuning
Vicuna ShareGPT pre-cleaned 90k conversation dataset
Vicuna ShareGPT unfiltered
GPTeacher
alpaca-cleaned
codealpaca 20k
gpt3all pruned
gpt4all_prompt_generations_with_p3
gpt4all_prompt_generations
alpaca-plus-gpt4all-without-p3
Alpaca dataset from Stanford, cleaned and curated
Alpaca Chain of Thought fine tuning dataset for EN and CN
PRESTO paper Multilingual dataset for parsing realistic task-oriented dialogues by Google & University of Rochester, California, Santa Barbara, Columbia
RedPajama Dataset and model similar to LLaMA but truly open source and ready for commercial use. hf
BigCode The Stack
open-instruct-v1
awesome-instruction-dataset list of instruction datasets by yadongC
The Embedding Archives Millions of Wikipedia Article Embeddings in multiple languages
Rereplit-finetuned-v1-3b & replit-code-v1-3b outperforming all coding OSS models, gets released soon
alpaca_evol_instruct_70k an instruction-following dataset created using Evol-Instruct, used to fine-tune WizardLM
gpt4tools_71k.json from GPT4Tools paper, having 71k instruction-following examples for sound/visual/text instructions
WizardVicuna 70k dataset used to fine tune WizardVicuna
Numbers every LLM Developer should know
airoboros uncensored
CoT collection, paper
airoboros-gpt4 fine-tuning dataset optimized for trivia, math, coding, closed context question answering, multiple choice, writing
fin-llama a LLaMA finetuned for finance, code, model
dataset
SlimPajama-627B Deduplicated and cleaned RedPajama based dataset for higher information density and quality at lower token length
dolphin an attempt to replicate Microsoft Orca using FLANv2 augmented with GPT-4 and 3.5 completions
OpenOrca collection of augmented FLAN data with distributions aligned with the orca paper
ExpertQA Expert-Curated Questions and Attributed Answers dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers, paper
annas-archive world’s largest open-source open-data library. ⭐️ Mirrors Sci-Hub, Library Genesis, Z-Library, and more. 📈 22,052,322 books, 97,847,390 papers, 2,451,032 comics, 673,013 magazines
RedPajama-Data-v2 Open Dataset with 30 Trillion Tokens for Training, HF
MINT-1T Multimodal training Dataset with one trillion tokens including HTML, PDF from CommonCrawl 2023 and ArXiv data
Open-Reasoning-Tasks NousResearch's reasoning task repository to teach elicit or show reasoning samples to LLMs
Everything_Instruct a huge dataset designed for instruction-based fine-tuning of language models, useful for improving model performance in task-specific applications by providing diverse instruction-following data.
Everything_Instruct_Multilingual a huge multilingual dataset for instruction-based fine-tuning of language models, aimed at enhancing their performance across various languages by providing diverse, multilingual instruction-following examples.
MMMLU Provides a multilingual benchmark for AI models' general knowledge understanding, utilizing professional human translations into 14 languages, prioritizing inclusivity and effectiveness, especially for underrepresented languages

Research

LLM Model Cards
GPTs are GPTs: An early look at the labor market impact potential of LLMs
ViperGPT Visual Inference via Python Execution for reasoning
Emergent Abilities of LLMs , blog post
facts checker reinforcement
LLaVA: Large Language and Vision Assistant, combining LLaMA with a visual model. Delta-weights released
Mass Editing Memory in a Transformer
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
WizardLM | Fine tuned LLaMA 7B with evolving instructions, outperforming chatGPT and Vicuna 13B on complex test instructions (code, delta weights)
Scaling Transformer to 1M tokens and beyond with RMT
AudioGPT | Understanding and Generating Speech, Music, Sound, and Talking Head (github, hf space)
Chameleon-llm, a paper about Plug-and-Play Compositional Reasoning with GPT-4
GPT-4-LLM share data generated by GPT-4 for building an instruction-following LLMs with supervised learning and reinforcement learning. paper
GPT4Tools Teaching LLM to Use Tools via Self-instruct. code
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society. preprint paper, website
Poisoning Language Models During Instruction Tuning
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Dromedary: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision, code, weights
Unlimiformer: transformer-based model that can process unlimited length input by offloading attention computation to a k-nearest-neighbor index, extending the capabilities of existing models like BART and Longformer without additional weights or code modifications. code
Salesforce LAVIS provides a comprehensive Python library for language-vision intelligence research, including state-of-the-art models like BLIP-2 for vision-language pretraining and Img2LLM-VQA for visual question answering, alongside a unified interface
FLARE an active retrieval augmented generation technique that iteratively predicts, retrieves, and refines content, improving the accuracy and efficiency of long-form text generation in language models
Hyena a subquadratic-time layer that has the potential to significantly increase context length in sequence models, using a combination of long convolutions and gating. Long Convs and Hyena implementations
FastServe an efficient distributed inference serving system for LLMs that minimizes job completion time using preemptive scheduling and efficient GPU memory management, built on NVIDIA FasterTransformer.
FrugalGPT is a model that uses LLM cascade to optimize the performance and cost-efficiency of LLMs like GPT-4.
Landmark Attention LLaMa 7B with 32k tokens. Code, llama7b diff weights, merged llama7b weights
QLORA Efficient Finetuning of Quantized LLMs
Tree of Thoughts (ToT) Enables exploration over text, improves strategic decision-making in language models. Code. Example implementation, discussion
MEGABYTE Efficient multiscale decoder architecture for long-sequence modeling.
PandaGPT: project page, code, model combines ImageBind and Vicuna to understand and combine multimodal inputs from text, image, audio, depth, thermal, and IMU.
LIMA Less Is More for Alignment. Shows fine-tuning with 1000 carefully curated prompts without reinforcement learning can outperforms GPT-4 in many cases
Gorilla a finetuned LLaMA-based model that surpasses GPT-4 in writing API calls and reduces hallucination. project, code
Voyager Open-Ended Embodied Minecraft Agent using LLMs, project, code
BigTrans llama adapted to multilingual translation over 100 languages, outperforming chatGPT in 8 language-pairs
BPT memory-efficient approach to processing long input sequences in Transformers
Lion efficiently transfers knowledge from a closed-source LLM to an open-source student model
Undetectable Watermarks for Language Models using one-way functions
ALiBi Train Short Test Long. Attention with Linear Biases Enables Input Length Extrapolation. code
The Curse of Recursion: Training on Generated Data Makes Models Forget
Brainformers a complex block for natural language processing that outperforms state-of-the-art Transformers in efficiency and quality
AWQ Activation aware Weight Quantization for better LLM Compression and Acceleration, code
SpQR quantization by Tim Dettmers, code, twitter
InternLM Technical report. A 104B parameters multilingual LLM with SOTA performance in knowledge understanding, reading comprehension, math and coding, outperforms open-source models and ChatGPT in 4 benchmarks
Naive Bayes-based Context Extension NBCE extends context length of LLMs using Naive Bayes to 50k under 8*A100
The Safari of Deep Signal Processing: Hyena and Beyond
Orca Progressive Learning from Complex Explanation Traces of GPT-4. Fine-tunes small models by prompting large foundational models to explain their reasoning steps
How Far Can Camels Go? optimizing instruction on open resources, Tulu models released
FinGPT open-source, accessible and cost efficient re-training for updating financial data inside LLMs for robo-advising, algorithmic trading, and other applications, code, dataset
LongMem proposes new framework, allowing for unlimited context length along with reduced GPU memory usage and faster inference speed. Code
WizardCoder empowers Coding Large Language Models with Evol-Instruct for complex instruction fine-tuning, outperforming open-source and closed LLMs on several benchmarks, github repo, model
Infinigen a procedural generator for foto realistic 3D scenes, based on Blender and running on GPUs, paper, github
Do Large Language Models learn world models or just surface statistics
Large Language Models Can Self-improve, openreview.net
Switch Transformers scaling to Trillion Parameter Models with efficient sparsity, a paper speculated to had an influence on GPT-4's undisclosed architecture using a sparsely activated Mixture of Experts (MoE) architecture
2022 & beyond Algorithms for efficient deep learning Google Research proposed various new architectures to scale LLMs further, including MoE
Wanda Pruning by Weights and Activations a no-retraining pruning method for LLMs requires no retraining and outperforms existing methods, code
Textbooks Are All You Need a 1.3B parameter LLM focusing on programming and coding from Microsoft, which outperforms all models on MBPP except GPT-4, ranks third on HumanEval above GPT-3.5, and exhibits emergent properties
RoPE Enhanced Transformer with Rotary Position Embedding to extend context length
LongChat a new level of extended context length up to 16K tokens, with two released models LongChat-7B and 13B
salesforce xgen a series of 7B LLMs with standard dense attention on up to 8K sequence length for up to 1.5T tokens
LongNet Scaling transformers to 1 billion tokens
Lost in the Middle recent LLMs have longer context and this paper finds that information is best retrieved at the beginning or the end, but mostly lost in the middle of long context
FoT Focused Transformer with contrastive learning to achieve a 256k context length for passkey retrieval, code
OpenLLMs Less is More for Open-source Models, uses only ~6K GPT-4 conversations filtered for quality and achieves SOTA scores on Vicuna GPT-4 eval and AlpacaEval
CoDi Any-to-Any Generation via Composable Diffusion
LEDITS Real Image Editing with DDPM Inversion and Semantic Guidance, demo, code
Mixture of Experts meets Instruction Tuning MoE + Instruction Tuning is a winning combination for LLMs, likely being used for GPT-4
MoE Mixture of Experts LoRA Proof of Concept by AiCrumb, reddit discussion
LLM Attacks Universal and Transferable Adversarial Attacks on Aligned Language Models, code
factool framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT)
codellama Llama 2 fine tuned by meta for code completion, github
Graph of Thoughts introducing Graph of Thoughts and comparing its performance to Chain of Thoughts and Tree of Thoughts, code
LIDA Automatic Generation of Visualizations and Infographics using Large Language Models, code
Distilling step-by-step Outperforming larger language models with less training data and smaller model sizes
LongLoRA Efficient Fine-tuning of Long-Context Large Language Models, code
LLMLingua Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, code
flagembedding, an embedding model for Retrieve Anything To Augment Large Language Models code
mistral-7b pretrained llm with 7 billion parameters outperforming Llama 2 13B using Grouped-Query-Attention, Sliding-Window Attention and Byte-Fallback BPE tokenizer, weights
CoVe Chain-of-Verification Reduces Hallucination in Large Language Models, implementation in LangChain Expression Language,
MemGPT Towards LLMs as Operating Systems, perpetual chat bots with self editing memory, chat with your SQL database and local files etc, code
microxcaling AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Format: Microscaling Data Formats for Deep Learning
AoT Algorithm of Thoughts: Enhancing Exploration of Ideas in LLMs
Chain of Density Prompting From Sparse to Dense: GPT-4 Summarization, gpt-3.5 fine tune rivaling the quality of the original Chain of Density
Self-RAG Learning to Retrieve, Generate and Critique through Self-Reflections outperforming ChatGPT and retrieval-augmented LLama2 Chat on six tasks, selfrag finetuned llama2-13b, mistral-7b finetune
LoRAShear Efficient Large Language Model Structured Pruning and Knowledge Recovery
Making LLaMA SEE and Draw with SEED Tokenizer, Multi Modal fine tune of LLaMA with image generation, image recognition and text generation capabilities, weights, github
BSM Branch-Solve-Merge for LLMs enhancing coherence, planning, and task decomposition outperforming GPT-4 in some tasks
Skeleton-of-Thought Large Language Models Can Do Parallel Decoding. SoT aims at decreasing the end-to-end generation latency of large language models
ML-Bench Large Language Models Leverage Open-source Libraries for Machine Learning Tasks, page, code
QuIP# E8P 2-Bit Quantization of Large Language Models achieving near fp16 quantization performance
HQQ Half-Quadratic Quantization for LLMs significantly accelerating quantization speed without requiring calibration data, outperforming existing methods in processing speed and memory efficiency. Sub 10GB VRAM Mixtral 8x7B implemented through mixtral-offloading, guide
QMoE Practical Sub-1-Bit Compression of Trillion-Parameter Models, code, bitsandbytes sparse_MoE implementation, QMoE in llama.cpp, LoRa experts as alternative to QMoE
mamba alternative to transformer architecture for LLMs using Linear-Time Sequence Modeling with Selective State Spaces code
StreamingLLM Efficient Streaming Language Models with Attention Sinks for bigger Context Windows, code
Chain of Abstraction CoA A New Method for LLMs to Better Leverage Tools in Multi-Step Reasoning
The Era of 1-bit LLMs All Large Language Models are in 1.58 Bits
Large World Model World Model on Million-Length Video and Language with Blockwise RingAttention by UC Berkley
Megalodon Meta's Efficient LLM Pretraining and Inference with Unlimited Context Length
Leave No Context Behind Google's Efficient Infinite Context Transformers with Infini-attention
LongRoPE Extending LLM Context Window Beyond 2 Million Tokens
KAN Kolmogorov-Arnold Networks as promising alternatives to Multi-Layer Perceptrons (MLPs)
Sparse Llama Cerebras and Neural Magic produces a 70% Smaller, 3x Faster, Full Accuracy model, page
OSWorld Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
You Only Cache Once Decoder-Decoder Architectures for Language Models
SOLAR Scaling LLMs with Simple yet Effective Depth Up-Scaling to increase parameter count and continue pre-training, SOLAR-10.7B
TextGrad allows users to enhance AI system components by backpropagating textual feedback provided by LLMs to optimize variables in computation graphs, utilizing a framework similar to PyTorch for various tasks.
Vision language models are blind LLMs with Vision capabilities VLMs perform low on a new benchmark suite called BlindTest that is easy for humans but difficult for VLMs
SpreadsheetLLM Encoding Spreadsheets for Large Language Models, introduces SheetCompressor, an innovative encoding framework for compressing spreadsheets to enhance LLM performance, achieving a state-of-the-art 78.9% F1 score, outperforming existing models.
Internet of Agents creating a flexible and scalable platform for LLM-based multi-agent collaboration using an agent integration protocol, an instant-messaging-like architecture, and dynamic mechanisms for agent teaming and conversation flow control, code available
Mixture-of-Agents MoA proposes a MoA methodology to leverage the strengths of multiple LLMs, achieving state-of-the-art performance using a layered architecture where each agent utilizes outputs from previous layers
RAPTOR Recursive Abstractive Processing for Tree Organized Retrieval is a powerful indexing and retrieving technique clustering and summarizing text chunks in a hierarchical tree structure improving RAG quality significantly
Alice in Wonderland Simple Tasks Showing Complete Reasoning Breakdown in LLMs
EfficientQAT Efficient Quantization-Aware Training down to 2 bits with higher quality than previous methods
Late Chunking introduces "late chunking," which improves the retrieval of smaller portions of text in dense vector-based retrieval systems using long context embedding models, providing superior results across various retrieval tasks without the need for additional training and can be applied to any long-context embedding model

Other awesome resources

LLM Worksheet using an early CoT example by randomfoo2
The full story of LLMs
Brief history of llama models
A timeline of transformer models
Every front-end GUI client for ChatGPT API
LLMSurvey a collection of papers and resources including an LLM timeline
rentry.org/lmg_models a list of llama derrivates and models
Timeline of AI and language models and Model Comparison Sheet by Dr. Alan D. Thompson
Brex's Prompt Engineering Guide an evolving manual providing historical context, strategies, guidelines, and safety recommendations for building programmatic systems on OpenAI's GPT-4
LLMs Practical Guide actively curated collection of a timeline and guides for LLMs, providing a historical context and restrictions based on this paper and community contributions
LLMSurvey based on this paper, builds a collection of further papers and resources related to LLMs including a timeline
LLaMAindex can now use Document Summary Index for better QA performance compared to vectorDBs
ossinsight.io chat-gpt-apps Updated list of top chatGPT related repositories
GenAI_LLM_timeline Organized collection of papers, products, services and news of key events in Generative AI and LLMs with focus on ChatGPT
AIGC-progress an awesome list of all things ml models and projects with daily updates
Things I'm learning while training SuperHOT talks about LiMA, Multi-Instruct and how to extend llama to 8k context size github discussion, reddit discussion
LLM Utils An index of useful LLM related blog posts and tools
Awesome-Multimodal-Large-Language-Models Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
FourthBrain ML Edication backed by Andrew NG's AI fund, tutorials about LLM deployment, API Endpoint creation, MLOps, QLoRA fine tuning, etc.
companion-app AI Getting Started template for developers using Clerk, Next.js, Pinecone, Langchain.js, OpenAI or Vicuna13b, Twilio
ppromptor Prompt-Promptor is a Python library with a web UI designed to automatically generate and improve prompts for LLMs and consists of three agents: Proposer, Evaluator, and Analyzer. These agents work together with human experts to continuously improve the generated prompts
RAG Guide A Comprehensive Guide for Building RAG-based LLM Applications as a jupyter notebook, HN
RAG is more than just embedding search learnings for building a good RAG-based LLM Application, HN
llm-agent-paper-list The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al., paper
awesome-ai-agents open and closed source agents by categories and industries
Azure OpenAI resources Azure OpenAI, LLMs +🌌 Brief overview,🦙Summary notes,🔎References, and 🎋Cheatsheet
alignment-handbook Huggingface's robust recipes for to align language models with human and AI preferences
llama-recipes Llama 2 demo apps, recipes etc for RAG, Fine tuning, inference etc.
Something-of-THoughts in LLM Prompting Chain-of-Thoughts (CoT), Tree-of-Thoughts (ToT), Graph-of-Thoughts (GoT), and beyond, … What are these thoughts?
GPT-RAG learnings when implementing Azure OpenAI with RAG at scale in a secure manner
AI and Open Source in 2023 a Summary of what happened in 2023 with all the learnings
convert text into graph of concepts Tutorial on how to use Knowledge Based QnA (KBQA) using Knowledge Graphs which can improve RAG context quality in some domains
Generative AI for Beginners 12 Lessons, Get Started Building with Generative AI from Microsoft
LLM Visualization Explaining how transformers work visually using nano-gpt
Visual explanations of core machine learning concepts Visually learn how Neural networks, Regression, Reinforcement Learning, Random Forests and more concepts work
easily train a specialized llm PEFT, LoRA, QLoRA, LLaMA-Adapter, and More
promptbase an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models
rag-survey an updated view on RAG in the wild, their approaches, taxonomy, tech stack and evolution paper
Survey of Reasoning with Foundation Models, awesome reasoning list
llm-course Course to get into LLMs with roadmaps and notebooks covering Fundamentals, LLM-Scientist and LLM-Engineer roles
ML Papers of The Week dair.ai curated list of weekly ML Papers
The Illustrated Transformer Illustrated Guide to Transformers- Step by Step Explanation
ai-exploits A collection of real world AI/ML exploits for responsibly disclosed vulnerabilities
AI Trends features key numbers and data visualizations in AI, related Epoch reports and other sources that showcase the change and growth in AI over time
Awesome-LLM-Inference curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
fuck you, show me the prompt Quickly understand inscrutable LLM frameworks by intercepting API calls.
awesome-generative-ai-guide with up to date links to papers, guides and resources
systematicall improving your rag blog about improving RAG systematically
fabric provides programmatically accessible prompt templates using a crowdsourced prompt DB
Prompt Engineering Guide compilation of various techniques like Few shot, chain of thought, tree of thoughts, tool using, ReAct and more to improve LLM quality via complex prompting
graphrag analysis part 1 shows that Microsoft GraphRAG and graphs may not significantly impact context retrieval
Applied LLMs practical guide to building successful LLM products, covering the tactical, operational, and strategic
engine programmatically access prompt templates based on predefined strategies that include and tools
Transformer-Explainer is an interactive visualization that helps users understand the inner workings of Transformer models, like GPT-2, by providing a detailed breakdown of components such as embeddings, multi-head self-attention, and output probabilities
RAG_Techniques showcases various advanced techniques for RAG systems with source code and explanations
OpenThought - System 2 Research Links a comprehensive collection of resources for researchers and AI developers, compiled from various sources such as books, papers, blog posts, and community contributions, to provide a valuable resource for understanding and improving cognition and reasoning in AI systems
Answering Legal Questions with LLMs great blog post explaining the difficulties creating RAG based law Q&A

Product Showcases

Opinionate.io AI Debating AI
phind.com Developer Search Engine
Voice Q&A Assistant using ChatGPT API, Embeddings, Gradio, Eleven Labs and Whisper
chatpdf, Q&A for PDFs
ai collection collecting startups and SaaS solutions using AI at its core
screenshot-to-code this converts a website screenshot to approximated HTML/CSS code by using GPT-4-Vision
Outfit Anyone Ultra-high quality virtual try-on for Any Clothing and Any Person
llavavision simple "Be My Eyes" web app with a llama.cpp/llava backend explaining what the camera sees for blind assistance
pretzelai modern fork of Jupyter Notebooks with AI code generation and editing, inline tab completion, sidebar chat and error fixing

Benchmarking

Leaderboards

Open LLM Leaderboard by HuggingFace
LMSys Chatbot Arena Leaderboard, blogpost is an anonymous benchmark platform for LLMs that features randomized battles in a crowdsourced manner. Careful: This just measures human preference, not accuracy or other factors
paperswithcode LLM SOTA leaderboards, but usually just for foundation models
Can AI code a self-evaluating interview for AI coding models. code
C-Eval Benchmark Chinese focused LLM Eval Leaderboard
MTEB Leaderboard Massive Text Embedding Benchmark (MTEB) Leaderboard (Vector Embeddings)
hallucination-leaderboard Hughes Hallucination Evaluation Model (HHEM) evaluates how often an LLM introduces hallucinations when summarizing a document code
Big Code Models Leaderboard evaluates base coding models
EvalPlus Leaderboard evaluates AI Coders with rigorous tests
Enterprise Scenarios Leaderboard evaluates the performance of LLMs on real-world enterprise use cases, some of the test sets are closed source to prevent cheating (stale)
NP Hard Eval Leaderboard benchmark for assessing the reasoning abilities of LLMs by using NP Hard problems
Toqan Leaderboard Coding leaderboard with benchmarks for Coding Assistant, Q&A, Summarization, Entity extraction, Function calling and SQL
OpenCompass Leaderboard Leaderboards with specific eval rankings for Medical, General and Law Benchmarks
NIAN Needle in a Needlestack for GPT-4o, GPT-4o-mini, Claude vs others
SEAL Leaderboards Expert-Driven Private Evaluations
AIR-bench Automated Heterogeneous Information Retrieval Benchmark focused on RAG and Retrieval tasks, automatically testing with synthetic random generated tasks
Leaderboards and benchmarks collection of leaderboards and benchmarks for Text, vision, audio etc.
Berkeley Function-Calling Leaderboard Leaderboard of LLMs following function calling instructions
Vision-Arena Leaderboard for benchmarking Multimodal LLMs in the Wild for Vision and Text tasks.
Aider LLM Leaderboard for Code Editing following instructions, not just code generation
RepoQA Leaderboard evaluationg LLMs ability to find specific code in a long context code haystack
BigCodeBench-Hard Leaderboard evaluates LLMs with practical and challenging programming tasks, HF Pages
vellum leaderboard general, coding and long context benchmarks
EQBench a black box closed source and private Emotional Intelligence Benchmark for LLMs
oobabooga benchmark a black box, closed source and private 48 questions benchmark from oobabooga
LiveBench is a dynamic, contamination-free benchmark for Large Language Models (LLMs) that updates regularly to evaluate model performance across diverse tasks, ensuring relevance by refreshing its dataset every 6 months.
Dubesor LLM Benchmark Small-scale manual performance comparison benchmark with closed source questions
LiveCodeBench holistic and Contamination Free Evaluation of LLMs for Code automatically using new LeetCode, AtCoder and Codeforces questions
SWE-bench curated and annotated software development tests for LLMs sourced from 2k real github issues and pull requests, asking LLMs to solve issues in a codebase with an emphasis on understanding and coordinating changes across multiple functions, classes and files simultaneously requiring solutions with code execution environments, long contexts and multi step reasoning that goes beyond code generation

Benchmark Suites

Big-bench a collaborative benchmark featuring over 200 tasks for evaluating the capabilities of llms
Pythia interpretability analysis for autoregressive transformers during training
AlpacaEval automatic evaluation for instruction following LLMs, validated against 20k human annotations, reddit announcement
LMFlow Benchmark automatic evaluation framework for open source LLMs
lm-evaluation-harness framework for few-shot evaluation of autoregressive language models from EleutherAI
sql-eval evaluation of LLM generated SQL queries
ragas RAG assessment: an evaluation framework for Retrieval Augmented Generation pipelines
ToolQA an evaluation framework for RAG and Tool LLM pipelines
LangCheck Simple, Pythonic building blocks to evaluate LLM applications
PromethAI-Memory Open-source framework for building and testing RAGs and Cognitive Architectures, designed for accuracy, transparency, and control
PromptBench a Pytorch-based Python package for Evaluation of LLMs providing APIs
CanItEdit Evaluating the Ability of Large Language Models to Follow Code Editing Instructions, paper
deepeval evaluation framework specialized for unit testing LLM applications based on metrics such as hallucination, answer relevancy, RAGAS, etc.
mlflow llm-evaluate use-case specific standard metrics and custom metrics, optional ground truth
AgentBoard Evaluation Board of Multi-turn LLM Agents
LLM-Uncertainty-Bench Benchmarking LLMs via Uncertainty Quantification
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets
PHUDG3 Phi-3 as Scalable Judge. Evaluate your LLMs with an LLM
NIAN Needle in a Needlestack because LLMs have improved and Needle in a Haystack has become too easy
beyondllm all-in-one toolkit for observability, experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems
AIR-bench Automated Heterogeneous Information Retrieval Benchmark focused on RAG and Retrieval tasks, automatically testing with synthetic random generated tasks
LLMSuite view code, run inferences, and measure performance with evaluation tasks
BlindTest Vision Language Model (VLM) benchmark to assess visual understanding capabilities
RepoQA Evaluating Long-Context Code Understanding
BICS Bug In the Code Stack benchmark measuring LLMs capability to detect bugs in large codebases similar to needle in the haystack benchmarks using randomly assembled python source code as background noise and syntactic bug as the needle in very large long context windows
BABILong long-context needle-in-a-haystack benchmark for LLMs for text based tasks
ARES Automated Evaluation Framework for RAG Systems combining synthetic data generation with fine-tuned classifiers to efficiently assess context relevance, answer faithfulness, and answer relevance, minimizing the need for extensive human annotations
RULER is a benchmark to evaluate the effective context size of long-context language models by generating synthetic examples and measuring performance across different tasks, revealing real capabilities versus claimed specs.
paramount is a tool for AI developers and experts that records LLM agent inputs and outputs for quality assurance, ground truth capturing, and automated regression testing, operating offline in a private environment, to allow continuous monitoring and improvement
LiveCodeBench a contamination-free benchmark for coding capabilities automatically getting new LeetCode, AtCoder and CodeForces questions, with tasks such as code generation, code execution, and test output prediction

AI DevOps

Vicuna FastChat
SynapseML (previously known as MMLSpark),an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines
Colossal-AI unified deep learning system that provides a collection of parallel components for distributed deep learning models. Provides data parallelism, pipeline parallelism, and tensor parallelism
OpenLLM Run, deploy, and monitor open-source LLMs on any platform
skypilot Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution
ONNX Runtime cross-platform inference and training machine-learning accelerator compatible with PyTorch, TensorFlow/Keras, scikit-learn, LightGBM, XGBoost, etc. and runs with different hardware, drivers, and operating systems
vllm high-throughput and memory-efficient inference and serving engine for LLMs, paper
openllmetry observability for your LLM application, based on OpenTelemetry
DeepSpeed-FastGen High-throughput Text Generation for LLMs at 2x vLLM speeds
DeepSparse Sparsity-aware deep learning inference runtime for CPUs
dvc ML Experiments Management with Git
S-LoRA Serving Thousands of Concurrent LoRA Adapters
PowerInfer Fast LLM Serving with a Consumer-grade GPU leveraging activation locality, PR on llama.cpp, issue on ollama
TaskingAI open source platform for AI-native application development
inferflow LLM inference serving engine with support for Multi-GPU, Quantization supporting gguf, llama2, safetensors and many model families
[LMDeploy](https://github.com/InternLM/lmdeploy multi-model, multi-machine, multi-card inference service for many models
powerinfer High-speed Model Inference Serving on Consumer GPU/CPU using activation locality for hot/cold neurons
lorax Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Geniusrise AI microservices framework & ecosystem. Host inference APIs, schedule bulk inference and fine tune text, vision, audio and multi-modal models.
node-llmatic self-hosted LLMs with an OpenAI compatible API
Nitro - Embeddable AI An inference server on top of llama.cpp. OpenAI-compatible API, queue, & scaling. Embed a prod-ready, local inference engine in your apps. Powers Jan
gateway Robust cloud-native AI Gateway and LLMOps infrastructure stack with routing, load balancing, fallback, analytics, caching, PII filter
pytorch-lightning Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes
text-generation-inference Huggingface's own Rust, Python and gRPC server for text gen inference providing an API endpoint, supporting Flash/Paged Attention, bitsandbytes, GPTQ, EETQ, AWQ, Logits, Logprobs, Speculation, Guidance
mistral.rs a fast LLM inference platform supporting inference on a variety of devices, quantization, multi modal models, with an Open-AI API compatible HTTP server and Python bindings and AnyMoE for memory efficient MoE model from anything
Text Generation Inference deploy and serve popular LLMs with high-performance text generation, featuring optimizations like Tensor Parallelism, continuous batching, and quantization for efficient inference for GPUs and CPUs
gateway local proxy and API multi model server with fallbacks, retries, load balancing
litellm Use OpenAI API call format for any LLM backend (Local, Huggingface, Cohere, TogetherAI, Azure, Ollama, Replicate, Sagemaker, Anthropic, etc) as a load balancer
LLaMbA Large Language Model Batching Application built on asp.net core and llamasharp for minimalistic corss platform batching is a serving engine not for end users consumption but for applicartions needing fast text generation for classification, synthetic data generation etc.
paddler production ready stateful load balancer and reverse proxy to serve llama.cpp supporting balancing strategies like slots, providing monitoring agents for multiple llama.cpp instances, dynamic addition and removal of instances, autoscaling, buffers, dashboard
harbor Docker based containerized LLM Toolkit to run backends, apis and frontends concisely via CLI with configuration management and deployment
tabbyAPI OpenAI API compatible LLM server using exllamav2 API that's both lightweight and fast
aphrodite-engine bartch inference engine providing an OpenAI compatible API with Paged Attention, continuous batching, distributed inference, various sampling methods, K/V management and support for AQLM, AWQ, BnB, EXL2, GGUF, GPTQ, QuIP, Smoothquant+ and SqueezeLLM quantization support
infinity high throughput low latency vector embeddings engine porivind an OpenAI compatible API supporting wide range of text-embedding models, reranking models, clip models
text-embedding-inference native and docker available TEI huggingface supporting a wide range of embedding models like LLM based gte, bert, roBERTa, NomicBert and JinaBERT type models and rerankers like XLM-RoBERTa such as bge-reranker-large with support for GPU and CPU inference
Xorbits Inference Xinference model server supports LLM, text embedding, Speech recognition, multimodal and text to image inference with GPU, CPU and apple silicon MLX hardware support, transformers continuous batching, LoRA, vLLM integration, OpenAI Compatible API, multi node deployment and function calling via native pip deployment, docker and K8s
SGLang fast serving framework for LLMs and vision LMs using fast radixAttention for caching, continuous batching, paged attention, tensor parallelism and quantization like AWQ, FP8, GPTQ on GPU only inference via native pip deployment or docker
RouteLLM serving and evaluating LLM routers to find optimal cost vs. quality depending on the query
langfuse LLM Observability, monitoring, evaluation, analytics, prompt management, playground
LitServe easy, flexible and enterprise scale serving engine to deploy any ML, embedding, language, vision or audio model with support for batching, streaming and GPU autoscaling
LitGPT easy, flexible and enterprise scale finetune, pretrain, deploy and serving of LLMs
Nexa-SDK toolkit for local ONNX and GGML model deployment for Text Generation, Image Generation, VLMs, TTS and STT and an OpenAI compatible API server with JSON schema mode, function calling and streaming support and a Streamlit UI and its own Model Hub / Zoo
lmnr is an open-source platform for engineering LLM products, providing functionalities such as tracing, evaluating, annotating, and analyzing LLM data, built with modern tech stack including Rust, RabbitMQ, Postgres, and Clickhouse, offering insights similar to DataDog + PostHog for LLM apps
exo petals inspired decentralized LLM inference using multiple commodity devices like laptops and phones to split up a larger model to do inference on smaller devices and communicate using P2P with autodiscovery

Optimization

Petals
FlexGen High-throughput Generative Inference of LLMs with a Single GPU
XLA Accelerated Linear Algebra is a ML compiler for GPU, CPU and accelerators
zipslicer
AITemplate a Python framework which renders neural network into high performance CUDA/HIP C++ code
Flash-attention Fast and memory-efficient exact attention
tokenmonster ungreedy tokenizer increases inference speed and context-length by 35% for pre-training on new LLMs
LOMO fuses the gradient computation and the parameter update in one step to reduce memory usage enables the full parameter fine-tuning of a 7B model on a single RTX 3090
GPTFast a set of techniques developed by the PyTorch Team to accelerate the inference speed of huggingface transformer models
KTransformers KTransformers (QuickTransformers) is a framework for cutting-edge LLM Inference Optimizations
Optimum Huggingface's accelerated traning and inference library for Transformers and Diffusers supporting onnx, intel NPU, openVINO, TensorRT, AMD NPU and cloud Hardware and features graph optimization, post training quantization, quantized training with QAT, pruning and knowledge distillation

Databases for ML

Pinecone proprietary vector search for semantic search, recommendations and information retrieval
FAISS Library for Efficient Similarity Search and Clustering using vectors
Weaviate open source vector DB for services like OpenAI, HF etc for text, image, Q&A etc.
vespa.ai one of the only scalable vector DBs that supports multiple vectors per schema field
LanceDB free open-source serverless vector DB with support for langchain, llamaindex and multi-modal data
Deeplake Vector Database for audio, text, vectors, video
milvus open-source cloud-native vector DB focusing on embedding vectors converted from unstructured data
chroma open-source embedding database
pgvector open-source vector similarity search for Postgres.
chromem-go embeddable vector database for Go with Chroma-like interface and zero third-party dependencies. In-memory with optional persistence.
txtai All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
mindsdb database for datascience and AI centered workloads like local LLM / OpenAI models access, text embeddings, forecasting etc.
haystackdb on disk vector db which is 10x faster than FAISS in memory
vector-admin universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more vector databases with ease
vectordb a simple, lightweight, local end to end DB for embeddings-based text retrieval
sqlite-vec SQLite extension written in C without any dependencies for vector search support in SQLite

Safety, Responsibility and Red Teaming

PyRIT Python Risk Identification Tool for generative AI to automatically red team foundation models and apps
PurpleLlama Cyber Security Eval, Llama Guard and Code Shield to assess and improve LLM security
Promptfoo for testing, evaluating, and red-teaming LLM applications, allowing users to systematically compare LLM outputs, identify vulnerabilities, and improve prompt quality using declarative test cases and a command-line interface for integration into CI/CD workflows

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llm-tools.md

llm-tools.md

Open LLM Models

Tools

Native GUIs

Web GUIs

Backends

Voice Assistants

Retrieval Augmented Generation (RAG)

Browser Extensions

Agents / Automatic GPT

Multi Modal

Code generation

Libraries and Wrappers

Prompt templating / Grammar

Fine Tuning & Training

Merging & Quantization

Resources

Data sets

Research

Other awesome resources

Product Showcases

Benchmarking

Leaderboards

Benchmark Suites

AI DevOps

Optimization

Databases for ML

Safety, Responsibility and Red Teaming

Files

llm-tools.md

Latest commit

History

llm-tools.md

File metadata and controls

Open LLM Models

Tools

Native GUIs

Web GUIs

Backends

Voice Assistants

Retrieval Augmented Generation (RAG)

Browser Extensions

Agents / Automatic GPT

Multi Modal

Code generation

Libraries and Wrappers

Prompt templating / Grammar

Fine Tuning & Training

Merging & Quantization

Resources

Data sets

Research

Other awesome resources

Product Showcases

Benchmarking

Leaderboards

Benchmark Suites

AI DevOps

Optimization

Databases for ML

Safety, Responsibility and Red Teaming