A fast inference service for serving and querying Large Language Models (LLMs)
kaleidoscope provides a few high-level APIs, namely (see the usage sketch after this list):
- model_instances - Shows a list of all active LLMs instantiated by the model service
- load_model - Loads an LLM via the model service
- generate - Returns an LLM text generation based on prompt input
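For example, a typical session with the Python SDK exercises all three APIs. The sketch below is illustrative only: the client constructor arguments, gateway host/port, model name, and generation parameters are assumptions, so consult the SDK repository for the exact interface.

```python
# Minimal sketch of the three high-level APIs via the Python SDK (kscope).
# The gateway host/port, model name, and generation parameters below are
# illustrative assumptions; see the kaleidoscope-sdk repository for details.
import kscope

# Connect to a running gateway service
client = kscope.Client(gateway_host="localhost", gateway_port=3001)

# model_instances: list all active LLMs instantiated by the model service
print(client.model_instances)

# load_model: load an LLM via the model service
model = client.load_model("OPT-175B")

# generate: produce a text generation from a prompt
generation = model.generate("What is AI?", {"max_tokens": 32, "temperature": 0.7})
print(generation)
```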
kaleidoscope is composed of the following components; a sketch of how they interact follows the list:
- Python SDK - A frontend Python library for interacting with LLMs, available in a separate repository at https://github.com/VectorInstitute/kaleidoscope-sdk
- Model Service - A backend utility that loads models into GPU memory and exposes an interface to receive requests
- Gateway Service - A controller service that interfaces between the frontend user tools and model service
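To make the division of labour concrete, the sketch below shows how a frontend tool might talk to the gateway over HTTP, with the gateway forwarding work to the model service. The endpoint paths and payload shape here are hypothetical, invented purely for illustration; in practice the Python SDK wraps whatever interface the gateway actually exposes.

```python
# Hypothetical illustration of the request flow through the gateway.
# These endpoint paths and this JSON schema are NOT the documented API;
# they only illustrate the gateway's role as a controller sitting between
# frontend tools and the model service.
import requests

GATEWAY = "http://localhost:3001"  # placeholder host/port

# A frontend tool asks the gateway which model instances are active
instances = requests.get(f"{GATEWAY}/models/instances", timeout=10).json()
print(instances)

# The gateway relays a generation request to the model service
response = requests.post(
    f"{GATEWAY}/models/OPT-175B/generate",
    json={"prompt": "What is AI?", "max_tokens": 32},
    timeout=60,
)
print(response.json())
```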
Instructions for setting up the gateway service:
```bash
git clone https://github.com/VectorInstitute/kaleidoscope.git
cp kaleidoscope/web/.env-example kaleidoscope/web/.env
sudo docker compose -f kaleidoscope/web/docker-compose.yaml up
```
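Once the containers are up, you can sanity-check that the gateway is accepting connections; the sketch below uses only the Python standard library. The port is an assumption, so adjust it to match your .env settings.

```python
# Quick TCP reachability check for the gateway after `docker compose up`.
# The port is an assumption; match it to your kaleidoscope/web/.env settings.
import socket

def gateway_is_listening(host: str = "localhost", port: int = 3001) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    ok = gateway_is_listening()
    print("gateway reachable" if ok else "gateway not reachable")
```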
The Kaleidoscope SDK is a Python module that provides a programmatic interface for interacting with the services described here. You can download and install the SDK from its own repository: https://github.com/VectorInstitute/kaleidoscope-sdk
Contributions to kaleidoscope are welcome. See Contributing for guidelines.
If you use Kaleidoscope in a project or research paper, please cite it as follows:
Sivaloganathan, J., Coatsworth, M., Willes, J., Choi, M., & Shen, G. (2022). Kaleidoscope [Computer software]. Vector Institute for Artificial Intelligence. http://VectorInstitute.github.io/kaleidoscope. Retrieved from https://github.com/VectorInstitute/kaleidoscope.git