✨ Access 40k+ open-source and proprietary AI models through a standard API. Achieve GPU-backed performance at CPU pricing ✨
- Introduction
- Quickstart
- API Documentation
- Libraries
- Docker
- Capabilities
- Proprietary Models
- Pricing
- API Playground
- Status
- Resources
- Feedback
Bytez Model API streamlines integration with 40k+ open-source and proprietary AI models across 33 ML tasks. By standardizing inputs for text, images, audio, and more, it eliminates the complexity of inconsistent formats, enabling developers to effortlessly interact with models for tasks like chat, text generation, image generation, video generation, and beyond.
Get your API Key by signing up on Bytez, then navigating to Settings in your account.
Validate by running an inference:
curl --location 'https://api.bytez.com/models/v2/openai-community/gpt2' \
--header 'Authorization: Key BYTEZ_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"text": "Dreams are messages from the "
}'
- Log into Bytez.
- Navigate to the Settings page.
- Locate your API key under the API Keys section and copy it.

Use this key in the Authorization header for all API requests:
Authorization: Key your-key-here
You can use a curl command to verify your setup:
curl --location 'https://api.bytez.com/models/v2/NousResearch/Hermes-3-Llama-3.1-8B' \
--header 'Authorization: Key BYTEZ_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "system",
"content": "You'\''re a helpful assistant"
},
{
"role": "user",
"content": "Dreams are messages from the "
}
]
}'
Something not right, or do you need another API key? DM our team on Discord and we'll resolve it.
You can interact with proprietary chat models by OpenAI, Anthropic, Cohere, Google, and Mistral.
To use these models, you'll need two keys:
- Your Bytez API Key: obtained as described above.
- Provider Key: the key specific to the provider you want to access (e.g., OpenAI API key).
Example Headers
Authorization: Key your-bytez-api-key
Provider-key: your-provider-key
- No Additional Charges: Bytez does not charge for accessing proprietary models; however, the respective provider's billing applies.
- Seamless Integration: You can interact with closed-source models using the same standardized input structure as open-source models.
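For illustration, here is a minimal Python sketch of the same two-header scheme using the requests library. It mirrors the proprietary-model curl examples shown later in this document; the model path and payload are taken from those examples:

import requests

# Minimal sketch: calling a proprietary model through Bytez.
# Both headers from "Example Headers" above are required.
response = requests.post(
    "https://api.bytez.com/models/v2/openai/gpt-4o-mini",
    headers={
        "Authorization": "Key your-bytez-api-key",
        "Provider-Key": "your-provider-key",
        "Content-Type": "application/json",
    },
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
        "params": {"max_tokens": 100},
    },
)
print(response.json())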
from bytez import Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
# List all models
models = client.list_models()
print(models)
# List models by task
models_by_task = client.list_models("object-detection")
print(models_by_task)
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
// List all models
const models = await client.list.models.all();
console.log(models);
// List models by task
const modelsByTask = await client.list.models.byTask("object-detection");
console.log(modelsByTask);
using Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model_list = client.list_models()
println(model_list)
curl --location 'https://api.bytez.com/models/v2/list/models' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE'
curl --location 'https://api.bytez.com/models/v2/list/models?task=chat' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE'
We have an API playground to demo over 40k models across 33 tasks. Or, feel free to play with models on the Bytez platform.
Using Python 3.9+, JavaScript, or Julia, install the appropriate package:
pip install bytez
npm install bytez.js
# Run the command julia
# Press ] to enter the package manager
# Run the command below
add Bytez
from bytez import Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()
input_text = "Dreams are messages from the "
model_params = {"max_new_tokens": 20, "min_new_tokens": 5, "temperature": 0.5}
result = model.run(input_text, model_params=model_params)
output = result["output"]
generated_text = output[0]["generated_text"]
print(generated_text)
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("openai-community/gpt2");
await model.load();
const output = await model.run("Dreams are messages from the ", {
max_new_tokens: 20,
min_new_tokens: 5
});
console.log(output);
using Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()
input_text = "Dreams are messages from the "
options = Dict(
"params" => Dict(
"max_new_tokens" => 20,
"min_new_tokens" => 5,
"temperature" => 0.5,
)
)
result = model.run(input_text, options)
output = result["output"]
generated_text = output[1]["generated_text"]
println(generated_text)
All Bytez model images are available on Docker Hub, and models can be played with via our Models page 🤙
The source code that runs for a given model in its Docker image can be found here.
Generate text with chat models using structured inputs.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const messages = [
{
role: "system",
content: "You are a friendly chatbot",
},
{
role: "user",
content: "What is the capital of England?",
},
];
const model = client.model("microsoft/Phi-3-mini-4k-instruct");
await model.load();
const { output } = await model.run(messages, { max_length: 100 });
const [{ generated_text }] = output;
for (const message of generated_text) {
console.log(message);
const { content, role } = message;
console.log({ content, role });
}
Full documentation here
Use chat models with images as input to generate text-based responses.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("meta-llama/Llama-3.2-11B-Vision-Instruct");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "What is this image?" },
{ type: "image", url: "https://hips.hearstapps.com/hmg-prod/images/how-to-keep-ducks-call-ducks-1615457181.jpg?crop=0.670xw:1.00xh;0.157xw,0&resize=980:*" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Use chat models with video input to generate insightful responses.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("llava-hf/LLaVA-NeXT-Video-7B-hf");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "Why is this video funny?" },
{ type: "video", url: "https://example.com/path-to-video.mp4" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Process and analyze audio inputs with chat models.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("Qwen/Qwen2-Audio-7B-Instruct");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "What sound is this?" },
{ type: "audio", url: "https://example.com/path-to-audio.mp3" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Generate images using the Bytez API with base64 or URL inputs.
import Bytez from "bytez.js";
import { dirname } from "path";
import { fileURLToPath } from "url";
import { writeFileSync } from "node:fs";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("dreamlike-art/dreamlike-photoreal-2.0");
await model.load();
const { output_png } = await model.run(
"A beautiful landscape with mountains and a river"
);
const buffer = Buffer.from(output_png, "base64");
// Write the image to the local file system
writeFileSync(`${__dirname}/output.png`, buffer);
Generate text and vector embeddings.
import Bytez from "bytez.js";
const client = new Bytez("API_KEY");
// 1) Select the model
const model = client.model("nomic-ai/nomic-embed-text-v1.5");
// 2) Load the model
await model.load();
// 3) Run the model
const output = await model.run("Once upon a time");
console.log(output);
Full documentation here
Execute code or actions based on model-generated outputs.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const inputText = "What's the weather like in Seattle right now?";
const modelParams = {
max_new_tokens: 2000,
min_new_tokens: 50,
temperature: 0.001,
do_sample: false
};
const promptTemplate = `
Function:
def get_weather_data(coordinates):
"""
Fetches weather data from the Open-Meteo API for the given latitude and longitude.
Args:
coordinates (tuple): The latitude and longitude of the location.
Returns:
float: The current temperature in the coordinates you've asked for
"""
Function:
def get_coordinates_from_city(city_name):
"""
Fetches the latitude and longitude of a given city name using the Maps.co Geocoding API.
Args:
city_name (str): The name of the city.
Returns:
tuple: The latitude and longitude of the city.
"""
User Query: {query}<human_end>
`;
const model = client.model("Nexusflow/NexusRaven-V2-13B");
await model.load();
const prompt = promptTemplate.replace("{query}", inputText);
const stream = await model.run(prompt, { stream: true, params: modelParams });
const textStream = stream.pipeThrough(new TextDecoderStream());
for await (const chunk of textStream) {
console.log(chunk);
}
Full documentation here
Streaming allows you to receive model outputs incrementally as soon as they are available, which is ideal for tasks like real-time responses or large outputs.
To enable streaming, pass true as the third argument to the model.run() function. The model will return a stream that you can read incrementally.
const stream = await model.run(textInput, params, true);
import { Readable } from "node:stream";
const stream = await model.run(textInput, params, true);
try {
const readableStream = Readable.fromWeb(stream); // Convert Web Stream to Node.js Readable Stream
for await (const chunk of readableStream) {
console.log(chunk.toString()); // Handle each chunk of data
}
} catch (error) {
console.error(error); // Handle errors
}
const stream = await model.run(textInput, params, true);
try {
const reader = stream.getReader(); // Get a reader for the Web Stream
while (true) {
const { done, value } = await reader.read(); // Read the stream chunk-by-chunk
if (done) break; // Exit when the stream ends
console.log(new TextDecoder().decode(value)); // Convert Uint8Array to string
}
} catch (error) {
console.error(error); // Handle errors
}
Our API provides access to a wide range of pretrained models across 33 machine learning tasks, each tailored to specific applications like summarization, document question-answering, audio classification, and more.
Explore the full list of tasks here.
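To explore a single task programmatically, you can reuse the list_models call from the Libraries section. A short sketch (the "summarization" task slug is an assumption, matching the task names above):

from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

# List models available for one task.
# The "summarization" slug is assumed from the task names above.
summarization_models = client.list_models("summarization")
print(summarization_models)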
Our v2 endpoint supports interacting with proprietary models from Anthropic, Google, Cohere, OpenAI, and Mistral.
curl --location 'https://api.bytez.com/models/v2/openai/gpt-4o-mini' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Hello my name is Bob and I like to eat"}],
"stream": false,
"params": { "max_tokens": 100 }
}'
curl --location 'https://api.bytez.com/models/v2/google/gemini-1.5-flash' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Hello my name is Bob and I like to eat"}],
"stream": false,
"params": { "temperature": 1 }
}'
curl --location 'https://api.bytez.com/models/v2/cohere/command-r' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
curl --location 'https://api.bytez.com/models/v2/mistral/mistral-small-latest' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
curl --location 'https://api.bytez.com/models/v2/anthropic/claude-3-haiku-20240307' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
Inference pricing for models is designed to be straightforward and predictable. Instead of relying on complex token-based pricing (which doesn't make sense for non-text-generation models), we calculate costs based on Inference Meter Price and Time to First Inference.
Pricing = Meter Price × Inference Time
- Models run on instances optimized for RAM usage.
- Instances are categorized by size (e.g., Micro, Small, Super).
- LLMs (Large Language Models) have their own specific pricing meters.
Each API response includes:
- Inference Meter
- Inference Meter Price
- Inference Time
- Inference Cost
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.0000872083 |
XS | 24 | 0.0001475035 |
SM | 64 | 0.0006478333 |
MD | 96 | 0.0008433876 |
LG | 128 | 0.0012956667 |
XL | 192 | 0.0024468774 |
XXL | 320 | 0.0047912685 |
Super | 640 | 0.0059890856 |
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.00053440 |
XS | 24 | 0.00066800 |
SM | 64 | 0.00427520 |
MD | 96 | 0.00480960 |
LG | 128 | 0.00855040 |
XL | 192 | 0.01603200 |
XXL | 320 | 0.02458240 |
Super | 640 | 0.02992640 |
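As a worked example of the formula above, here is a minimal Python sketch using the Micro row of the first table (the 12-second inference time is an illustrative value, not a measurement):

# Pricing = Meter Price × Inference Time
meter_price = 0.0000872083   # $/sec, Micro instance (first table above)
inference_time = 12.0        # seconds -- hypothetical illustrative value
inference_cost = meter_price * inference_time
print(f"${inference_cost:.6f}")  # ≈ $0.001046

The same four values reported in each API response (Inference Meter, Inference Meter Price, Inference Time, Inference Cost) let you verify this calculation per request.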
Explore our API endpoints in the documentation here.
Check out the status of our API.
Get to know our story, our mission, and our roadmap here.
We’re committed to building the best developer experience for AI builders. Have feedback? Let us know on Discord or open an issue on GitHub.