✨ Access 40k+ open-source and proprietary AI models through a standard API. Achieve GPU-backed performance at CPU pricing ✨
- Introduction
- Quickstart
- API Documentation
- Libraries
- Docker
- Capabilities
- Proprietary Models
- Pricing
- API Playground
- Status
- Resources
- Feedback
Bytez Model API streamlines integration with 40k+ open-source and proprietary AI models across 33 ML tasks. By standardizing inputs for text, images, audio, and more, it eliminates the complexity of inconsistent formats, enabling developers to effortlessly interact with models for tasks like chat, text generation, image generation, video generation, and beyond.
Get your API Key by signing up on Bytez, then navigating to Settings in your account.
Validate by running an inference:
curl --location 'https://api.bytez.com/models/v2/openai-community/gpt2' \
--header 'Authorization: Key BYTEZ_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"text": "Dreams are messages from the "
}'
- Log into Bytez.
- Navigate to the Settings page.
- Locate your API key under the API Keys section and copy it.

Use this key in the Authorization header for all API requests:
Authorization: Key your-key-here
You can use a curl command to verify your setup:
curl --location 'https://api.bytez.com/models/v2/NousResearch/Hermes-3-Llama-3.1-8B' \
--header 'Authorization: Key BYTEZ_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "system",
"content": "You'\''re a helpful assistant"
},
{
"role": "user",
"content": "Dreams are messages from the "
}
]
}'
Something not right, or do you need another API key? DM our team on Discord and we'll resolve it.
You can interact with proprietary chat models by OpenAI, Anthropic, Cohere, Google, and Mistral.
To use these models, you'll need two keys:
- Your Bytez API Key: obtained as described above.
- Provider Key: the key specific to the provider you want to access (e.g., OpenAI API key).
Example Headers
Authorization: Key your-bytez-api-key
Provider-key: your-provider-key
- No Additional Charges: Bytez does not charge for accessing proprietary models; however, the respective provider's billing applies.
- Seamless Integration: You can interact with closed-source models using the same standardized input structure as open-source models.
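For illustration, here is a minimal Python sketch of the same two-header scheme using the requests library. It mirrors the proprietary-model curl examples shown later in this document; the model path and payload are taken from those examples:

import requests

# Minimal sketch: calling a proprietary model through Bytez.
# Both headers from "Example Headers" above are required.
response = requests.post(
    "https://api.bytez.com/models/v2/openai/gpt-4o-mini",
    headers={
        "Authorization": "Key your-bytez-api-key",
        "Provider-Key": "your-provider-key",
        "Content-Type": "application/json",
    },
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
        "params": {"max_tokens": 100},
    },
)
print(response.json())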
from bytez import Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
# List all models
models = client.list_models()
print(models)
# List models by task
models_by_task = client.list_models("object-detection")
print(models_by_task)
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
// List all models
const models = await client.list.models.all();
console.log(models);
// List models by task
const modelsByTask = await client.list.models.byTask("object-detection");
console.log(modelsByTask);
using Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model_list = client.list_models()
println(model_list)
curl --location 'https://api.bytez.com/models/v2/list/models' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE'
curl --location 'https://api.bytez.com/models/v2/list/models?task=chat' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE'
We have an API playground to demo over 40k models across 33 tasks. Or, feel free to play with models on the Bytez platform.
Using Python 3.9+, JavaScript, or Julia, install the appropriate package:
pip install bytez
npm install bytez.js
# Run the command julia
# Press ] to enter the package manager
# Run the command below
add Bytez
from bytez import Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()
input_text = "Dreams are messages from the "
model_params = {"max_new_tokens": 20, "min_new_tokens": 5, "temperature": 0.5}
result = model.run(input_text, model_params=model_params)
output = result["output"]
generated_text = output[0]["generated_text"]
print(generated_text)
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("openai-community/gpt2");
await model.load();
const output = await model.run("Dreams are messages from the ", {
max_new_tokens: 20,
min_new_tokens: 5
});
console.log(output);
using Bytez
client = Bytez("YOUR_BYTEZ_KEY_HERE")
model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()
input_text = "Dreams are messages from the "
options = Dict(
"params" => Dict(
"max_new_tokens" => 20,
"min_new_tokens" => 5,
"temperature" => 0.5,
)
)
result = model.run(input_text, options)
output = result["output"]
generated_text = output[1]["generated_text"]
println(generated_text)
All Bytez model images are available on Docker Hub, and models can be played with via our Models page 🤙
The source code that runs for a given model in its Docker image can be found here.
Generate text with chat models using structured inputs.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const messages = [
{
role: "system",
content: "You are a friendly chatbot",
},
{
role: "user",
content: "What is the capital of England?",
},
];
const model = client.model("microsoft/Phi-3-mini-4k-instruct");
await model.load();
const { output } = await model.run(messages, { max_length: 100 });
const [{ generated_text }] = output;
for (const message of generated_text) {
console.log(message);
const { content, role } = message;
console.log({ content, role });
}
Full documentation here
Use chat models with images as input to generate text-based responses.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("meta-llama/Llama-3.2-11B-Vision-Instruct");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "What is this image?" },
{ type: "image", url: "https://hips.hearstapps.com/hmg-prod/images/how-to-keep-ducks-call-ducks-1615457181.jpg?crop=0.670xw:1.00xh;0.157xw,0&resize=980:*" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Use chat models with video input to generate insightful responses.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("llava-hf/LLaVA-NeXT-Video-7B-hf");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "Why is this video funny?" },
{ type: "video", url: "https://example.com/path-to-video.mp4" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Process and analyze audio inputs with chat models.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("Qwen/Qwen2-Audio-7B-Instruct");
await model.load();
const textInput = [
{
role: "system",
content: [{ type: "text", text: "You are a helpful assistant." }]
},
{
role: "user",
content: [
{ type: "text", text: "What sound is this?" },
{ type: "audio", url: "https://example.com/path-to-audio.mp3" }
]
}
];
const { output } = await model.run(textInput);
console.log(output);
Full documentation here
Generate images using the Bytez API with base64 or URL inputs.
import Bytez from "bytez.js";
import { dirname } from "path";
import { fileURLToPath } from "url";
import { writeFileSync } from "node:fs";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const model = client.model("dreamlike-art/dreamlike-photoreal-2.0");
await model.load();
const { output_png } = await model.run(
"A beautiful landscape with mountains and a river"
);
const buffer = Buffer.from(output_png, "base64");
// Write the image to the local file system
writeFileSync(`${__dirname}/output.png`, buffer);
Generate text and vector embeddings.
import Bytez from "bytez.js";
const client = new Bytez("API_KEY");
// 1) Select the model
const model = client.model("nomic-ai/nomic-embed-text-v1.5");
// 2) Load the model
await model.load();
// 3) Run the model
const output = await model.run("Once upon a time");
console.log(output);
Full documentation here
Execute code or actions based on model-generated outputs.
import Bytez from "bytez.js";
const client = new Bytez("YOUR_BYTEZ_KEY_HERE");
const inputText = "What's the weather like in Seattle right now?";
const modelParams = {
max_new_tokens: 2000,
min_new_tokens: 50,
temperature: 0.001,
do_sample: false
};
const promptTemplate = `
Function:
def get_weather_data(coordinates):
"""
Fetches weather data from the Open-Meteo API for the given latitude and longitude.
Args:
coordinates (tuple): The latitude and longitude of the location.
Returns:
float: The current temperature in the coordinates you've asked for
"""
Function:
def get_coordinates_from_city(city_name):
"""
Fetches the latitude and longitude of a given city name using the Maps.co Geocoding API.
Args:
city_name (str): The name of the city.
Returns:
tuple: The latitude and longitude of the city.
"""
User Query: {query}<human_end>
`;
const model = client.model("Nexusflow/NexusRaven-V2-13B");
await model.load();
const prompt = promptTemplate.replace("{query}", inputText);
const stream = await model.run(prompt, { stream: true, params: modelParams });
const textStream = stream.pipeThrough(new TextDecoderStream());
for await (const chunk of textStream) {
console.log(chunk);
}
Full documentation here
Streaming allows you to receive model outputs incrementally as soon as they are available, which is ideal for tasks like real-time responses or large outputs.
To enable streaming, pass true as the third argument to the model.run() function. The model will return a stream that you can read incrementally.
const stream = await model.run(textInput, params, true);
import { Readable } from "node:stream";
const stream = await model.run(textInput, params, true);
try {
const readableStream = Readable.fromWeb(stream); // Convert Web Stream to Node.js Readable Stream
for await (const chunk of readableStream) {
console.log(chunk.toString()); // Handle each chunk of data
}
} catch (error) {
console.error(error); // Handle errors
}
const stream = await model.run(textInput, params, true);
try {
const reader = stream.getReader(); // Get a reader for the Web Stream
while (true) {
const { done, value } = await reader.read(); // Read the stream chunk-by-chunk
if (done) break; // Exit when the stream ends
console.log(new TextDecoder().decode(value)); // Convert Uint8Array to string
}
} catch (error) {
console.error(error); // Handle errors
}
Our API provides access to a wide range of pretrained models across 33 machine learning tasks, each tailored to specific applications like summarization, document question-answering, audio classification, and more.
Explore the full list of tasks here.
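To explore a single task programmatically, you can reuse the list_models call from the Libraries section. A short sketch (the "summarization" task slug is an assumption, matching the task names above):

from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

# List models available for one task.
# The "summarization" slug is assumed from the task names above.
summarization_models = client.list_models("summarization")
print(summarization_models)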
Our v2 endpoint supports interacting with proprietary models from Anthropic, Google, Cohere, OpenAI, and Mistral.
curl --location 'https://api.bytez.com/models/v2/openai/gpt-4o-mini' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Hello my name is Bob and I like to eat"}],
"stream": false,
"params": { "max_tokens": 100 }
}'
curl --location 'https://api.bytez.com/models/v2/google/gemini-1.5-flash' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Hello my name is Bob and I like to eat"}],
"stream": false,
"params": { "temperature": 1 }
}'
curl --location 'https://api.bytez.com/models/v2/cohere/command-r' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
curl --location 'https://api.bytez.com/models/v2/mistral/mistral-small-latest' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
curl --location 'https://api.bytez.com/models/v2/anthropic/claude-3-haiku-20240307' \
--header 'Authorization: Key YOUR_BYTEZ_KEY_HERE' \
--header 'Provider-Key: PROVIDER_KEY' \
--header 'Content-Type: application/json' \
--data '{
"messages": [{"role": "user", "content": "Cats and rabbits who reside in fancy little houses"}],
"stream": false,
"params": { "max_tokens": 50 }
}'
Inference pricing for models is designed to be straightforward and predictable. Instead of relying on complex token-based pricing (which doesn't make sense for non-text-generation models), we calculate costs based on Inference Meter Price and Time to First Inference.
Pricing = Meter Price × Inference Time
- Models run on instances optimized for RAM usage.
- Instances are categorized by size (e.g., Micro, Small, Super).
- LLMs (Large Language Models) have their own specific pricing meters.
Each API response includes:
- Inference Meter
- Inference Meter Price
- Inference Time
- Inference Cost
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.0000872083 |
XS | 24 | 0.0001475035 |
SM | 64 | 0.0006478333 |
MD | 96 | 0.0008433876 |
LG | 128 | 0.0012956667 |
XL | 192 | 0.0024468774 |
XXL | 320 | 0.0047912685 |
Super | 640 | 0.0059890856 |
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.00053440 |
XS | 24 | 0.00066800 |
SM | 64 | 0.00427520 |
MD | 96 | 0.00480960 |
LG | 128 | 0.00855040 |
XL | 192 | 0.01603200 |
XXL | 320 | 0.02458240 |
Super | 640 | 0.02992640 |
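As a worked example of the formula above, here is a minimal Python sketch using the Micro row of the first table (the 12-second inference time is an illustrative value, not a measurement):

# Pricing = Meter Price × Inference Time
meter_price = 0.0000872083   # $/sec, Micro instance (first table above)
inference_time = 12.0        # seconds -- hypothetical illustrative value
inference_cost = meter_price * inference_time
print(f"${inference_cost:.6f}")  # ≈ $0.001046

The same four values reported in each API response (Inference Meter, Inference Meter Price, Inference Time, Inference Cost) let you verify this calculation per request.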
Explore our API endpoints in the documentation here.
Check out the status of our API.
Get to know our story, our mission, and our roadmap here.
We’re committed to building the best developer experience for AI builders. Have feedback? Let us know on Discord or open an issue on GitHub.