MLX model support #300

Merged · 22 commits · Feb 12, 2025

Changes from 11 commits
38 changes: 31 additions & 7 deletions docs/source/en/guided_tour.md
@@ -24,14 +24,15 @@ In this guided visit, you will learn how to build an agent, how to run it, and h
To initialize a minimal agent, you need at least these two arguments:

- `model`, a text-generation model to power your agent. The agent is more than a simple LLM: it is a system that uses an LLM as its engine. You can use any of these options:
  - [`TransformersModel`] takes a pre-initialized `transformers` pipeline to run inference on your local machine using `transformers`.
  - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood.
  - [`LiteLLMModel`] lets you call 100+ different models through [LiteLLM](https://docs.litellm.ai/)!
  - [`AzureOpenAIServerModel`] allows you to use OpenAI models deployed in [Azure](https://azure.microsoft.com/en-us/products/ai-services/openai-service).
  - [`MLXModel`] takes a pre-initialized `mlx-lm` pipeline to run inference on your local machine using [mlx-lm](https://pypi.org/project/mlx-lm/).

- `tools`, a list of `Tools` that the agent can use to solve the task. It can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`.

Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Hugging Face API](https://huggingface.co/docs/api-inference/en/index), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service), or [mlx-lm](https://pypi.org/project/mlx-lm/).

<hfoptions id="Pick a LLM">
<hfoption id="Hugging Face API">
@@ -52,6 +53,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Local Transformers Model">

@@ -68,6 +70,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="OpenAI or Anthropic API">

@@ -84,6 +87,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Ollama">

@@ -104,6 +108,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Azure OpenAI">

@@ -147,6 +152,19 @@ agent.run(
)
```

</hfoption>
<hfoption id="MLX">

```python
# !pip install smolagents[mlx-lm]
from smolagents import CodeAgent, MLXModel

mlx_model = MLXModel("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit")
agent = CodeAgent(model=mlx_model, tools=[], add_base_tools=True)

agent.run("Could you give me the 118th number in the Fibonacci sequence?")
```

</hfoption>
</hfoptions>

@@ -188,12 +206,14 @@ agent.run("Could you get me the title of the page at url 'https://huggingface.co
### Inspecting an agent run

Here are a few useful attributes to inspect what happened after a run:

- `agent.logs` stores the fine-grained logs of the agent. At every step of the agent's run, everything gets stored in a dictionary that is then appended to `agent.logs`.
- Running `agent.write_inner_memory_from_logs()` creates an inner memory of the agent's logs for the LLM to view, as a list of chat messages. This method goes over each step of the log and only stores what it's interested in as a message: for instance, it will save the system prompt and task in separate messages, then for each step it will store the LLM output as a message, and the tool-call output as another message. Use this if you want a higher-level view of what has happened, but note that not every log will be transcribed by this method. A short sketch of both is shown below.
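
For instance, here is a minimal sketch of inspecting a finished run, assuming `agent` is any of the agents built above and has already executed a task:

```python
# Raw, step-by-step records: one entry per step of the run
for step in agent.logs:
    print(step)

# Condensed chat-message view of the same run
for message in agent.write_inner_memory_from_logs():
    print(message["role"], "->", str(message["content"])[:200])
```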

## Tools

A tool is an atomic function to be used by an agent. To be used by an LLM, it also needs a few attributes that constitute its API and will be used to describe to the LLM how to call this tool:

- A name
- A description
- Input types and descriptions
@@ -261,6 +281,7 @@ def model_download_tool(task: str) -> str:
```

The function needs:

- A clear name. The name should be descriptive enough of what this tool does to help the LLM brain powering the agent. Since this tool returns the model with the most downloads for a task, let's name it `model_download_tool`.
- Type hints on both inputs and output
- A description that includes an 'Args:' section where each argument is described (without a type indication this time; it will be pulled from the type hint). Like the tool name, this description is an instruction manual for the LLM powering your agent, so do not neglect it.
@@ -286,16 +307,18 @@ class ModelDownloadTool(Tool):
```

The subclass needs the following attributes:

- A clear `name`. The name should be descriptive enough of what this tool does to help the LLM brain powering the agent. Since this tool returns the model with the most downloads for a task, let's name it `model_download_tool`.
- A `description`. Same as for the `name`, this description is an instruction manual for the LLM powering your agent, so do not neglect it.
- Input types and descriptions
- Output type

All these attributes will be automatically baked into the agent's system prompt upon initialization, so strive to make them as clear as possible!
</hfoption>
</hfoptions>

Then you can directly initialize your agent:

```py
from smolagents import CodeAgent, HfApiModel
agent = CodeAgent(tools=[model_download_tool], model=HfApiModel())
@@ -305,6 +328,7 @@ agent.run(
```

You get the following logs:

```text
╭──────────────────────────────────────── New run ─────────────────────────────────────────╮
│ │
@@ -370,7 +394,6 @@ manager_agent.run("Who is the CEO of Hugging Face?")
> [!TIP]
> For an in-depth example of an efficient multi-agent implementation, see [how we pushed our multi-agent system to the top of the GAIA leaderboard](https://huggingface.co/blog/beating-gaia).


## Talk with your agent and visualize its thoughts in a cool Gradio interface

You can use `GradioUI` to interactively submit tasks to your agent and observe its thought and execution process; here is an example:
@@ -402,6 +425,7 @@ You can also use this `reset=False` argument to keep the conversation going in a
## Next steps

For more in-depth usage, you will then want to check out our tutorials:

- [the explanation of how our code agents work](./tutorials/secure_code_execution)
- [this guide on how to build good agents](./tutorials/building_good_agents)
- [the in-depth guide for tool usage](./tutorials/tools)
149 changes: 149 additions & 0 deletions docs/source/en/reference/agents.md
@@ -57,3 +57,152 @@ Both require arguments `model` and list of tools `tools` at initialization.
> You must have `gradio` installed to use the UI. Please run `pip install smolagents[gradio]` if it's not the case.

[[autodoc]] GradioUI
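
Here is a minimal sketch of wiring an agent into the UI, assuming an agent built as in the guided tour:

```python
from smolagents import CodeAgent, GradioUI, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel(), add_base_tools=True)

# Launches a local Gradio app to chat with the agent
GradioUI(agent).launch()
```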

## Models

You're free to create and use your own models to power your agent.

You could use any `model` callable for your agent, as long as:

1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns an object with a `.content` attribute containing the generated text.
2. It stops generating outputs *before* the sequences passed in the argument `stop_sequences`.

To define your LLM, you can write a `custom_model` function which accepts a list of [messages](./chat_templating) and returns an object with a `.content` attribute containing the generated text. This callable also needs to accept a `stop_sequences` argument that indicates when to stop generating.

```python
from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message
    return answer
```

Additionally, `custom_model` can take a `grammar` argument: if you specify a `grammar` upon agent initialization, it will be passed along on each call to the model to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance), forcing properly-formatted agent outputs.
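
To illustrate, here is a sketch extending the `custom_model` above (reusing its `client`); mapping `grammar` onto the `response_format` parameter of `InferenceClient.chat_completion` is an assumption, and other backends may expect a different parameter name:

```python
def custom_model(messages, stop_sequences=["Task"], grammar=None):
    # Forward the agent-provided grammar (assumed to be accepted as
    # `response_format`) so the backend constrains its generation.
    response = client.chat_completion(
        messages,
        stop=stop_sequences,
        max_tokens=1000,
        response_format=grammar,
    )
    return response.choices[0].message
```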

### TransformersModel

For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.

```python
from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `transformers` and `torch` installed on your machine. Please run `pip install smolagents[transformers]` if it's not the case.

[[autodoc]] TransformersModel

### HfApiModel

The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM.

```python
from smolagents import HfApiModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = HfApiModel()
print(model(messages))
```
```text
>>> Of course! If you change your mind, feel free to reach out. Take care!
```

[[autodoc]] HfApiModel

### LiteLLMModel

The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers.
You can pass kwargs upon model initialization that will then be used on every call to the model; for instance, below we pass `temperature`.

```python
from smolagents import LiteLLMModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```

[[autodoc]] LiteLLMModel

### OpenAIServerModel

This class lets you call any model served through an OpenAI-compatible API.
Here's how you can set it up (you can customize the `api_base` URL to point to another server):
```py
import os

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
```

[[autodoc]] OpenAIServerModel

### AzureOpenAIServerModel

`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up. Note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.

Pay attention to the lack of an `AZURE_` prefix for `OPENAI_API_VERSION`: this is due to the way the underlying [openai](https://github.com/openai/openai-python) package is designed.

```py
import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```

[[autodoc]] AzureOpenAIServerModel

### MLXModel

For convenience, we have added an `MLXModel` that implements the points above by building a local `mlx-lm` pipeline for the `model_id` given at initialization.

```python
from smolagents import MLXModel

model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `mlx-lm` installed on your machine. Please run `pip install smolagents[mlx-lm]` if it's not the case.

[[autodoc]] MLXModel

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -50,6 +50,9 @@ mcp = [
"mcpadapt>=0.0.6",
"mcp",
]
mlx-lm = [
"mlx-lm"
]
openai = [
"openai>=1.58.1"
]