MLX model support #300

Merged · 22 commits · Feb 12, 2025

Changes from 11 commits
38 changes: 31 additions & 7 deletions docs/source/en/guided_tour.md
@@ -24,14 +24,15 @@ In this guided visit, you will learn how to build an agent, how to run it, and h
To initialize a minimal agent, you need at least these two arguments:

- `model`, a text-generation model to power your agent. The agent is more than a simple LLM: it is a system that uses an LLM as its engine. You can use any of these options:
  - [`TransformersModel`] takes a pre-initialized `transformers` pipeline to run inference on your local machine using `transformers`.
  - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood.
  - [`LiteLLMModel`] lets you call 100+ different models through [LiteLLM](https://docs.litellm.ai/)!
  - [`AzureOpenAIServerModel`] allows you to use OpenAI models deployed in [Azure](https://azure.microsoft.com/en-us/products/ai-services/openai-service).
  - [`MLXModel`] takes a pre-initialized `mlx-lm` pipeline to run inference on your local machine using [mlx-lm](https://pypi.org/project/mlx-lm/).

- `tools`, a list of `Tools` that the agent can use to solve the task. It can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`.

Once you have these two arguments, `tools` and `model`, you can create an agent and run it. You can use any LLM you'd like, either through [Hugging Face API](https://huggingface.co/docs/api-inference/en/index), [transformers](https://github.com/huggingface/transformers/), [ollama](https://ollama.com/), [LiteLLM](https://www.litellm.ai/), [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service), or [mlx-lm](https://pypi.org/project/mlx-lm/).

<hfoptions id="Pick a LLM">
<hfoption id="Hugging Face API">
@@ -52,6 +53,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Local Transformers Model">

@@ -68,6 +70,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="OpenAI or Anthropic API">

@@ -84,6 +87,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Ollama">

@@ -104,6 +108,7 @@ agent.run(
"Could you give me the 118th number in the Fibonacci sequence?",
)
```

</hfoption>
<hfoption id="Azure OpenAI">

@@ -147,6 +152,19 @@ agent.run(
)
```

</hfoption>
<hfoption id="MLX">

```python
# !pip install smolagents[mlx-lm]
from smolagents import CodeAgent, MLXModel

mlx_model = MLXModel("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit")
agent = CodeAgent(model=mlx_model, tools=[], add_base_tools=True)

agent.run("Could you give me the 118th number in the Fibonacci sequence?")
```

</hfoption>
</hfoptions>

@@ -188,12 +206,14 @@ agent.run("Could you get me the title of the page at url 'https://huggingface.co
### Inspecting an agent run

Here are a few useful attributes to inspect what happened after a run:

- `agent.logs` stores the fine-grained logs of the agent. At every step of the agent's run, everything gets stored in a dictionary that is then appended to `agent.logs`.
- Running `agent.write_inner_memory_from_logs()` creates an inner memory of the agent's logs for the LLM to view, as a list of chat messages. This method goes over each step of the log and only stores what it's interested in as a message: for instance, it will save the system prompt and task in separate messages, then for each step it will store the LLM output as a message, and the tool-call output as another message. Use this if you want a higher-level view of what has happened, but note that not every log will be transcribed by this method. A short sketch of both is shown below.
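
For instance, here is a minimal sketch of inspecting a finished run, assuming `agent` is any of the agents built above and has already executed a task:

```python
# Raw, step-by-step records: one entry per step of the run
for step in agent.logs:
    print(step)

# Condensed chat-message view of the same run
for message in agent.write_inner_memory_from_logs():
    print(message["role"], "->", str(message["content"])[:200])
```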

## Tools

A tool is an atomic function to be used by an agent. To be used by an LLM, it also needs a few attributes that constitute its API and will be used to describe to the LLM how to call this tool:

- A name
- A description
- Input types and descriptions
@@ -261,6 +281,7 @@ def model_download_tool(task: str) -> str:
```

The function needs:

- A clear name. The name should be descriptive enough of what this tool does to help the LLM brain powering the agent. Since this tool returns the model with the most downloads for a task, let's name it `model_download_tool`.
- Type hints on both inputs and output
- A description that includes an 'Args:' section where each argument is described (without a type indication this time; it will be pulled from the type hint). Like the tool name, this description is an instruction manual for the LLM powering your agent, so do not neglect it.
@@ -286,16 +307,18 @@ class ModelDownloadTool(Tool):
```

The subclass needs the following attributes:

- A clear `name`. The name should be descriptive enough of what this tool does to help the LLM brain powering the agent. Since this tool returns the model with the most downloads for a task, let's name it `model_download_tool`.
- A `description`. Same as for the `name`, this description is an instruction manual for the LLM powering your agent, so do not neglect it.
- Input types and descriptions
- Output type

All these attributes will be automatically baked into the agent's system prompt upon initialization, so strive to make them as clear as possible!
</hfoption>
</hfoptions>

Then you can directly initialize your agent:

```py
from smolagents import CodeAgent, HfApiModel
agent = CodeAgent(tools=[model_download_tool], model=HfApiModel())
@@ -305,6 +328,7 @@ agent.run(
```

You get the following logs:

```text
╭──────────────────────────────────────── New run ─────────────────────────────────────────╮
│ │
@@ -370,7 +394,6 @@ manager_agent.run("Who is the CEO of Hugging Face?")
> [!TIP]
> For an in-depth example of an efficient multi-agent implementation, see [how we pushed our multi-agent system to the top of the GAIA leaderboard](https://huggingface.co/blog/beating-gaia).


## Talk with your agent and visualize its thoughts in a cool Gradio interface

You can use `GradioUI` to interactively submit tasks to your agent and observe its thought and execution process; here is an example:
@@ -402,6 +425,7 @@ You can also use this `reset=False` argument to keep the conversation going in a
## Next steps

For more in-depth usage, you will then want to check out our tutorials:

- [the explanation of how our code agents work](./tutorials/secure_code_execution)
- [this guide on how to build good agents](./tutorials/building_good_agents)
- [the in-depth guide for tool usage](./tutorials/tools)
149 changes: 149 additions & 0 deletions docs/source/en/reference/agents.md
@@ -57,3 +57,152 @@ Both require arguments `model` and list of tools `tools` at initialization.
> You must have `gradio` installed to use the UI. Please run `pip install smolagents[gradio]` if it's not the case.

[[autodoc]] GradioUI
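
Here is a minimal sketch of wiring an agent into the UI, assuming an agent built as in the guided tour:

```python
from smolagents import CodeAgent, GradioUI, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel(), add_base_tools=True)

# Launches a local Gradio app to chat with the agent
GradioUI(agent).launch()
```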

## Models

You're free to create and use your own models to power your agent.

You could use any `model` callable for your agent, as long as:

1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns an object with a `.content` attribute containing the generated text.
2. It stops generating outputs *before* the sequences passed in the argument `stop_sequences`.

To define your LLM, you can write a `custom_model` function which accepts a list of [messages](./chat_templating) and returns an object with a `.content` attribute containing the generated text. This callable also needs to accept a `stop_sequences` argument that indicates when to stop generating.

```python
from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message
    return answer
```

Additionally, `custom_model` can take a `grammar` argument: if you specify a `grammar` upon agent initialization, it will be passed along on each call to the model to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance), forcing properly-formatted agent outputs.
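
To illustrate, here is a sketch extending the `custom_model` above (reusing its `client`); mapping `grammar` onto the `response_format` parameter of `InferenceClient.chat_completion` is an assumption, and other backends may expect a different parameter name:

```python
def custom_model(messages, stop_sequences=["Task"], grammar=None):
    # Forward the agent-provided grammar (assumed to be accepted as
    # `response_format`) so the backend constrains its generation.
    response = client.chat_completion(
        messages,
        stop=stop_sequences,
        max_tokens=1000,
        response_format=grammar,
    )
    return response.choices[0].message
```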

### TransformersModel

For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.

```python
from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `transformers` and `torch` installed on your machine. Please run `pip install smolagents[transformers]` if it's not the case.

[[autodoc]] TransformersModel

### HfApiModel

The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM.

```python
from smolagents import HfApiModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = HfApiModel()
print(model(messages))
```
```text
>>> Of course! If you change your mind, feel free to reach out. Take care!
```

[[autodoc]] HfApiModel

### LiteLLMModel

The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers.
You can pass kwargs upon model initialization that will then be used on every call to the model; for instance, below we pass `temperature`.

```python
from smolagents import LiteLLMModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```

[[autodoc]] LiteLLMModel

### OpenAIServerModel

This class lets you call any model served through an OpenAI-compatible API.
Here's how you can set it up (you can customize the `api_base` URL to point to another server):
```py
import os

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
```

[[autodoc]] OpenAIServerModel

### AzureOpenAIServerModel

`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up. Note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.

Pay attention to the lack of an `AZURE_` prefix for `OPENAI_API_VERSION`: this is due to the way the underlying [openai](https://github.com/openai/openai-python) package is designed.

```py
import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```

[[autodoc]] AzureOpenAIServerModel

### MLXModel

For convenience, we have added an `MLXModel` that implements the points above by building a local `mlx-lm` pipeline for the `model_id` given at initialization.

```python
from smolagents import MLXModel

model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `mlx-lm` installed on your machine. Please run `pip install smolagents[mlx-lm]` if it's not the case.

[[autodoc]] MLXModel

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -50,6 +50,9 @@ mcp = [
"mcpadapt>=0.0.6",
"mcp",
]
mlx-lm = [
"mlx-lm"
]
openai = [
"openai>=1.58.1"
]