Add guided tour to documentation #33

Open · wants to merge 3 commits into base: main
253 changes: 253 additions & 0 deletions docs/guided_tour.md
@@ -0,0 +1,253 @@
# Guided tour

In this guided tour, you'll learn how to create **code action agents** using `freeact`.

We'll explore how to leverage different language models, execute code securely, enhance agent capabilities through skills, and guide agent behavior using natural language instructions.

## Setup

Before you start, install the `freeact` package:

```bash
pip install freeact
```

This tour requires a `GEMINI_API_KEY` to perform generative web searches with Gemini 2. Get your key from [Google AI Studio](https://aistudio.google.com/app/apikey) and create a `.env` file with the following environment variable:

```bash title=".env"
GEMINI_API_KEY=<your-api-key>
```
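The example scripts in this tour read the `.env` file before creating an agent, typically via a loader such as `python-dotenv`. For illustration, here is a minimal stdlib-only sketch of what such a loader does (only the `.env` filename and `GEMINI_API_KEY` variable come from this tour; the `load_env` helper itself is hypothetical):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: parse KEY=VALUE lines into os.environ."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_env()  # variables from .env (if present) are now in os.environ
```

In practice, prefer a maintained loader like `python-dotenv` over rolling your own.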

## Building an agent

Let's explore the essential components you'll need to build a `freeact` agent:

* [`CodeActModel`][freeact.model.CodeActModel]: a language model that generates Python code or provides final answers upon task completion. It acts as the decision-making engine of the agent.

* [`CodeActAgent`][freeact.agent.CodeActAgent]: implements the agent loop that coordinates interactions between the model and the execution environment. It manages code flow and user conversations.

* [`execution_environment`][freeact.executor.execution_environment]: an execution environment that runs code in an isolated sandbox container.

To create an agent, set up the execution environment, select a model from the [supported models](models.md), and initialize the agent with your chosen model and the environment's code executor.

The examples below show how to do this using [`Claude`][freeact.model.claude.model.Claude], [`Gemini`][freeact.model.gemini.model.chat.Gemini], and a local [`QwenCoder`][freeact.model.qwen.model.QwenCoder] (deployed via [ollama](https://ollama.com)). Each example runs the agent for a single turn (a single interaction with the user) and uses the `stream_turn` helper function to handle intermediate and final outputs.

=== "Claude"
Create an `ANTHROPIC_API_KEY` using the [Anthropic Console](https://console.anthropic.com/settings/keys) and add it to your `.env` file:

```bash title=".env"
ANTHROPIC_API_KEY=<your-api-key>
```

```python
--8<-- "freeact/examples/guided_tour/basic_claude.py"
```

=== "Gemini"
```python
--8<-- "freeact/examples/guided_tour/basic_gemini.py"
```

=== "Local"
Deploy a local Qwen 2.5 Coder model using [ollama](https://ollama.com):

```bash
ollama run qwen2.5-coder:32b-instruct-q8_0
```

```python
--8<-- "freeact/examples/guided_tour/basic_local_qwen.py"
```

!!! Note

Support for Qwen models is still experimental. Larger Qwen 2.5 Coder models work reasonably well, but smaller models may require optimization of prompt templates.

The `stream_turn` helper function uses `freeact`'s [streaming protocol](streaming.md) to handle intermediate and final outputs of the agent. This happens in a sequence of:

* **model turns** ([`CodeActModelTurn`][freeact.model.CodeActModelTurn]) containing the model response and an optional code action, and
* **code executions** ([`CodeExecution`][freeact.agent.CodeExecution]) containing the result from running a code action in the execution environment.

This sequence continues until the model provides a final response.

```python
--8<-- "freeact/examples/utils.py:stream_turn_minimal"
```

1. Process the agent workflow steps alternating between [model turns][freeact.model.CodeActModelTurn] and [code executions][freeact.agent.CodeExecution]. The sequence continues until the model provides a final response.
2. A single model turn that [streams][freeact.model.CodeActModelTurn.stream] the text output and returns the generated code in the [response][freeact.model.CodeActModelTurn.response].
3. The model [response][freeact.model.CodeActModelResponse] including any optional code.
4. The [code execution][freeact.agent.CodeExecution] from the execution environment. The result can be [streamed][freeact.agent.CodeExecution.stream] and retrieved with the [result][freeact.agent.CodeExecution.result] method.
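The control flow these annotations describe can be illustrated with stand-in classes. This is a sketch only: `ModelTurn` and `Execution` below are hypothetical mocks that mimic the streaming protocol, not the real `freeact` types, and the loop body is simplified accordingly.

```python
import asyncio


class ModelTurn:
    """Mock model turn: streams response text; `code` is the optional code action."""

    def __init__(self, text, code=None):
        self.text = text
        self.code = code

    async def stream(self):
        for word in self.text.split():
            yield word + " "


class Execution:
    """Mock code execution: streams the output of running a code action."""

    def __init__(self, output):
        self.output = output

    async def stream(self):
        yield self.output


async def stream_turn(activities):
    """Sketch of the loop: stream each model turn and code execution in order;
    the turn ends with a model response that carries no code action."""
    final = None
    for activity in activities:
        async for chunk in activity.stream():
            print(chunk, end="")
        print()
        if isinstance(activity, ModelTurn) and activity.code is None:
            final = activity.text  # final response: no further code to run
    return final


activities = [
    ModelTurn("I'll compute this with Python.", code="math.sqrt(1764)"),
    Execution("42.0"),
    ModelTurn("The square root of 1764 is 42."),
]
asyncio.run(stream_turn(activities))
```

The real helper additionally awaits the model response and execution result objects; see the snippet above for the actual implementation.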

To engage in a multi-turn conversation with the agent, you can instead use the `stream_conversation` helper function.

??? "stream_conversation"
```python title="freeact/examples/utils.py::stream_conversation"
--8<-- "freeact/examples/utils.py:stream_conversation"
--8<-- "freeact/examples/utils.py:stream_turn"
```

## Secure code execution

Code actions generated by an agent are executed in a secure, containerized environment powered by [`ipybox`](https://github.com/gradion-ai/ipybox) - a specialized runtime built on [IPython](https://ipython.org) and [Docker](https://www.docker.com). The [`execution_environment`][freeact.executor.execution_environment] context manager provides a simple way to set up and manage this environment. It can be configured with the following parameters:

* `ipybox_tag`: specifies the Docker image to use as your execution environment. We use the pre-built `ghcr.io/gradion-ai/ipybox:example` image for all examples.
* `env`: allows you to pass environment variables like API keys to the execution environment.
* `workspace_path`: the workspace that is used to share files between your local machine and the execution environment. This directory is used by the agent to store custom skills, as well as generated images and other files.
* `executor_key`: a unique identifier for each agent within the execution environment. Private directories matching this key are created to store custom skills and output files only accessible to this agent.
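Combined, a fully configured environment setup might look like this. This is a sketch: the image tag and `GEMINI_API_KEY` come from this tour, while the remaining argument values (the `workspace` path and the `"guided-tour"` key) are illustrative assumptions, and exact parameter names or defaults may differ:

```python
from freeact import execution_environment


async def run():
    async with execution_environment(
        ipybox_tag="ghcr.io/gradion-ai/ipybox:example",  # Docker image for code execution
        env={"GEMINI_API_KEY": "..."},                   # forwarded into the container
        workspace_path="workspace",                      # shared between host and container
        executor_key="guided-tour",                      # per-agent private directories
    ) as env:
        ...  # create model and agent with env.executor, as shown above
```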

## Enhancing agents with skills

In `freeact`, agents gain their capabilities through skills - **reusable Python modules that implement specific functionality**. Each skill module contains code that can be used by the agent to perform tasks like searching the web, analyzing data, or interacting with external services.
You can use predefined skills from the [freeact-skills](https://github.com/gradion-ai/freeact-skills) library, create custom skills yourself, or develop them collaboratively through [interactive sessions with an agent](tutorials/skills.md).

Skill modules need to be available to the Docker container running the execution environment. This can be done using one of the following methods:

* Pre-installing packages: [installing Python packages](installation.md#execution-environment) containing skills in the execution environment, for example the [freeact-skills](https://github.com/gradion-ai/freeact-skills) library or your own custom packages.
* Workspace directories: adding Python modules containing skills to the `workspace/skills/shared` directory to make them available to all agents.

Once a skill is available in the execution environment, you can load its sources from the environment and provide them to the agent as input.

### Pre-installed skills

To use a pre-installed skill, you need to specify the **full path** of the Python modules containing the skill sources and provide them to the execution environment.
The following example shows how to do this using the [Google search](https://github.com/gradion-ai/freeact-skills/blob/main/freeact_skills/search/google/stream/api.py) skill, which comes pre-installed in our `ghcr.io/gradion-ai/ipybox:example` container.

=== "Claude"

```python
--8<-- "freeact/examples/guided_tour/skills_predefined_claude.py"
```

1. Specify the full path of the Python modules containing the skills
2. Provide the loaded skill sources to the agent

=== "Gemini"

```python
--8<-- "freeact/examples/guided_tour/skills_predefined_gemini.py"
```

1. Specify the full path of the Python modules containing the skills
2. Provide the loaded skill sources to the model

=== "Local"

```python
--8<-- "freeact/examples/guided_tour/skills_predefined_local_qwen.py"
```

1. Specify the full path of the Python modules containing the skills
2. Provide the loaded skill sources to the model


!!! Note

Currently, skill sources are passed to the *agent* for `Claude` models, but to the *model* itself for `Gemini` and others. This difference will be standardized in a future update, with all skill sources being passed directly to the agent.

### Workspace skills

To use a skill from the workspace, you need to add the skill modules to the `workspace/skills/shared` directory or develop skills in an [interactive session with an agent](tutorials/skills.md) and let the agent store them in the workspace.
Load the skill by providing the **relative path** to the skill module within the workspace.

In the following example, we will use a custom [weather report skill](workspace/skills/private/guided_tour/weather/weather_report.py), created in the [skill development tutorial](tutorials/skills.md) and extended to report cloud coverage for our use case.

We will add the skill module to the `workspace/skills/shared/` directory and load the skill by passing its relative module path `weather.weather_report` to the execution environment.

```bash
# Create the skill directory in the workspace
mkdir -p workspace/skills/shared/weather &&
# Download the skill module
curl -o workspace/skills/shared/weather/weather_report.py https://raw.githubusercontent.com/gradion-ai/freeact/refs/heads/main/docs/workspace/skills/private/guided_tour/weather/weather_report.py
```

!!! Note
The current approach of manually copying skill modules to the workspace directory is a temporary solution. We plan to implement a more sophisticated skill management mechanism in future releases.

=== "Claude"

```python
--8<-- "freeact/examples/guided_tour/skills_workspace_claude.py"
```

1. Specify the relative path to the skill module in the `workspace/skills/shared/` directory.

=== "Gemini"

```python
--8<-- "freeact/examples/guided_tour/skills_workspace_gemini.py"
```

1. Specify the relative path to the skill module in the `workspace/skills/shared/` directory.


=== "Local"

```python
--8<-- "freeact/examples/guided_tour/skills_workspace_local_qwen.py"
```

1. Specify the relative path to the skill module in the `workspace/skills/shared/` directory.

!!! Note
If you've developed a custom skill that has external dependencies, you either need to build a [custom Docker image](installation.md#custom-docker-image) with the required dependencies or [install them at runtime](installation.md#installing-dependencies-at-runtime) prior to launching an agent.

### Understanding skill structure

Let's take a closer look at how skills are implemented in `freeact`. Skills are Python modules that can be structured as a single module, multiple modules, or a complete package.

While all modules of a skill must be available to the execution environment, you can select which specific parts of the skill to include in the model prompt for the agent. This design lets you control whether the agent sees the complete skill implementation or just its public interface.

The following example shows the [Google search](https://github.com/gradion-ai/freeact-skills/tree/main/freeact_skills/search/google) skill. This skill has a public interface that is passed to the agent through the `freeact_skills.search.google.api` module, while its implementation remains in a separate module that only the execution environment can access.

```python
class InternetSearch:
"""Search for up-to-date information on the internet"""
def __init__(self, api_key: str | None = None):
self.api_key = api_key

def search(self, natural_language_query: str) -> str:
"""
Search for up-to-date information on the internet and return result in markdown format.

Args:
natural_language_query (str): A query string that matches a specific topic, concept, or fact.
It should be formulated in natural language and be as specific as possible.
"""
from freeact_skills.search.google import impl

return impl.search(natural_language_query, self.api_key)
```

When developing skills, it is important to provide clear function names and comprehensive docstrings, as they help the agent understand exactly when and how to use the skill effectively.

For more examples of skill implementations, see the [freeact-skills](https://github.com/gradion-ai/freeact-skills/tree/main/freeact_skills) library.

## Customizing agent behavior

System extensions allow you to customize your agent's behavior by providing additional rules, constraints, and workflows through natural language instructions.
These extensions add capabilities such as human-in-the-loop processes, domain-specific knowledge, or agent runbooks, helping the agent adapt to specific use cases.

In the following example, you will change the agent's behavior to:

* report all temperatures in Kelvin (as a domain-specific rule),
* report cloud coverage in low, medium, or high (as a domain-specific rule),
* suggest 3 follow-up actions after each response (as an overall workflow instruction).
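A system extension is plain natural-language text passed alongside the system prompt. The three rules above could be written, for instance, as follows (an illustrative sketch; the actual extension used by the example script may be worded differently):

```python
SYSTEM_EXTENSION = """
Domain-specific rules:
- Report all temperatures in Kelvin.
- Report cloud coverage as low, medium, or high.

Workflow:
- After each response, suggest 3 follow-up actions the user may take.
"""
```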


```python
--8<-- "freeact/examples/guided_tour/system_extension.py"
```

1. Provide the system extension to the model

!!! Note
System extensions are currently only supported for [Claude][freeact.model.claude.model.Claude] models.

## Next steps

For more in-depth usage, check out the following resources:

* [Building blocks](blocks.md)
* [Tutorials](tutorials/index.md)
* [Supported models](models.md)
1 change: 1 addition & 0 deletions docs/index.md
@@ -20,6 +20,7 @@ The library's architecture emphasizes extensibility and transparency, avoiding t

- [Quickstart](quickstart.md) - Launch your first `freeact` agent and interact with it on the command line
- [Installation](installation.md) - Installation instructions and configuration of execution environments
- [Guided tour](guided_tour.md) - A guided tour of the `freeact` library
- [Building blocks](blocks.md) - Learn about the essential components of a `freeact` agent system
- [Tutorials](tutorials/index.md) - Tutorials demonstrating the `freeact` building blocks

@@ -0,0 +1,81 @@
"""Module for getting weather reports for cities."""

from datetime import datetime, timedelta
from typing import Any, Dict

import requests


def get_weather_report(city_name: str, n_days: int = 7) -> Dict[str, Any]:
    """Get current and historical weather report for a given city.

    Args:
        city_name: Name of the city to get weather for
        n_days: Number of past days to get historical data for (excluding current day)

    Returns:
        Dictionary containing:
        - temperature: Current temperature in Celsius
        - humidity: Current relative humidity percentage
        - cloud_cover: Current cloud coverage percentage
        - measurement_time: Timestamp of current measurement
        - coordinates: Dict with latitude and longitude
        - city: City name used for query
        - history: List of daily measurements for past n_days (excluding current day), each containing:
            - date: Date of measurement
            - temperature: Average daily temperature in Celsius
            - humidity: Average daily relative humidity percentage
            - cloud_cover: Average daily cloud coverage percentage
    """
    # First get coordinates using the geocoding API
    geocoding_url = f"https://geocoding-api.open-meteo.com/v1/search?name={city_name}&count=1&language=en&format=json"
    geo_response = requests.get(geocoding_url, timeout=10)
    geo_data = geo_response.json()

    if not geo_data.get("results"):
        raise ValueError(f"Could not find coordinates for city: {city_name}")

    location = geo_data["results"][0]
    lat = location["latitude"]
    lon = location["longitude"]

    # Calculate date range for historical data
    end_date = datetime.now().date() - timedelta(days=1)  # yesterday
    start_date = end_date - timedelta(days=n_days - 1)

    # Get current and historical weather data using coordinates
    weather_url = (
        f"https://api.open-meteo.com/v1/forecast?"
        f"latitude={lat}&longitude={lon}"
        f"&current=temperature_2m,relative_humidity_2m,cloud_cover"
        f"&daily=temperature_2m_mean,relative_humidity_2m_mean,cloud_cover_mean"
        f"&timezone=auto"
        f"&start_date={start_date}&end_date={end_date}"
    )
    weather_response = requests.get(weather_url, timeout=10)
    weather_data = weather_response.json()

    current = weather_data["current"]
    daily = weather_data["daily"]

    # Process historical data into one record per day
    history = []
    for i in range(len(daily["time"])):
        history.append(
            {
                "date": datetime.fromisoformat(daily["time"][i]).date(),
                "temperature": daily["temperature_2m_mean"][i],
                "humidity": daily["relative_humidity_2m_mean"][i],
                "cloud_cover": daily["cloud_cover_mean"][i],
            }
        )

    return {
        "temperature": current["temperature_2m"],
        "humidity": current["relative_humidity_2m"],
        "cloud_cover": current["cloud_cover"],
        "measurement_time": datetime.fromisoformat(current["time"]),
        "coordinates": {"latitude": lat, "longitude": lon},
        "city": location["name"],
        "history": history,
    }
12 changes: 4 additions & 8 deletions evaluation/README.md
@@ -24,14 +24,10 @@ When comparing our results with smolagents using `claude-3-5-sonnet-20241022` on

[<img src="../docs/eval/eval-plot-comparison.png" alt="Performance comparison" width="60%">](../docs/eval/eval-plot-comparison.png)

| agent | model | prompt | GAIA | GSM8K | SimpleQA |
|:-----------|:---------------------------|:----------|----------:|----------:|----------:|
| freeact | claude-3-5-sonnet-20241022 | zero-shot | **53.1** | **95.7** | **57.5** |
| smolagents | claude-3-5-sonnet-20241022 | few-shot | 43.8 | 91.4 | 47.5 |

Interestingly, these results were achieved using zero-shot prompting in `freeact`, while the smolagents implementation utilizes few-shot prompting. To ensure a fair comparison, we employed identical evaluation protocols and tools (converted to [skills](skills)).

22 changes: 22 additions & 0 deletions freeact/examples/guided_tour/basic_claude.py
@@ -0,0 +1,22 @@
import asyncio

from freeact import Claude, CodeActAgent, execution_environment
from freeact.examples.utils import stream_turn


async def main():
    async with execution_environment(
        ipybox_tag="ghcr.io/gradion-ai/ipybox:example",
    ) as env:
        model = Claude(
            model_name="claude-3-5-sonnet-20241022",
            logger=env.logger,
        )
        agent = CodeActAgent(model=model, executor=env.executor)

        turn = agent.run("Calculate the square root of 1764")
        await stream_turn(turn)


if __name__ == "__main__":
    asyncio.run(main())