tool-call: add support to llama-cli using new --tools arg #11556

Open · wants to merge 41 commits into master
Conversation

@bandoti (Collaborator) commented Jan 31, 2025

This PR adds support for tool-calls using a --tools switch to llama-cli.

It is currently ⚠Experimental!⚠

This required slight modifications to common_chat_apply_template in order to support passing a new common_params_tools type which encapsulates the tools JSON array and tool_choice.
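Conceptually it is just a small bundle carried through the template call. A rough sketch only; the member names below are illustrative and not necessarily the exact ones in the diff:

#include <string>

// Illustrative sketch: a bundle carrying the tool-call related CLI options
// through common_chat_apply_template. Member names are approximate.
struct common_params_tools {
    std::string tools;                // raw JSON array from --tools
    std::string tool_choice = "auto"; // --tool-choice: "auto", "required", or "none"
    bool        parallel    = false;  // --tool-parallel
};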

This doesn't work yet; it still needs the brains added. I'm trying to figure out how this all works in the server code, so if anyone has tips please feel free to chime in! 😅

Tasks:

Integrating toolcall support with llama-cli

  • Add a --tools option to pass in a JSON tools array
  • Add a --tool-choice option which defaults to "auto" (see this ref)
  • Add a --tool-parallel switch for parallel tool-calls.
  • Copy remaining logic from oaicompat_completion_params_parse in utils.hpp into common_chat_apply_template (common.cpp).
  • Some other grammar changes in the main.cpp algorithm?

Implement toolcall handlers for Model Context Protocol (MCP).

  • Add C++ types for base MCP messages (a rough sketch follows this list).
  • Add C++ types and procedures for Lifecycle phase of MCP protocol.
  • Implement Stdio transport.
  • Implement HTTP SSE transport using cURL.
  • Add base types in the common library for abstracting out tool-call handlers. This should include types/functions for translating between the underlying tool-call implementation (OpenAI style) and other formats (MCP in this case). After the template gets applied in common_chat_apply_template via a call to common_chat_params_init, the resulting prompt member of common_chat_params will contain the JSON-formatted tool-calls. These should be translated and dispatched to the registered handlers (if one was specified).
  • Other refactoring to support receiving input from the handlers while simultaneously allowing the user's input/interjection between request/response in the handlers.
  • Add C++ types for MCP utility messages to ping, cancel, and receive progress updates for long-running tool-calls.
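As a rough sketch of the base message type (MCP is JSON-RPC 2.0 under the hood): this is illustrative only and assumes nlohmann::json, which the project already uses; the actual types in this PR may be structured differently.

#include <nlohmann/json.hpp>
#include <optional>
#include <string>

// Sketch of a base MCP (JSON-RPC 2.0) message. Requests carry id + method,
// notifications carry only method, responses carry id + result or error.
struct mcp_message {
    std::string                   jsonrpc = "2.0";
    std::optional<nlohmann::json> id;     // absent for notifications
    std::optional<std::string>    method; // present for requests/notifications
    std::optional<nlohmann::json> params;
    std::optional<nlohmann::json> result; // present for responses
    std::optional<nlohmann::json> error;

    nlohmann::json to_json() const {
        nlohmann::json j{{"jsonrpc", jsonrpc}};
        if (id)     j["id"]     = *id;
        if (method) j["method"] = *method;
        if (params) j["params"] = *params;
        if (result) j["result"] = *result;
        if (error)  j["error"]  = *error;
        return j;
    }
};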

bandoti requested a review from ngxson as a code owner February 4, 2025 19:06
github-actions bot added the testing (Everything test related) and server labels Feb 4, 2025
@bandoti (Collaborator, Author) commented Feb 4, 2025

@ochafik I am working on adding the tool calls to llama-cli, and at this point I have wired initial support into common_chat_apply_template (from what I can tell) for passing in the templates and the tools array/tool_choice.

However, I need some advice on how to handle the remaining fields of common_chat_params as returned by common_chat_params_init. My basic understanding is that each time the template gets applied, the resulting parameters need to be relayed back to the sampling parameters so they can be hooked into the main token-processing routine. Is this correct? If so, do I simply need to tokenize/push the grammar triggers like server.cpp does? At the moment common_chat_apply_template returns a string, but I can change that by adding an out parameter or something.
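To make the question concrete, this is roughly the wiring I have in mind after the template is applied, mirroring what I think server.cpp does. The struct and member names below are my assumptions (the header may be chat.hpp on some branches), so please correct me if they're off:

#include "chat.h"   // common_chat_params (assumed location)
#include "common.h" // common_params_sampling (assumed location)

// Assumed wiring (names approximate): after common_chat_params_init produces
// the grammar and lazy-grammar triggers, copy them into the sampling params so
// the main token-processing loop in main.cpp picks them up, like server.cpp.
static void apply_chat_params_to_sampling(const common_chat_params & cparams,
                                          common_params_sampling   & sparams) {
    sparams.grammar      = cparams.grammar;
    sparams.grammar_lazy = cparams.grammar_lazy;
    for (const auto & trigger : cparams.grammar_triggers) {
        sparams.grammar_triggers.push_back(trigger);
    }
}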

Thank you for your work on the core of this feature; I am excited to get it working on llama-cli! 😊

@ochafik (Collaborator) commented Feb 5, 2025

Hey @bandoti, sorry for the delay, some quick background questions first:

  • What use case do you have in mind for this? Is it to treat the CLI as a single-shot server?
  • How would you display the output of the tool calls to make it usable (in OpenAI format?). Could you add an example output to the PR description?

Have you considered going directly one step further and having the CLI call tools? @brucepro is looking into doing tool calls w/ MCP servers from the server's Web UI (ref); maybe you could join forces / do the same in C++ w/ cURL.

@bandoti (Collaborator, Author) commented Feb 5, 2025

@ochafik I got this working in llama-cli now. Here's the command I ran, followed by the output:

 ./build/bin/llama-cli.exe -c 2048 -ngl 8 -cnv --jinja -m 'C:/Users/mtmcp/Downloads/Llama-3.2-3B-Instruct-Q6_K.gguf' --tools '[
    {
      "type":"function",
      "function":{
        "name":"get_current_weather",
        "description":"Get the current weather in a given location",
        "parameters":{
          "type":"object",
          "properties":{
            "location":{
              "type":"string",
              "description":"The city and state, e.g. San Francisco, CA"
            }
          },
          "required":["location"]
        }
      }
    }
  ]'

system

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 05 Feb 2025

You have access to the following functions. To call a function, please respond with JSON for a function call.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant


> What is the weather like in Mumbai?
{"name": "get_current_weather", "parameters": {"location": "Mumbai"}}

>
llama_perf_sampler_print:    sampling time =       1.41 ms /    36 runs   (    0.04 ms per token, 25477.71 tokens per second)
llama_perf_context_print:        load time =    1731.11 ms
llama_perf_context_print: prompt eval time =   17904.77 ms /   204 tokens (   87.77 ms per token,    11.39 tokens per second)
llama_perf_context_print:        eval time =    1457.84 ms /    18 runs   (   80.99 ms per token,    12.35 tokens per second)
llama_perf_context_print:       total time =   29930.62 ms /   222 tokens
Interrupted by user

@bandoti (Collaborator, Author) commented Feb 5, 2025

Hey @bandoti , sorry for the delay, some quick background questions first:

* What use case you have in mind for this, is it to treat the cli as a single shot server?

* How would you display the output of the tool calls to make it useable (in openai format?). Could you add an example output to the PR description?

Have you considered going directly one step further and have the CLI call tools? @brucepro is looking into doing tool call w/ MCP servers from the server's Web UI (ref), maybe you could join forces / do the same in C++ w/ CURL).

@ochafik Good timing, we responded at the exact same time, haha. No worries on the delay. Here are some general objectives:

  1. Testability. Having llama-cli able to process these function calls lends itself to some really useful automated tests using tools like expect & co., which can quickly validate the function-call behavior.
  2. I have been working on an ongoing effort to wrap llama-cli in a Tcl scripting environment, and the general idea is that these function calls could be an extremely interesting way to create automation.

In both of these cases, the output can be processed and simply scanned for a valid JSON result. If it's valid, honor the function calls; otherwise, just print to the console.
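As a sketch of that scanning step (hypothetical helper; assumes the nlohmann::json library already vendored in the repo):

#include <nlohmann/json.hpp>
#include <string>

// Hypothetical check: does a line of model output look like a tool call of the
// form {"name": ..., "parameters": {...}}? If not, it is treated as plain text.
static bool looks_like_tool_call(const std::string & line) {
    auto j = nlohmann::json::parse(line, /* cb */ nullptr, /* allow_exceptions */ false);
    return j.is_object() && j.contains("name") && j.contains("parameters");
}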

@bandoti (Collaborator, Author) commented Feb 5, 2025

I will track the MCP protocol work; it sounds interesting! I still think there's a lot of need for local-only tools, however, and I want to ensure these features are workable/testable without standing up endpoints and such. 😊

When you mention adding this capability in cURL, how do you mean? Setting up llama-cli as an MCP client?

EDIT: After reading more on MCP, I see the potential flow, where the AI runs and communicates with the resource services. I'd imagine building that on top of the changes here would work well. A series of services can simply be passed into llama-cli, and it could dispatch to them when it needs something (at least, that's how I'm understanding it).

@brucepro (Contributor) commented Feb 5, 2025

I will track the MCP protocol work it sounds interesting! I still think there's a lot of need for local-only tools however, and want to ensure these features are workable/testable without standing up endpoints and such. 😊

When you mention adding this capability in cURL, how do you mean? Setting up llama-cli as a MCP client?

For MCP, I am adding the SSE client support into the webui. This link was the best example I found: https://github.com/apify/tester-mcp-client/blob/main/src/mcpClient.ts
Then you can run one of the proxies that allow you to use MCP servers directly. This one seemed promising: https://github.com/punkpeye/mcp-proxy/ although I think writing a Python solution to handle the SSE API calls and just using the Python SDK directly (https://github.com/modelcontextprotocol) is where I will end up. So in the end the WebUI will be able to add any SSE server with a config of:

{
  "mcpServers": {
    "fetch": {
      "name": "Fetch",
      "type": "sse",
      "serverUrl": "http://localhost:8765/sse"
    }
  }
}

Still in progress. Once I hit debug mode, I will update my repo and start testing.

ochafik self-requested a review February 5, 2025 17:09
@bandoti (Collaborator, Author) commented Feb 5, 2025

@brucepro thanks for the info on this. It seems to me that, in general, a protocol like this is the way to go for the local AI in llama-cli to invoke actions as well. I'll take a closer look and see what it'll take to add it.

@bandoti (Collaborator, Author) commented Feb 5, 2025

@ochafik As I understand it, to get this working I need to add a "translation" layer between the model's OpenAI function-call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library.
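Roughly, I'm picturing something along these lines for the request direction (illustrative sketch only; assumes nlohmann::json and the "parameters" shape the model produced in the example above):

#include <nlohmann/json.hpp>

// Sketch: turn a model tool call such as
//   {"name": "get_current_weather", "parameters": {"location": "Mumbai"}}
// into an MCP tools/call request (MCP messages are JSON-RPC 2.0).
static nlohmann::json tool_call_to_mcp_request(const nlohmann::json & call, int id) {
    return {
        {"jsonrpc", "2.0"},
        {"id",      id},
        {"method",  "tools/call"},
        {"params",  {
            {"name",      call.at("name")},
            {"arguments", call.value("parameters", nlohmann::json::object())}
        }}
    };
}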

I really like the discovery aspect of the MCP protocol; it will make managing a collection of functionality much easier.

So I will start working on it, as I think this is an important part of the function-call API. We can revisit the other aspects of MCP, like prompts and the like; those are very powerful as well, but that's a fair amount of work, so it will have to be done gradually.

@brucepro (Contributor) commented

@ochafik As I understand it the requirement to get this working I need to add a "translation" layer between the models OpenAI function call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library.

I really like the discovery aspect of the MCP protocol—will make managing a collection of functionality much easier.

So I will start working on it as I think this is an important part of the function call API. We can revisit the other aspects of MCP like prompts and the like—those are very powerful as well, albeit that's a fair amount of work so will have to be done gradually.

Did you make any progress on the CLI MCP? I have a super basic React app that seems to work with llama.cpp here: https://github.com/brucepro/llamacppMCPClientDemo. I tested with Llama 3.3 70B but not much else. I will be adding prompts and resources next, and debugging. Once it is cleaned up, I will work on migrating it to the WebUI.

@bandoti (Collaborator, Author) commented Feb 11, 2025

@brucepro I'm currently working on adding the types for the MCP protocol and the initialization handshake. I have all the types defined; I'm just going to add unit tests on them today.

I'm working in a different branch, but I'll hopefully merge that piece in today.

I added a checklist in the PR description above to track these changes. 😊

@bandoti (Collaborator, Author) commented Feb 14, 2025

@brucepro Quick update: I have most of the pieces in place, just working on the SSE transport. I am hoping to finish it (well, make it marginally workable) this weekend. I am leaving the stdio transport unfinished for this PR, but it can be followed up on later, as the HTTP endpoint has a bit more utility.

The SSE changes will require setting up a background thread listening to the SSE endpoint while allowing tool-calls to be sent to a separate endpoint (arbitrarily set by the SSE endpoint event). There are some concurrency-related hiccups which may cause issues, given that the MCP server can push an update to the tools list at any time. But other than that, I don't foresee many problems.
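For anyone curious, the shape of it is roughly the following. This is a heavily simplified sketch: the real transport uses cURL for the stream, handles errors, and dispatches every incoming event, not just the first.

#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Simplified sketch of the SSE transport flow: a background thread reads the
// SSE stream (via cURL in the real code) and, once the server sends the
// "endpoint" event, the waiting thread is released and tool-call messages can
// be POSTed to that endpoint.
struct sse_transport_sketch {
    std::thread             listener;
    std::mutex              mtx;
    std::condition_variable cv;
    std::string             message_endpoint; // filled in by the "endpoint" event

    void start(const std::string & sse_url) {
        listener = std::thread([this, sse_url] {
            // ... cURL loop reading the SSE stream from sse_url goes here; when
            // an "endpoint" event arrives, record its URL and notify the waiter ...
            {
                std::lock_guard<std::mutex> lock(mtx);
                message_endpoint = "http://localhost:8765/message"; // placeholder only
            }
            cv.notify_one();
            // ... then keep processing "message" events (tool results, tool-list
            // change notifications) until the connection is closed ...
        });
    }

    std::string wait_for_endpoint() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return !message_endpoint.empty(); });
        return message_endpoint;
    }

    ~sse_transport_sketch() {
        if (listener.joinable()) {
            listener.join();
        }
    }
};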

Looking forward to being able to test this thing! Stay tuned...

@brucepro (Contributor) commented

@brucepro Quick update: I have most of the pieces in place, just working on the SSE transport. I am hoping to finish it—well, make it marginally-workable—this weekend. I am leaving the stdio transport unfinished for this PR, but it can be followed up on later, as the HTTP endpoint has a bit more utility.

The SSE changes will require setting up a background thread listening to SSE endpoint while allowing tool-calls to be sent to a separate endpoint (arbitrarily set by the SSE endpoint event). There are some concurrency-related hiccups which may cause issues given that the MCP server can push an update to the tools list at any time. But, other than that, I don't foresee many other problems.

Looking forward to being able to test this thing! Stay tuned...

Awesome. Looking forward to testing it out.

ochafik mentioned this pull request Feb 15, 2025
@bandoti (Collaborator, Author) commented Feb 15, 2025

@ochafik I took a quick look at your cleanup branch and see the switch to common_chat_templates_apply. I suppose, per the run.cpp example, the intention is that each application will call this directly? The changes I put in place currently forward the new toolcall::handler type through common_chat_format_single to invoke the tool-calls, but it seems like a cleaner separation between applying the template and invoking the tool-call is desired.
Here's the high-level order:

  1. Apply chat template
  2. Update grammar/vocab/sampler
  3. Invoke toolcall via toolcall::handler (if supplied)
  4. Tokenize the tool-call response (if supplied)

I would like to unify the means of invoking a tool-call (from the client-side) so the logic may be shared. Would you be okay with updating common_chat_format_single to return common_chat_params instead of a string? This would ensure that step (2) above will be much cleaner to implement.

@ggerganov Please see the above.

@ngxson (Collaborator) commented Feb 15, 2025

common_chat_format_single is only used by llama-cli and llama-run (kind of a one-time usage), so it's fine to update it (just make sure that it runs correctly with non-tool templates too).

In the far future, it would be better to get rid of this function and track the KV cache at the token level instead.

@bandoti (Collaborator, Author) commented Feb 15, 2025

@ngxson Okay, sounds good. I will add an out-param instead of modifying the return type, because it's not returning the full prompt but a delta of it. The main thing is removing the toolcall::handler as a parameter in these functions!
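Something along these lines is what I mean (existing parameters elided; this only illustrates the new out-param, so the declaration below is a sketch rather than the final signature):

#include <string>

struct common_chat_params; // from the common library (chat.h / chat.hpp)

// Proposed shape (sketch): keep returning the formatted delta, and additionally
// fill in the chat params so the caller can forward grammar/triggers to sampling.
std::string common_chat_format_single(/* existing parameters, */
                                      common_chat_params & out_chat_params);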

commit 98c4a8d
Author: Mason M <[email protected]>
Date:   Wed Feb 19 11:05:18 2025 -0400

    Refactor MCP transport callback mechanism

commit 2b3d1f6
Author: Mason M <[email protected]>
Date:   Tue Feb 18 18:01:54 2025 -0400

    add message_to_json function

commit 3c7ae27
Author: Mason M <[email protected]>
Date:   Tue Feb 18 15:32:26 2025 -0400

    Implement send routine

commit 9cec1e0
Author: Mason M <[email protected]>
Date:   Tue Feb 18 10:12:17 2025 -0400

    Fix include paths

commit b5642f0
Author: Mason M <[email protected]>
Date:   Tue Feb 18 09:56:52 2025 -0400

    Use log API

commit 7a83b2b
Author: Mason M <[email protected]>
Date:   Mon Feb 17 19:32:48 2025 -0400

    Fix build errors

commit cc7fd66
Author: Mason M <[email protected]>
Date:   Mon Feb 17 19:03:43 2025 -0400

    Use condition variable to wait for endpoint event

commit 73ccdd1
Author: Mason M <[email protected]>
Date:   Mon Feb 17 17:18:09 2025 -0400

    Process SSE data asynchronously

commit e9c37a3
Author: Mason M <[email protected]>
Date:   Mon Feb 17 14:01:56 2025 -0400

    Add keep-alive header to sse handler

commit 57f84e6
Author: Mason M <[email protected]>
Date:   Mon Feb 17 13:37:59 2025 -0400

    Add methods for handling endpoint/message events

commit 5c160f6
Author: Mason M <[email protected]>
Date:   Mon Feb 17 13:25:18 2025 -0400

    Process sse values

commit f51b493
Author: Mason M <[email protected]>
Date:   Mon Feb 17 13:07:38 2025 -0400

    Clean up sse_read algorithm

commit 29d6875
Author: Mason M <[email protected]>
Date:   Sun Feb 16 19:04:39 2025 -0400

    WIP: implementing SSE protocol
@bandoti (Collaborator, Author) commented Feb 19, 2025

@ochafik I was able to successfully merge in your changes. Thanks for getting those through; it's very helpful!

The SSE transport is finished (in its initial incarnation) but hasn't been tested yet. I have all the parts in place, and now the main task is to add some routines to convert the MCP tool-call messages into OAI JSON format, which goes through the new common_chat_tools_parse_oaicompat functions. There's a small amount of synchronization that needs to be added to toolcall::mcp_impl to make the asynchronous mcp_transport routines block.
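To illustrate the direction of that conversion (a sketch only, not the code in this branch): it assumes an MCP tools/list result shaped per the spec, and produces an OAI-style tools array like the one passed to --tools above, which could then be fed through the common_chat_tools_parse_oaicompat path mentioned here.

#include <nlohmann/json.hpp>

// Sketch: convert the "tools" array from an MCP tools/list result into an
// OpenAI-style tools array. Field names follow the MCP spec (name,
// description, inputSchema); error handling is omitted.
static nlohmann::json mcp_tools_to_oai(const nlohmann::json & mcp_tools) {
    nlohmann::json oai_tools = nlohmann::json::array();
    for (const auto & tool : mcp_tools) {
        oai_tools.push_back({
            {"type", "function"},
            {"function", {
                {"name",        tool.at("name")},
                {"description", tool.value("description", "")},
                {"parameters",  tool.value("inputSchema", nlohmann::json::object())}
            }}
        });
    }
    return oai_tools;
}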

I should have these finished before the end of the week, and will do some initial testing to fix bugs in the MCP connection. After that, it should be ready for general testing. I don't plan on changing the general architecture much (pending review feedback, of course).

github-actions bot added the build (Compilation issues) label Feb 20, 2025