tool-call: add support to llama-cli using new --tools arg #11556
base: master
Conversation
@ochafik I am working on adding tool calls to llama-cli, and at this point I have it wired in. However, I need some advice on how to handle the remaining fields. Thank you for your work on the core of this feature. I am excited to get it working in llama-cli! 😊
Hey @bandoti, sorry for the delay. Some quick background questions first:
Have you considered going one step further and having the CLI call the tools? @brucepro is looking into doing tool calls w/ MCP servers from the server's Web UI (ref); maybe you could join forces / do the same in C++ w/ CURL.
@ochafik I got this working in llama-cli now. Here's the command I ran, followed by the output:
@ochafik Good timing, we responded at the exact same time haha. No worries on the delay. Here are some general objectives:

In both of these cases, the output can be processed and simply scanned for a valid JSON result. If it's valid, honor the function calls; otherwise, just print to the console.
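To make the scan-for-JSON idea concrete, here is a minimal C++ sketch using nlohmann::json (already a llama.cpp dependency). The function name and the expected "name"/"arguments" shape are illustrative assumptions, not code from this PR:

```cpp
#include <nlohmann/json.hpp>
#include <cstdio>
#include <string>

using json = nlohmann::json;

// Returns true if `output` parsed as a tool call and was dispatched;
// otherwise the caller should just print `output` to the console.
static bool try_handle_tool_call(const std::string & output) {
    json parsed = json::parse(output, /* cb = */ nullptr, /* allow_exceptions = */ false);
    if (parsed.is_discarded() || !parsed.contains("name")) {
        return false; // not valid JSON, or not shaped like a tool call
    }
    const std::string name = parsed.at("name").get<std::string>();
    const json        args = parsed.value("arguments", json::object());
    // hypothetical dispatch point; see the handler discussion later in this thread
    printf("tool call: %s(%s)\n", name.c_str(), args.dump().c_str());
    return true;
}
```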
I will track the MCP protocol work, it sounds interesting! I still think there's a lot of need for local-only tools, however, and I want to ensure these features are workable/testable without standing up endpoints and such. 😊 When you mention adding this capability with cURL, how do you mean? Setting up llama-cli as an MCP client? EDIT: After reading more on MCP I see the potential flow, where the AI runs and communicates with the resource services. I'd imagine building that on top of the changes here would work well. A series of services can simply be passed into llama-cli, and it could dispatch to them when it needs something (at least that's how I'm understanding it).
For MCP, I am adding the SSE client support into the webui. This link was the best example I found: https://github.com/apify/tester-mcp-client/blob/main/src/mcpClient.ts

Still in progress. Once I hit debug mode, I will update my repo and start testing.
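SSE itself is a small line-based protocol, so the client side is mostly line parsing. A minimal C++ sketch (field handling per the SSE spec; the struct and function names are made up):

```cpp
#include <functional>
#include <string>

struct sse_event {
    std::string event; // e.g. "endpoint" or "message" in MCP's SSE transport
    std::string data;
};

// Feed one line at a time (e.g. from a CURL write callback);
// a blank line terminates the current event.
static void sse_feed_line(const std::string & line, sse_event & cur,
                          const std::function<void(const sse_event &)> & on_event) {
    if (line.empty()) {
        if (!cur.data.empty()) on_event(cur);
        cur = {};
    } else if (line.rfind("event:", 0) == 0) {
        cur.event = line.substr(line.size() > 6 && line[6] == ' ' ? 7 : 6);
    } else if (line.rfind("data:", 0) == 0) {
        cur.data += line.substr(line.size() > 5 && line[5] == ' ' ? 6 : 5);
    } // lines starting with ':' are comments and are ignored in this sketch
}
```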
@brucepro Thanks for the info on this. It seems to me that, in general, a protocol like this is the way to go for the local AI in llama-cli to invoke actions as well. I'll take a closer look and see what it'll take to add it.
@ochafik As I understand it, the requirement to get this working is that I need to add a "translation" layer between the model's OpenAI function-call request/response and MCP, correct? This shouldn't be too difficult with cURL and the json library. I really like the discovery aspect of the MCP protocol; it will make managing a collection of functionality much easier. So I will start working on it, as I think this is an important part of the function call API. We can revisit the other aspects of MCP, like prompts and the like; those are very powerful as well, albeit a fair amount of work, so it will have to be done gradually.
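As a rough illustration of that translation layer, here is one way an OpenAI-style tool call could be mapped onto an MCP `tools/call` JSON-RPC request with nlohmann::json (the helper name is hypothetical; the `method`/`params` shape follows the MCP spec):

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// OpenAI side: {"function": {"name": "...", "arguments": "<JSON string>"}}
// MCP side:    JSON-RPC request with method "tools/call"
static json oai_call_to_mcp_request(const json & oai_tool_call, int id) {
    const json & fn = oai_tool_call.at("function");
    return json {
        {"jsonrpc", "2.0"},
        {"id",      id},
        {"method",  "tools/call"},
        {"params",  {
            {"name",      fn.at("name")},
            // OpenAI encodes arguments as a JSON string; MCP wants an object
            {"arguments", json::parse(fn.at("arguments").get<std::string>())},
        }},
    };
}
```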
Did you make any progress on the CLI MCP? I have a super basic React app made that seems to work with llama.cpp here: https://github.com/brucepro/llamacppMCPClientDemo. I tested with Llama 3.3 70B but not much else. I will be adding prompts and resources next, and debugging. Once it is cleaned up, I will work on migrating it to the WebUI.
@brucepro I'm currently working on adding the types for the MCP protocol and the initialization handshake. I have all the types defined; I'm just going to add unit tests on them today. I'm working in a different branch, but I'll merge that piece in, hopefully today. I added a checklist in the PR description above to track these changes. 😊
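For reference, the MCP initialization handshake starts with an `initialize` JSON-RPC request; a sketch of building one with nlohmann::json (the helper name and the client name/version are illustrative):

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static json mcp_initialize_request(int id) {
    return json {
        {"jsonrpc", "2.0"},
        {"id",      id},
        {"method",  "initialize"},
        {"params",  {
            {"protocolVersion", "2024-11-05"},
            {"capabilities",    json::object()},
            {"clientInfo",      {{"name", "llama-cli"}, {"version", "0.0.1"}}},
        }},
    };
}
// After the server's response, the client sends a "notifications/initialized"
// notification, and can then issue requests such as "tools/list".
```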
@brucepro Quick update: I have most of the pieces in place, just working on the SSE transport. I am hoping to finish it (well, make it marginally workable) this weekend. I am leaving the stdio transport unfinished for this PR, but it can be followed up on later, as the HTTP endpoint has a bit more utility. The SSE changes will require setting up a background thread listening to the SSE endpoint while allowing tool-calls to be sent to a separate endpoint (arbitrarily set by the SSE endpoint event). There are some concurrency-related hiccups which may cause issues, given that the MCP server can push an update to the tools list at any time. But, other than that, I don't foresee many other problems. Looking forward to being able to test this thing! Stay tuned...
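One common way to handle the "tools list can change at any time" hiccup is to have the SSE thread swap in a fresh list under a mutex while readers take snapshots. A sketch (all names hypothetical, not from this branch):

```cpp
#include <mutex>
#include <string>
#include <vector>

struct mcp_tool {
    std::string name;
    std::string description;
    // input schema, etc.
};

class mcp_tool_registry {
    mutable std::mutex    mtx;
    std::vector<mcp_tool> tools;
public:
    // called from the SSE listener thread, e.g. on a tools/list_changed push
    void update(std::vector<mcp_tool> fresh) {
        std::lock_guard<std::mutex> lock(mtx);
        tools = std::move(fresh);
    }
    // called from the main generation loop; returns a consistent snapshot
    std::vector<mcp_tool> snapshot() const {
        std::lock_guard<std::mutex> lock(mtx);
        return tools;
    }
};
```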
Awesome. Looking forward to testing it out.
@ochafik I took a quick look at your cleanup branch and see the switch there. I would like to unify the means of invoking a tool-call (from the client side) so the logic may be shared. Would you be okay with updating it accordingly?

@ggerganov Please see the above.
The `common_chat_format_single` is only used by llama-cli and llama-run (kinda one-time usage), so it's fine to update it (well, just make sure that it runs correctly with non-tool templates too). In the far future, it would be better to get rid of this function and track the KV cache at the token level instead.
@ngxson Okay, sounds good. I will add an out-param instead of modifying the return type, because it's not returning the full prompt but a delta on it. The main thing is removing the …
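To illustrate the out-param idea, a purely hypothetical variant (the real `common_chat_format_single` lives in llama.cpp's common library and its signature may differ):

```cpp
#include <string>

struct common_chat_msg; // as defined in llama.cpp's common library

// Hypothetical variant: success/failure via the return value, and the
// formatted *delta* on the prompt (not the full prompt) via an out-param.
bool common_chat_format_single_ex(
        /* ... template and chat-history parameters elided ... */
        const common_chat_msg & new_msg,
        bool                    add_ass,
        std::string           & out_delta) {
    out_delta.clear();
    // ... format the history with and without new_msg and assign
    //     the difference to out_delta ...
    return true;
}
```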
Commits (all by Mason M <[email protected]>):
- 98c4a8d (Wed Feb 19 11:05:18 2025 -0400) Refactor MCP transport callback mechanism
- 2b3d1f6 (Tue Feb 18 18:01:54 2025 -0400) add message_to_json function
- 3c7ae27 (Tue Feb 18 15:32:26 2025 -0400) Implement send routine
- 9cec1e0 (Tue Feb 18 10:12:17 2025 -0400) Fix include paths
- b5642f0 (Tue Feb 18 09:56:52 2025 -0400) Use log API
- 7a83b2b (Mon Feb 17 19:32:48 2025 -0400) Fix build errors
- cc7fd66 (Mon Feb 17 19:03:43 2025 -0400) Use condition variable to wait for endpoint event
- 73ccdd1 (Mon Feb 17 17:18:09 2025 -0400) Process SSE data asynchronously
- e9c37a3 (Mon Feb 17 14:01:56 2025 -0400) Add keep-alive header to sse handler
- 57f84e6 (Mon Feb 17 13:37:59 2025 -0400) Add methods for handling endpoint/message events
- 5c160f6 (Mon Feb 17 13:25:18 2025 -0400) Process sse values
- f51b493 (Mon Feb 17 13:07:38 2025 -0400) Clean up sse_read algorithm
- 29d6875 (Sun Feb 16 19:04:39 2025 -0400) WIP: implementing SSE protocol
@ochafik I was able to successfully merge in your changes. Thanks for getting those through, it's very helpful! The SSE transport is finished (in its initial incarnation) but hasn't been tested yet. I have all the parts in place, and now the main task is to add some routines to convert the MCP tool-call messages into OAI JSON format, which goes through the new … I should have these finished before the end of the week, and will do some initial testing to fix initial bugs in the MCP connection. After that it should be ready for general testing. I don't plan on changing the general architecture much (pending review feedback, of course).
This PR adds support for tool-calls using a new `--tools` switch to llama-cli. It is currently ⚠ Experimental! ⚠

This required slight modifications to `common_chat_apply_template` in order to support passing a new `common_params_tools` type, which encapsulates the `tools` JSON array and `tool_choice`.

This doesn't work yet; it needs the brains added... Trying to figure out how this all works in the server code, so if anyone has tips please feel free to chime in! 😅
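For illustration, an invocation might look something like this (the model path and the weather tool are placeholders, not from this PR; `--tools` takes a JSON array in the OpenAI function format per the task list below):

```sh
llama-cli -m model.gguf -cnv \
    --tools '[{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }]' \
    --tool-choice auto
```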
Tasks:

- Integrate toolcall support with llama-cli:
  - `--tools` option to pass in a JSON tools array
  - `--tool-choice` option which defaults to "auto" (see this ref)
  - `--tool-parallel` switch for parallel tool-calls
  - Move the tool-call handling from `oaicompat_completion_params_parse` (utils.hpp) into `common_chat_apply_template` (common.cpp)
  - Integrate tool-call handling into the `main.cpp` algorithm?
- Implement toolcall handlers for Model Context Protocol (MCP).
Once tools are passed to `common_chat_apply_template` via a call to `common_chat_params_init`, the resulting prompt member of `common_chat_params` will contain the JSON-formatted tool-calls. This should be translated and dispatched to the registered handlers (if one was specified).
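A rough sketch of what that dispatch step could look like, assuming OpenAI-style tool-call objects and a simple handler registry (all names here are invented for illustration):

```cpp
#include <nlohmann/json.hpp>
#include <functional>
#include <map>
#include <string>

using json = nlohmann::json;

// A handler takes the call's arguments and returns its result as JSON.
using tool_handler = std::function<json(const json & arguments)>;

static std::map<std::string, tool_handler> g_tool_handlers;

// Route each parsed tool-call to its registered handler, collecting results.
static json dispatch_tool_calls(const json & tool_calls) {
    json results = json::array();
    for (const auto & call : tool_calls) {
        const std::string name = call.at("function").at("name").get<std::string>();
        const auto it = g_tool_handlers.find(name);
        if (it == g_tool_handlers.end()) {
            continue; // unknown tool; a real implementation should report this
        }
        results.push_back(it->second(call.at("function").value("arguments", json::object())));
    }
    return results;
}
```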