
initial REST API for ramalama #726

Open · wants to merge 1 commit into main
Conversation

@dougsland (Collaborator) commented Feb 4, 2025

Just initial code; we can always improve it down the road.

Resolves: #725

Summary by Sourcery

New Features:

  • Provide endpoints for managing and interacting with ramalama, including listing available models, pulling models, running models, and stopping models.

Just initial code; we can always improve it down the road.

Resolves: containers#725
Signed-off-by: Douglas Schilling Landgraf <[email protected]>
@sourcery-ai bot (Contributor) commented Feb 4, 2025

Reviewer's Guide by Sourcery

This pull request introduces an initial REST API for interacting with ramalama. It uses FastAPI to define endpoints for retrieving information, listing models, pulling models, running models, and stopping models. The API interacts with the ramalama CLI using subprocess.
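The subprocess wiring described above might look roughly like the following sketch. The helper names (`ramalama_cmd`, `run_ramalama`) are illustrative assumptions, not the PR's actual code:

```python
import subprocess

def ramalama_cmd(*args):
    # Build the CLI invocation as an argument list (not a shell string),
    # so arguments are passed verbatim to the ramalama binary.
    return ["ramalama", *args]

def run_ramalama(*args):
    # Run the CLI and return (ok, output); an endpoint handler would
    # turn this tuple into a JSON response or an HTTP error.
    result = subprocess.run(ramalama_cmd(*args), capture_output=True, text=True)
    if result.returncode == 0:
        return True, result.stdout
    return False, result.stderr
```

A FastAPI handler such as `pull_model` would then call `run_ramalama("pull", model_name)` and map the failure case to an HTTP 500.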

Sequence diagram for model management operations

sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant CLI as RamaLama CLI

    Client->>API: GET /models
    API->>CLI: ramalama list
    CLI-->>API: model list
    API-->>Client: JSON response

    Client->>API: POST /pull/{model}
    API->>CLI: ramalama pull {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

    Client->>API: POST /run/{model}
    API->>CLI: ramalama run {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

    Client->>API: POST /stop/{model}
    API->>CLI: ramalama stop {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

Class diagram for FastAPI endpoints

classDiagram
    class FastAPIApp {
        +get_root()
        +get_info()
        +get_ps()
        +list_models()
        +pull_model(model_name: str)
        +run_model(model_name: str)
        +stop_model(model_name: str)
    }

    note for FastAPIApp "All methods return JSON responses"
    note for FastAPIApp "Uses subprocess to interact with CLI"

File-Level Changes

Define REST API endpoints using FastAPI (restapi/api.py):
  • Define /info endpoint to execute ramalama info.
  • Define /ps endpoint to execute ramalama ps.
  • Define /models endpoint to execute ramalama list.
  • Define /pull/{model_name} endpoint to execute ramalama pull {model_name}.
  • Define /run/{model_name} endpoint to execute ramalama run {model_name}.
  • Define /stop/{model_name} endpoint to execute ramalama stop {model_name}.
  • Add error handling for subprocess execution failures.
Add documentation for running the server and client-side interactions (restapi/README.md):
  • Add instructions for running the FastAPI server using uvicorn.
  • Add example curl commands for retrieving model information.
  • Add example curl commands for posting data to run a model.
Add dependencies (restapi/requirements.txt):
  • Add fastapi dependency.
  • Add uvicorn dependency.


@sourcery-ai bot (Contributor) left a comment

Hey @dougsland - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • The model_name parameter is passed directly to subprocess.run() without validation, creating a potential command injection vulnerability. (link)

Overall Comments:

  • SECURITY: The API is executing shell commands directly from HTTP requests without proper input validation or sanitization. This creates a significant security vulnerability. Consider using a safer interface to the ramalama functionality or implementing strict input validation.
  • The API lacks any authentication mechanism. Given that this service can control AI model execution, authentication and authorization should be implemented before this goes into production.
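One way to address the injection concern, sketched here as an assumption rather than the PR's actual fix, is to allow-list the characters a model name may contain before it ever reaches `subprocess`:

```python
import re

# Conservative allow-list for model names; the exact character set is an
# assumption and would need to match ramalama's real naming rules.
MODEL_NAME_RE = re.compile(r"^[A-Za-z0-9._:/-]+$")

def validate_model_name(name: str) -> bool:
    # Reject anything containing shell metacharacters, whitespace, etc.
    return bool(MODEL_NAME_RE.fullmatch(name))
```

An endpoint would call this before invoking the CLI and return HTTP 400 when it fails. Passing arguments as a list to `subprocess.run` (never `shell=True`) is the other half of the defense.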
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🔴 Security: 1 blocking issue, 2 other issues
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


Resolved review threads: restapi/api.py (6), restapi/README.md (1)
@ericcurtin (Collaborator) commented Feb 4, 2025

We can merge this tool because it's self-contained. Note, though, that we have a differently designed tool planned that does similar things, called ramalama-server. It won't have any Python dependencies outside of stdlib, and it will be invoked by "ramalama serve". We have to ensure the dependencies of the tool in this PR don't leak into the RamaLama packages outside of the container. If people install this separately, that's fine.

@ericcurtin (Collaborator)

It's described here:

#598

@rhatdan (Member) commented Feb 4, 2025

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

@dougsland (Collaborator, Author)

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

All good, I can write it in Go if you guys prefer.

@dougsland (Collaborator, Author)

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

All good, I can write it in Go if you guys prefer.

My intention is to write something aligned with all the projects. I just wrote it as REST because it's easy to run under macOS or Windows: it's a simple curl/HTTP call.

@ericcurtin (Collaborator) commented Feb 4, 2025

I think golang makes as much sense as python; both work for me. The main thing is that we don't need many dependencies. It would be nice to have none; that's ideal for some platforms like native macOS, Ubuntu, etc.

We've had good joy up to now writing HTTP stuff without any dependencies in python3; @swarajpande5 wrote a fair bit.

And we've had good feedback in the community about that (not having a large Python dependency stack outside the containers).

golang means we have to worry more about building, but I'm not against it if we get some advantages from golang.

@engelmi (Member) commented Feb 4, 2025

For Go, the tool would need to be built for all platforms to guarantee a smooth experience, just like with ramalama. On the other hand, Python is slower, although I think this is negligible in our case.
But I agree, it probably doesn't matter whether this tool is implemented in Go or Python.

@dougsland Have you thought about using OpenAPI? That way users can generate a client from the specification without any hassle. We can either generate it from code (see openapi generator) or write the spec directly and use it ourselves to generate our server stubs.
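For illustration, a hand-written OpenAPI fragment for the pull endpoint might look like the following; the paths and response descriptions are assumptions based on the routes in this PR, not an agreed spec:

```yaml
openapi: 3.0.3
info:
  title: ramalama REST API
  version: 0.0.1
paths:
  /pull/{model_name}:
    post:
      summary: Pull an AI model into local storage
      parameters:
        - name: model_name
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Model pulled successfully
        "500":
          description: ramalama CLI returned a non-zero exit code
```

FastAPI also auto-generates a spec of this shape at `/openapi.json`, which could serve as a starting point for a hand-maintained one.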

@dougsland (Collaborator, Author)

For Go, the tool would need to be built for all platforms to guarantee a smooth experience, just like with ramalama. On the other hand, Python is slower, although I think this is negligible in our case. But I agree, it probably doesn't matter whether this tool is implemented in Go or Python.

@dougsland Have you thought about using OpenAPI? That way users can generate a client from the specification without any hassle. We can either generate it from code (see openapi generator) or write the spec directly and use it ourselves to generate our server stubs.

Could be a good idea, @engelmi.

@rhatdan (Member) commented Feb 4, 2025

With this API does the caller get back the stdout and stderr?

@dougsland (Collaborator, Author) commented Feb 4, 2025

With this API does the caller get back the stdout and stderr?

Correct.

Example:

Linux HTTP server:

$ uvicorn api:app --host 0.0.0.0 --port 8003 --reload
INFO:     Will watch for changes in these directories: ['/home/douglas/ramalarestapi/ramalama/ramalama/restapi']
INFO:     Uvicorn running on http://0.0.0.0:8003 (Press CTRL+C to quit)
INFO:     Started reloader process [117015] using StatReload
INFO:     Started server process [117017]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

MacOS + curl:

$ curl -X POST "http://192.168.82.25:8003/pull/tinyllama" 
100% |████████████████████████████| Complete
Success: Command completed successfully

Error:

$ curl -X POST "http://192.168.82.25:8003/pull/tinyllamafoobar" 
100% |████████████████████████████| Complete
Error: Command failed with return code 1

info:

$ curl -X GET "http://192.168.82.25:8003/info" 

{\n    \"Engine\": {\n        \"Info\": {\n            \"host\": {\n                \"arch\": \"amd64\",\n                \"buildahVersion\": \"1.38.1\",\n                \"cgroupControllers\": [\n                    \"cpu\",\n                    \"io\",\n                    \"memory\",\n                    \"pids\"\n                ],\n                \"cgroupManager\": \"systemd\",\n                \"cgroupVersion\": \"v2\",\n                \"conmon\": {\n                    \"package\": \"conmon-2.1.12-3.fc41.x86_64\",\n                    \"path\": \"/usr/bin/conmon\",\n                    \"version\": \"conmon version 2.1.12, commit: \"\n                },\n                \"cpuUtilization\": {\n                    \"idlePercent\": 99.84,\n                    \"systemPercent\": 0.08,\n                    \"userPercent\": 0.09\n                },\n                \"cpus\": 16,\n                \"databaseBackend\": \"sqlite\",\n                \"distribution\": {\n                    \"distribution\": \"fedora\",\n                    \"variant\": \"workstation\",\n                    \"version\": \"41\"\n                },\n                \"eventLogger\": \"journald\",\n                \"freeLocks\": 2032,\n                \"hostname\": \"fedora\",\n                \"idMappings\": {\n                    \"gidmap\": [\n                        {\n                            \"container_id\": 0,\n                            \"host_id\": 1000,\n                            \"size\": 1\n                        },\n                        {\n                            \"container_id\": 1,\n                            \"host_id\": 524288,\n                            \"size\": 65536\n                        }\n                    ],\n                    \"uidmap\": [\n                        {\n                            \"container_id\": 0,\n                            \"host_id\": 1000,\n                            \"size\": 1\n                        },\n         
               {\n                            \"container_id\": 1,\n                            \"host_id\": 524288,\n                            \"size\": 65536\n                        }\n                    ]\n                },\n                \"kernel\": \"6.12.10-200.fc41.x86_64\",\n                \"linkmode\": \"dynamic\",\n                \"logDriver\": \"journald\",\n                \"memFree\": 5116088320,\n                \"memTotal\": 67107631104,\n                \"networkBackend\": \"netavark\",\n                \"networkBackendInfo\": {\n                    \"backend\": \"netavark\",\n                    \"dns\": {\n                        \"package\": \"aardvark-dns-1.13.1-1.fc41.x86_64\",\n                        \"path\": \"/usr/libexec/podman/aardvark-dns\",\n                        \"version\": \"aardvark-dns 1.13.1\"\n                    },\n                    \"package\": \"netavark-1.13.1-1.fc41.x86_64\",\n                    \"path\": \"/usr/libexec/podman/netavark\",\n                    \"version\": \"netavark 1.13.1\"\n                },\n                \"ociRuntime\": {\n                    \"name\": \"crun\",\n                    \"package\": \"crun-1.19.1-1.fc41.x86_64\",\n                    \"path\": \"/usr/bin/crun\",\n                    \"version\": \"crun version 1.19.1\\ncommit: 3e32a70c93f5aa5fea69b50256cca7fd4aa23c80\\nrundir: /run/user/1000/crun\\nspec: 1.0.0\\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL\"\n                },\n                \"os\": \"linux\",\n                \"pasta\": {\n                    \"executable\": \"/usr/bin/pasta\",\n                    \"package\": \"passt-0^20250121.g4f2c8e7-2.fc41.x86_64\",\n                    \"version\": \"pasta 0^20250121.g4f2c8e7-2.fc41.x86_64\\nCopyright Red Hat\\nGNU General Public License, version 2 or later\\n  <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>\\nThis is free software: you are free to change and 
redistribute it.\\nThere is NO WARRANTY, to the extent permitted by law.\\n\"\n                },\n                \"remoteSocket\": {\n                    \"exists\": true,\n                    \"path\": \"/run/user/1000/podman/podman.sock\"\n                },\n                \"rootlessNetworkCmd\": \"pasta\",\n                \"security\": {\n                    \"apparmorEnabled\": false,\n                    \"capabilities\": \"CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT\",\n                    \"rootless\": true,\n                    \"seccompEnabled\": true,\n                    \"seccompProfilePath\": \"/usr/share/containers/seccomp.json\",\n                    \"selinuxEnabled\": true\n                },\n                \"serviceIsRemote\": false,\n                \"slirp4netns\": {\n                    \"executable\": \"\",\n                    \"package\": \"\",\n                    \"version\": \"\"\n                },\n                \"swapFree\": 9323466752,\n                \"swapTotal\": 10200539136,\n                \"uptime\": \"50h 9m 3.00s (Approximately 2.08 days)\",\n                \"variant\": \"\"\n            },\n            \"plugins\": {\n                \"authorization\": null,\n                \"log\": [\n                    \"k8s-file\",\n                    \"none\",\n                    \"passthrough\",\n                    \"journald\"\n                ],\n                \"network\": [\n                    \"bridge\",\n                    \"macvlan\",\n                    \"ipvlan\"\n                ],\n                \"volume\": [\n                    \"local\"\n                ]\n            },\n            \"registries\": {\n                \"search\": [\n                    \"registry.fedoraproject.org\",\n                    \"registry.access.redhat.com\",\n                    \"docker.io\"\n                ]\n            },\n  
          \"store\": {\n                \"configFile\": \"/home/douglas/.config/containers/storage.conf\",\n                \"containerStore\": {\n                    \"number\": 16,\n                    \"paused\": 0,\n                    \"running\": 6,\n                    \"stopped\": 10\n                },\n                \"graphDriverName\": \"overlay\",\n                \"graphOptions\": {},\n                \"graphRoot\": \"/home/douglas/.local/share/containers/storage\",\n                \"graphRootAllocated\": 1997176569856,\n                \"graphRootUsed\": 607885430784,\n                \"graphStatus\": {\n                    \"Backing Filesystem\": \"btrfs\",\n                    \"Native Overlay Diff\": \"true\",\n                    \"Supports d_type\": \"true\",\n                    \"Supports shifting\": \"false\",\n                    \"Supports volatile\": \"true\",\n                    \"Using metacopy\": \"false\"\n                },\n                \"imageCopyTmpDir\": \"/var/tmp\",\n                \"imageStore\": {\n                    \"number\": 5\n                },\n                \"runRoot\": \"/run/user/1000/containers\",\n                \"transientStore\": false,\n                \"volumePath\": \"/home/douglas/.local/share/containers/storage/volumes\"\n            },\n            \"version\": {\n                \"APIVersion\": \"5.3.2\",\n                \"Built\": 1737504000,\n                \"BuiltTime\": \"Tue Jan 21 19:00:00 2025\",\n                \"GitCommit\": \"\",\n                \"GoVersion\": \"go1.23.4\",\n                \"Os\": \"linux\",\n                \"OsArch\": \"linux/amd64\",\n                \"Version\": \"5.3.2\"\n            }\n        },\n        \"Name\": \"podman\"\n    },\n    \"Image\": \"quay.io/ramalama/ramalama\",\n    \"Runtime\": \"llama.cpp\",\n    \"Store\": \"/home/douglas/.local/share/ramalama\",\n    \"UseContainer\": true,\n    \"Version\": \"0.5.4\"\n}"
}

I would like to make more adjustments, but it pretty much mirrors the output from server to client on any OS over HTTP. However, if we want a different approach, it's worth stopping and starting with OpenAPI, Go, or pure stdlib Python (which can be done quickly), but feedback from the Podman Desktop team is preferable so we are all in sync. Finally, we can also implement authentication over HTTP.
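The authentication mentioned here is not designed yet; as one hypothetical shape, a shared-secret Bearer-token check could look like this (function and parameter names are assumptions):

```python
import hmac

def is_authorized(auth_header: str, token: str) -> bool:
    # Constant-time comparison of the Authorization header against the
    # expected "Bearer <token>" value; an empty configured token always fails.
    return bool(token) and hmac.compare_digest(auth_header, f"Bearer {token}")
```

A handler would read the `Authorization` header from the request and return HTTP 401 when this check fails. `hmac.compare_digest` avoids leaking the token length-prefix through timing differences.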

@ericcurtin (Collaborator)

The main thing with golang is that we'd have to start doing things like macOS builds properly, but we already do that for podman, so 🤷‍♂️

@ericcurtin (Collaborator)

Or else we say it's just not a macOS-native feature and have the golang binary launched inside the container.

@rhatdan (Member) commented Feb 4, 2025

If we can just add a ramalama service and have it listen on a unix domain socket, with "standard" python, then I think that would be the best outcome.

I don't see the need for the run function, but most of the other functions, yes.

    containers (ps)     list all RamaLama containers
    convert             convert AI Model from local storage to OCI Image
    info                Display information pertaining to setup of RamaLama.
    list (ls)           list all downloaded AI Models
    login               login to remote registry
    logout              logout from remote registry
    pull                pull AI Model from Model registry to local storage
    push                push AI Model from local storage to remote registry
    rm                  remove AI Model from local storage
    serve               serve REST API on specified AI Model
    stop                stop named container that is running AI Model
    version             display version of AI Model
NO
    run                 run specified AI Model as a chatbot
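A stdlib-only service listening on a unix domain socket, as suggested above, could be sketched like this; the socket path and the placeholder handler are assumptions, not an agreed design:

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class UnixHTTPServer(HTTPServer):
    # HTTPServer with the address family switched to AF_UNIX, so it binds
    # to a filesystem socket path instead of a TCP port.
    address_family = socket.AF_UNIX

    def server_bind(self):
        # Remove a stale socket file if present, then bind to the path.
        try:
            os.unlink(self.server_address)
        except FileNotFoundError:
            pass
        self.socket.bind(self.server_address)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder: a real implementation would call into ramalama here.
        if self.path == "/models":
            body = b'{"models": []}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Example usage (requires a writable socket directory):
#   UnixHTTPServer("/run/ramalama/api.sock", Handler).serve_forever()
# and from a client:
#   curl --unix-socket /run/ramalama/api.sock http://localhost/models
```

This needs nothing outside the standard library and sidesteps the TCP exposure question entirely, at the cost of being Linux/macOS-only as written.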

@rhatdan (Member) commented Feb 4, 2025

@benoitf @cdrage @slemeur thoughts?

@benoitf commented Feb 4, 2025

I would +1 any solution not using python (or any other runtime dependency). Having a self-contained binary/native executable will help with reusing/embedding it.

On the API: for 'pull', for example, what would the data sent for progress look like, etc.?

Also, would it be the same kind of pulling as the Hugging Face libraries?

@ericcurtin (Collaborator) commented Feb 4, 2025

Open WebUI would be a great test tool for this, btw (Open WebUI was also a requested feature at FOSDEM).

It's one of the most requested features and exercises a lot of the APIs missing from llama.cpp and vllm (some of them are probably Ollama-specific, but if they're easy to implement, that's no biggie).

@rhatdan (Member) commented Feb 5, 2025

The full solution will NOT be Python-free. The REST API tool will be executing RamaLama commands, or at least calling RamaLama functions. RamaLama is not going to be ported to a different language at this time.

Development

Successfully merging this pull request may close these issues.

RFE: REST API for ramalama
5 participants