
initial REST API for ramalama #726

Open · wants to merge 1 commit into main
Conversation

@dougsland (Collaborator) commented Feb 4, 2025

Just initial code; we can always improve it down the road.

Resolves: #725

Summary by Sourcery

New Features:

  • Provide endpoints for managing and interacting with ramalama, including listing available models, pulling models, running models, and stopping models.

Just initial code; we can always improve it down the road.

Resolves: containers#725
Signed-off-by: Douglas Schilling Landgraf <[email protected]>
@sourcery-ai bot (Contributor) commented Feb 4, 2025

Reviewer's Guide by Sourcery

This pull request introduces an initial REST API for interacting with ramalama. It uses FastAPI to define endpoints for retrieving information, listing models, pulling models, running models, and stopping models. The API interacts with the ramalama CLI using subprocess.
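The subprocess wiring described above might look roughly like the following sketch. The helper names (`ramalama_cmd`, `run_ramalama`) are illustrative assumptions, not the PR's actual code:

```python
import subprocess

def ramalama_cmd(*args):
    # Build the CLI invocation as an argument list (not a shell string),
    # so arguments are passed verbatim to the ramalama binary.
    return ["ramalama", *args]

def run_ramalama(*args):
    # Run the CLI and return (ok, output); an endpoint handler would
    # turn this tuple into a JSON response or an HTTP error.
    result = subprocess.run(ramalama_cmd(*args), capture_output=True, text=True)
    if result.returncode == 0:
        return True, result.stdout
    return False, result.stderr
```

A FastAPI handler such as `pull_model` would then call `run_ramalama("pull", model_name)` and map the failure case to an HTTP 500.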

Sequence diagram for model management operations

sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant CLI as RamaLama CLI

    Client->>API: GET /models
    API->>CLI: ramalama list
    CLI-->>API: model list
    API-->>Client: JSON response

    Client->>API: POST /pull/{model}
    API->>CLI: ramalama pull {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

    Client->>API: POST /run/{model}
    API->>CLI: ramalama run {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

    Client->>API: POST /stop/{model}
    API->>CLI: ramalama stop {model}
    CLI-->>API: success/error
    API-->>Client: JSON response

Class diagram for FastAPI endpoints

classDiagram
    class FastAPIApp {
        +get_root()
        +get_info()
        +get_ps()
        +list_models()
        +pull_model(model_name: str)
        +run_model(model_name: str)
        +stop_model(model_name: str)
    }

    note for FastAPIApp "All methods return JSON responses"
    note for FastAPIApp "Uses subprocess to interact with CLI"

File-Level Changes

Define REST API endpoints using FastAPI (restapi/api.py):
  • Define /info endpoint to execute ramalama info.
  • Define /ps endpoint to execute ramalama ps.
  • Define /models endpoint to execute ramalama list.
  • Define /pull/{model_name} endpoint to execute ramalama pull {model_name}.
  • Define /run/{model_name} endpoint to execute ramalama run {model_name}.
  • Define /stop/{model_name} endpoint to execute ramalama stop {model_name}.
  • Add error handling for subprocess execution failures.
Add documentation for running the server and client-side interactions (restapi/README.md):
  • Add instructions for running the FastAPI server using uvicorn.
  • Add example curl commands for retrieving model information.
  • Add example curl commands for posting data to run a model.
Add dependencies (restapi/requirements.txt):
  • Add fastapi dependency.
  • Add uvicorn dependency.


@sourcery-ai bot (Contributor) left a comment

Hey @dougsland - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • The model_name parameter is passed directly to subprocess.run() without validation, creating a potential command injection vulnerability. (link)

Overall Comments:

  • SECURITY: The API is executing shell commands directly from HTTP requests without proper input validation or sanitization. This creates a significant security vulnerability. Consider using a safer interface to the ramalama functionality or implementing strict input validation.
  • The API lacks any authentication mechanism. Given that this service can control AI model execution, authentication and authorization should be implemented before this goes into production.
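One way to address the injection concern, sketched here as an assumption rather than the PR's actual fix, is to allow-list the characters a model name may contain before it ever reaches `subprocess`:

```python
import re

# Conservative allow-list for model names; the exact character set is an
# assumption and would need to match ramalama's real naming rules.
MODEL_NAME_RE = re.compile(r"^[A-Za-z0-9._:/-]+$")

def validate_model_name(name: str) -> bool:
    # Reject anything containing shell metacharacters, whitespace, etc.
    return bool(MODEL_NAME_RE.fullmatch(name))
```

An endpoint would call this before invoking the CLI and return HTTP 400 when it fails. Passing arguments as a list to `subprocess.run` (never `shell=True`) is the other half of the defense.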
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🔴 Security: 1 blocking issue, 2 other issues
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


Resolved review threads: restapi/api.py (6), restapi/README.md (1)
@ericcurtin (Collaborator) commented Feb 4, 2025

We can merge this tool because it's self-contained. Note, though, that we have a differently designed tool planned that does similar things, called ramalama-server. It won't have any Python dependencies outside of stdlib, and it will be invoked by "ramalama serve". We have to ensure the dependencies of the tool in this PR don't leak into the RamaLama packages outside of the container. If people install this separately, that's fine.

@ericcurtin (Collaborator)

It's described here:

#598

@rhatdan (Member) commented Feb 4, 2025

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

@dougsland (Collaborator, Author)

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

All good, I can write it in Go if you guys prefer.

@dougsland (Collaborator, Author)

I'd prefer we write up this API first, before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux.

If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.

All good, I can write it in Go if you guys prefer.

My intention is to write something aligned with all the projects. I just wrote it as REST because it's easy to run under macOS or Windows: it's a simple curl/HTTP call.

@ericcurtin (Collaborator) commented Feb 4, 2025

I think golang makes as much sense as python; both work for me. The main thing is that we don't need many dependencies. It would be nice to have none; that's ideal for some platforms like native macOS, Ubuntu, etc.

We've had good joy up to now writing HTTP stuff without any dependencies in python3; @swarajpande5 wrote a fair bit.

And we've had good feedback in the community about that (not having a large Python dependency stack outside the containers).

golang means we have to worry more about building, but I'm not against it if we get some advantages from golang.

@engelmi (Member) commented Feb 4, 2025

For Go, the tool would need to be built for all platforms to guarantee a smooth experience, just like with ramalama. On the other hand, Python is slower, although I think this is negligible in our case.
But I agree, it probably doesn't matter whether this tool is implemented in Go or Python.

@dougsland Have you thought about using OpenAPI? That way users can generate a client from the specification without any hassle. We can either generate it from code (see openapi generator) or write the spec directly and use it ourselves to generate our server stubs.
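For illustration, a hand-written OpenAPI fragment for the pull endpoint might look like the following; the paths and response descriptions are assumptions based on the routes in this PR, not an agreed spec:

```yaml
openapi: 3.0.3
info:
  title: ramalama REST API
  version: 0.0.1
paths:
  /pull/{model_name}:
    post:
      summary: Pull an AI model into local storage
      parameters:
        - name: model_name
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Model pulled successfully
        "500":
          description: ramalama CLI returned a non-zero exit code
```

FastAPI also auto-generates a spec of this shape at `/openapi.json`, which could serve as a starting point for a hand-maintained one.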

@dougsland (Collaborator, Author)

For Go, the tool would need to be built for all platforms to guarantee a smooth experience, just like with ramalama. On the other hand, Python is slower, although I think this is negligible in our case. But I agree, it probably doesn't matter whether this tool is implemented in Go or Python.

@dougsland Have you thought about using OpenAPI? That way users can generate a client from the specification without any hassle. We can either generate it from code (see openapi generator) or write the spec directly and use it ourselves to generate our server stubs.

Could be a good idea, @engelmi.

@rhatdan (Member) commented Feb 4, 2025

With this API does the caller get back the stdout and stderr?

@dougsland (Collaborator, Author) commented Feb 4, 2025

With this API does the caller get back the stdout and stderr?

Correct.

Example:

Linux HTTP server:

$ uvicorn api:app --host 0.0.0.0 --port 8003 --reload
INFO:     Will watch for changes in these directories: ['/home/douglas/ramalarestapi/ramalama/ramalama/restapi']
INFO:     Uvicorn running on http://0.0.0.0:8003 (Press CTRL+C to quit)
INFO:     Started reloader process [117015] using StatReload
INFO:     Started server process [117017]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

MacOS + curl:

$ curl -X POST "http://192.168.82.25:8003/pull/tinyllama" 
100% |████████████████████████████| Complete
Success: Command completed successfully

Error:

$ curl -X POST "http://192.168.82.25:8003/pull/tinyllamafoobar" 
100% |████████████████████████████| Complete
Error: Command failed with return code 1

info:

$ curl -X GET "http://192.168.82.25:8003/info" 

{\n    \"Engine\": {\n        \"Info\": {\n            \"host\": {\n                \"arch\": \"amd64\",\n                \"buildahVersion\": \"1.38.1\",\n                \"cgroupControllers\": [\n                    \"cpu\",\n                    \"io\",\n                    \"memory\",\n                    \"pids\"\n                ],\n                \"cgroupManager\": \"systemd\",\n                \"cgroupVersion\": \"v2\",\n                \"conmon\": {\n                    \"package\": \"conmon-2.1.12-3.fc41.x86_64\",\n                    \"path\": \"/usr/bin/conmon\",\n                    \"version\": \"conmon version 2.1.12, commit: \"\n                },\n                \"cpuUtilization\": {\n                    \"idlePercent\": 99.84,\n                    \"systemPercent\": 0.08,\n                    \"userPercent\": 0.09\n                },\n                \"cpus\": 16,\n                \"databaseBackend\": \"sqlite\",\n                \"distribution\": {\n                    \"distribution\": \"fedora\",\n                    \"variant\": \"workstation\",\n                    \"version\": \"41\"\n                },\n                \"eventLogger\": \"journald\",\n                \"freeLocks\": 2032,\n                \"hostname\": \"fedora\",\n                \"idMappings\": {\n                    \"gidmap\": [\n                        {\n                            \"container_id\": 0,\n                            \"host_id\": 1000,\n                            \"size\": 1\n                        },\n                        {\n                            \"container_id\": 1,\n                            \"host_id\": 524288,\n                            \"size\": 65536\n                        }\n                    ],\n                    \"uidmap\": [\n                        {\n                            \"container_id\": 0,\n                            \"host_id\": 1000,\n                            \"size\": 1\n                        },\n         
               {\n                            \"container_id\": 1,\n                            \"host_id\": 524288,\n                            \"size\": 65536\n                        }\n                    ]\n                },\n                \"kernel\": \"6.12.10-200.fc41.x86_64\",\n                \"linkmode\": \"dynamic\",\n                \"logDriver\": \"journald\",\n                \"memFree\": 5116088320,\n                \"memTotal\": 67107631104,\n                \"networkBackend\": \"netavark\",\n                \"networkBackendInfo\": {\n                    \"backend\": \"netavark\",\n                    \"dns\": {\n                        \"package\": \"aardvark-dns-1.13.1-1.fc41.x86_64\",\n                        \"path\": \"/usr/libexec/podman/aardvark-dns\",\n                        \"version\": \"aardvark-dns 1.13.1\"\n                    },\n                    \"package\": \"netavark-1.13.1-1.fc41.x86_64\",\n                    \"path\": \"/usr/libexec/podman/netavark\",\n                    \"version\": \"netavark 1.13.1\"\n                },\n                \"ociRuntime\": {\n                    \"name\": \"crun\",\n                    \"package\": \"crun-1.19.1-1.fc41.x86_64\",\n                    \"path\": \"/usr/bin/crun\",\n                    \"version\": \"crun version 1.19.1\\ncommit: 3e32a70c93f5aa5fea69b50256cca7fd4aa23c80\\nrundir: /run/user/1000/crun\\nspec: 1.0.0\\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL\"\n                },\n                \"os\": \"linux\",\n                \"pasta\": {\n                    \"executable\": \"/usr/bin/pasta\",\n                    \"package\": \"passt-0^20250121.g4f2c8e7-2.fc41.x86_64\",\n                    \"version\": \"pasta 0^20250121.g4f2c8e7-2.fc41.x86_64\\nCopyright Red Hat\\nGNU General Public License, version 2 or later\\n  <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>\\nThis is free software: you are free to change and 
redistribute it.\\nThere is NO WARRANTY, to the extent permitted by law.\\n\"\n                },\n                \"remoteSocket\": {\n                    \"exists\": true,\n                    \"path\": \"/run/user/1000/podman/podman.sock\"\n                },\n                \"rootlessNetworkCmd\": \"pasta\",\n                \"security\": {\n                    \"apparmorEnabled\": false,\n                    \"capabilities\": \"CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT\",\n                    \"rootless\": true,\n                    \"seccompEnabled\": true,\n                    \"seccompProfilePath\": \"/usr/share/containers/seccomp.json\",\n                    \"selinuxEnabled\": true\n                },\n                \"serviceIsRemote\": false,\n                \"slirp4netns\": {\n                    \"executable\": \"\",\n                    \"package\": \"\",\n                    \"version\": \"\"\n                },\n                \"swapFree\": 9323466752,\n                \"swapTotal\": 10200539136,\n                \"uptime\": \"50h 9m 3.00s (Approximately 2.08 days)\",\n                \"variant\": \"\"\n            },\n            \"plugins\": {\n                \"authorization\": null,\n                \"log\": [\n                    \"k8s-file\",\n                    \"none\",\n                    \"passthrough\",\n                    \"journald\"\n                ],\n                \"network\": [\n                    \"bridge\",\n                    \"macvlan\",\n                    \"ipvlan\"\n                ],\n                \"volume\": [\n                    \"local\"\n                ]\n            },\n            \"registries\": {\n                \"search\": [\n                    \"registry.fedoraproject.org\",\n                    \"registry.access.redhat.com\",\n                    \"docker.io\"\n                ]\n            },\n  
          \"store\": {\n                \"configFile\": \"/home/douglas/.config/containers/storage.conf\",\n                \"containerStore\": {\n                    \"number\": 16,\n                    \"paused\": 0,\n                    \"running\": 6,\n                    \"stopped\": 10\n                },\n                \"graphDriverName\": \"overlay\",\n                \"graphOptions\": {},\n                \"graphRoot\": \"/home/douglas/.local/share/containers/storage\",\n                \"graphRootAllocated\": 1997176569856,\n                \"graphRootUsed\": 607885430784,\n                \"graphStatus\": {\n                    \"Backing Filesystem\": \"btrfs\",\n                    \"Native Overlay Diff\": \"true\",\n                    \"Supports d_type\": \"true\",\n                    \"Supports shifting\": \"false\",\n                    \"Supports volatile\": \"true\",\n                    \"Using metacopy\": \"false\"\n                },\n                \"imageCopyTmpDir\": \"/var/tmp\",\n                \"imageStore\": {\n                    \"number\": 5\n                },\n                \"runRoot\": \"/run/user/1000/containers\",\n                \"transientStore\": false,\n                \"volumePath\": \"/home/douglas/.local/share/containers/storage/volumes\"\n            },\n            \"version\": {\n                \"APIVersion\": \"5.3.2\",\n                \"Built\": 1737504000,\n                \"BuiltTime\": \"Tue Jan 21 19:00:00 2025\",\n                \"GitCommit\": \"\",\n                \"GoVersion\": \"go1.23.4\",\n                \"Os\": \"linux\",\n                \"OsArch\": \"linux/amd64\",\n                \"Version\": \"5.3.2\"\n            }\n        },\n        \"Name\": \"podman\"\n    },\n    \"Image\": \"quay.io/ramalama/ramalama\",\n    \"Runtime\": \"llama.cpp\",\n    \"Store\": \"/home/douglas/.local/share/ramalama\",\n    \"UseContainer\": true,\n    \"Version\": \"0.5.4\"\n}"
}

I would like to make more adjustments, but it pretty much mirrors the output from server to client on any OS over HTTP. However, if we want a different approach, it's worth stopping and starting with OpenAPI, Go, or pure stdlib Python (which can be done quickly), but feedback from the Podman Desktop team is preferable so we are all in sync. Finally, we can also implement authentication over HTTP.
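The authentication mentioned here is not designed yet; as one hypothetical shape, a shared-secret Bearer-token check could look like this (function and parameter names are assumptions):

```python
import hmac

def is_authorized(auth_header: str, token: str) -> bool:
    # Constant-time comparison of the Authorization header against the
    # expected "Bearer <token>" value; an empty configured token always fails.
    return bool(token) and hmac.compare_digest(auth_header, f"Bearer {token}")
```

A handler would read the `Authorization` header from the request and return HTTP 401 when this check fails. `hmac.compare_digest` avoids leaking the token length-prefix through timing differences.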

@ericcurtin (Collaborator)

The main thing with golang is that we'd have to start doing things like macOS builds properly, but we already do that for podman, so 🤷‍♂️

@ericcurtin (Collaborator)

Or else we say it's just not a macOS-native feature and have the golang binary launched inside the container.

@rhatdan (Member) commented Feb 4, 2025

If we can just add a ramalama service and have it listen on a unix domain socket, with "standard" python, then I think that would be the best outcome.

I don't see the need for the run function, but most of the other functions, yes.

    containers (ps)     list all RamaLama containers
    convert             convert AI Model from local storage to OCI Image
    info                Display information pertaining to setup of RamaLama.
    list (ls)           list all downloaded AI Models
    login               login to remote registry
    logout              logout from remote registry
    pull                pull AI Model from Model registry to local storage
    push                push AI Model from local storage to remote registry
    rm                  remove AI Model from local storage
    serve               serve REST API on specified AI Model
    stop                stop named container that is running AI Model
    version             display version of AI Model
NO
    run                 run specified AI Model as a chatbot
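A stdlib-only service listening on a unix domain socket, as suggested above, could be sketched like this; the socket path and the placeholder handler are assumptions, not an agreed design:

```python
import os
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class UnixHTTPServer(HTTPServer):
    # HTTPServer with the address family switched to AF_UNIX, so it binds
    # to a filesystem socket path instead of a TCP port.
    address_family = socket.AF_UNIX

    def server_bind(self):
        # Remove a stale socket file if present, then bind to the path.
        try:
            os.unlink(self.server_address)
        except FileNotFoundError:
            pass
        self.socket.bind(self.server_address)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder: a real implementation would call into ramalama here.
        if self.path == "/models":
            body = b'{"models": []}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Example usage (requires a writable socket directory):
#   UnixHTTPServer("/run/ramalama/api.sock", Handler).serve_forever()
# and from a client:
#   curl --unix-socket /run/ramalama/api.sock http://localhost/models
```

This needs nothing outside the standard library and sidesteps the TCP exposure question entirely, at the cost of being Linux/macOS-only as written.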

@rhatdan (Member) commented Feb 4, 2025

@benoitf @cdrage @slemeur thoughts?

@benoitf commented Feb 4, 2025

I would +1 any solution not using python (or any other runtime dependency). Having a self-contained binary/native executable will help with reusing/embedding it.

On the API: for 'pull', for example, what would the data sent for progress look like, etc.?

Also, would it be the same kind of pulling as the Hugging Face libraries?

@ericcurtin (Collaborator) commented Feb 4, 2025

Open WebUI would be a great test tool for this, btw (Open WebUI was also a requested feature at FOSDEM).

It's one of the most requested features and exercises a lot of the APIs missing from llama.cpp and vllm (some of them are probably Ollama-specific, but if they're easy to implement, that's no biggie).

@rhatdan (Member) commented Feb 5, 2025

The full solution will NOT be Python-free. The REST API tool will be executing RamaLama commands, or at least calling RamaLama functions. RamaLama is not going to be ported to a different language at this time.

Development

Successfully merging this pull request may close these issues.

RFE: REST API for ramalama
5 participants