
PyOMlx

Serve MLX models locally!


Motivation

Inspired by the Ollama project, I wanted a similar experience for serving MLX models. MLX from ml-explore is a new framework for running ML models on Apple Silicon. This app is intended to be used along with PyOllaMx.

I'm using these in my day-to-day workflow and I intend to keep developing them for my own use and benefit.

If you find this valuable, feel free to use it and contribute to this project as well. Please ⭐️ this repo to show your support and make my day!

I'm planning to work on the next items in roadmap.md. Feel free to comment your thoughts (if any) and influence my work (if interested).

macOS DMGs are available on the Releases page.

How to use

  1. Download & install the PyOMlx macOS app

  2. Run the app

  3. You will now see the application running in the system tray. Use PyOllaMx to chat with MLX models seamlessly. A quick way to verify the server is up is sketched below.
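
As a sanity check that the server is reachable, you can query the list models endpoint. This is a minimal sketch; the port assumes mlx_lm.server's default of 8080, which may differ in your install.

```python
# Quick check that PyOMlx is serving. Port 8080 is an assumption
# (mlx_lm.server's default); adjust if your install listens elsewhere.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8080/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))
```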

Features

  • Revamped the HTTP server to use the mlx_lm.server module. As of the latest version (v0.20.5), the module accepts dynamic model information from the incoming request, which PyOMlx now takes advantage of. The load() function also supports automatic model downloads from Hugging Face when a model isn't present in the local ~/.cache directory; this replaces the /download endpoint.
  • Since mlx_lm.server runs its own httpd, the external Flask dependency is no longer needed and has been removed. The resulting PyOMlx binary is much slimmer (~100 MB) and noticeably faster.
  • Everything else remains the same as v0.1.0.
  • Added OpenAI API-compatible chat completions and list models endpoints.
  • Added a /download endpoint to download MLX models directly from the Hugging Face Hub. All models are downloaded from the MLX Community on the HF Hub.
  • Added a /swagger.json endpoint to serve the OpenAPI spec of all endpoints available in PyOMlx.

Now you can simply use any standard OpenAI client to interact with your MLX models. More info is on the v0.1.0 release page.
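
For example, here is a minimal sketch using the official openai Python package. The base_url port (8080) and the model id are assumptions; substitute whatever your local /v1/models endpoint reports.

```python
# Minimal sketch: chat with a local MLX model through PyOMlx's
# OpenAI-compatible endpoint. base_url port and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Mistral-7B-Instruct-v0.2-4bit",  # example model id
    messages=[{"role": "user", "content": "Hello from PyOMlx!"}],
)
print(response.choices[0].message.content)
```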

  • Updated mlx-lm to support Gemma models
  • Automatically discover & serve MLX models downloaded from the MLX Hugging Face community (see the sketch after this list)
  • Easy start-up / shutdown via the macOS app
  • System tray indication
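
For the auto-discovery bullet above, here is a minimal sketch of pre-downloading an MLX model with huggingface_hub so it lands in the local cache that PyOMlx scans. The repo id is only an example from the mlx-community org.

```python
# Sketch: pull an mlx-community model into the local Hugging Face cache
# (~/.cache/huggingface) so PyOMlx can discover and serve it.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="mlx-community/Mistral-7B-Instruct-v0.2-4bit")
```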

Demo

Demo video: pyollamx_unedited_demo.mp4