Inspired by the Ollama project, I wanted a similar experience for serving MLX models. MLX from ml-explore is a framework for running ML models on Apple Silicon. This app is intended to be used along with PyOllaMx.
I'm using these in my day-to-day workflow and I intend to keep developing them for my own use and benefit.
If you find this valuable, feel free to use it and contribute to this project as well. Please ⭐️ this repo to show your support and make my day!
I'm planning to work on the next items listed in roadmap.md. Feel free to comment your thoughts (if any) and influence my work (if interested).
macOS DMGs are available on the Releases page.
- Download & install the PyOMlx macOS app
- Run the app
- You will now see the application running in the system tray. Use PyOllaMx to chat with MLX models seamlessly
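To verify the server is reachable, here is a minimal sketch using the OpenAI-compatible list-models endpoint. The port is an assumption (`mlx_lm.server` defaults to 8080); substitute whatever port your PyOMlx instance actually listens on.

```python
import requests

# Assumed port -- mlx_lm.server defaults to 8080; adjust to your setup.
BASE_URL = "http://localhost:8080"

# List the models available via the OpenAI-compatible endpoint.
resp = requests.get(f"{BASE_URL}/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```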
- Revamped the HTTP server portion to use the `mlx_lm.server` module. As of the latest version (v0.20.5) the module accepts dynamic model information from the incoming request, so PyOMlx can make better use of it. Also, the `load()` function supports automatic model download from the Hugging Face Hub if the model is not available in the local `~/.cache` directory. This replaces the `/download` endpoint.
- Finally, since `mlx_lm.server` runs an `httpd`, there is no need for an external `flask` server, so I got rid of that too. The resulting PyOMlx binary is very slim (~100 MB) and much faster.
- Everything else is the same as v0.1.0
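As an illustration of the dynamic-model behavior, here is a hedged sketch of a chat completion request where the model id travels in the request body. The port and model name are placeholders, not values confirmed by this project.

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed port; adjust to your setup

# The model id is specified per request; the server loads it on demand,
# pulling it from the Hugging Face Hub into ~/.cache if it isn't local yet.
payload = {
    "model": "mlx-community/Mistral-7B-Instruct-v0.2-4bit",  # example model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```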
- Added OpenAI API compatible chat completions and list models endpoints.
- Added a `/download` endpoint to download MLX models directly from the Hugging Face Hub. All models are downloaded from the MLX Community on the HF Hub.
- Added a `/swagger.json` endpoint to serve the OpenAPI spec of all endpoints available in PyOMlx.
You can now simply use any standard OpenAI client to interact with your MLX models. More info on the v0.1.0 release page.
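For example, a sketch using the official `openai` Python package; the base URL and model id are placeholders, and the API key is required by the client but ignored by the local server:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Mistral-7B-Instruct-v0.2-4bit",  # example model id
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```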
- Updated `mlx-lm` to support Gemma models
- Automatically discover & serve MLX models downloaded from the MLX Hugging Face community (a rough sketch of the idea follows this list)
- Easy start-up / shutdown via the macOS app
- System tray indication
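For the curious, model discovery amounts to scanning the Hugging Face cache for mlx-community snapshots. The sketch below illustrates that idea under the standard HF cache layout; it is not the app's actual code.

```python
from pathlib import Path

# Hugging Face Hub caches downloads under ~/.cache/huggingface/hub in
# directories named models--<org>--<name>.
HF_CACHE = Path.home() / ".cache" / "huggingface" / "hub"

def discover_mlx_models():
    """Yield model ids for locally cached mlx-community downloads."""
    for entry in HF_CACHE.glob("models--mlx-community--*"):
        # Turn the cache directory name back into an org/name model id.
        yield entry.name.removeprefix("models--").replace("--", "/", 1)

for model_id in discover_mlx_models():
    print(model_id)
```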