From a5e9cac1aa2cb899729a7b88fa96a19ee129b52d Mon Sep 17 00:00:00 2001 From: mzegla Date: Fri, 20 Dec 2024 16:43:28 +0100 Subject: [PATCH 1/9] init --- demos/continuous_batching/README.md | 36 +++++++++++++++++++++++++++++ windows_create_package.bat | 6 +++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index 4bcd961c23..d9b56dbc4c 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -5,6 +5,9 @@ That makes it easy to use and efficient especially on on Intel® Xeon® processo > **Note:** This demo was tested on Intel® Xeon® processors Gen4 and Gen5 and Intel dGPU ARC and Flex models on Ubuntu22/24 and RedHat8/9. +::::{tab-set} +:::{tab-item} Linux +:sync: prepare-linux ## Get the docker image Build the image from source to try the latest enhancements in this feature. @@ -18,6 +21,14 @@ It will create an image called `openvino/model_server:latest`. > **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device. > **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image. +::: +:::{tab-item} Windows +:sync: prepare-windows +## Get model server package +Download `ovms.zip` package and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependecies and is ready to run. +::: +:::: + ## Model preparation > **Note** Python 3.9 or higher is need for that step Here, the original Pytorch LLM model and the tokenizer will be converted to IR format and optionally quantized. @@ -63,8 +74,13 @@ The default configuration of the `LLMExecutor` should work in most cases but the Note that the `models_path` parameter in the graph file can be an absolute path or relative to the `base_path` from `config.json`. Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn about configuration options. + ## Start-up +::::{tab-set} +:::{tab-item} Linux +:sync: run-linux + ### CPU Running this command starts the container with CPU only target device: @@ -81,6 +97,26 @@ python demos/common/export_models/export_model.py text_generation --source_model docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json ``` +::: +:::{tab-item} Windows +:sync: run-windows + +Running this command the model server in the current shell: +```bash +.\ovms\ovms.exe --rest_port 8000 --config_path .\models\config.json +``` + +### GPU + +In case you want to use GPU device to run the generation, export the models with precision matching the GPU capacity and adjust pipeline configuration. +It can be applied using the commands below: +```bash +python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int4 --target_device GPU --cache_size 2 --config_file_path models/config.json --model_repository_path models --overwrite_models +``` +Then rerun above command as configuration file has already been adjusted to deploy model on GPU. 
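For clarity, the relaunch is the same command shown in the CPU section; a minimal sketch, assuming the package was unpacked to `.\ovms` and the models were exported to `.\models` as in the steps above:
```bat
:: config.json has already been adjusted to target GPU for the exported model
.\ovms\ovms.exe --rest_port 8000 --config_path .\models\config.json
```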
+ +::: +:::: ### Check readiness diff --git a/windows_create_package.bat b/windows_create_package.bat index 2a2529c9a0..b9c9a6fa3d 100644 --- a/windows_create_package.bat +++ b/windows_create_package.bat @@ -26,8 +26,10 @@ md dist\windows\ovms copy bazel-bin\src\ovms.exe dist\windows\ovms if !errorlevel! neq 0 exit /b !errorlevel! -copy %cd%\bazel-out\x64_windows-opt\bin\src\python39.dll dist\windows\ovms -if !errorlevel! neq 0 exit /b !errorlevel! +xcopy C:\opt\ovms-python-3.9.6-embed dist\windows\ovms\python /E /I /H +if %errorlevel% neq 0 ( + echo Error copying python into the distribution location. The package will not contain self-contained python. +) copy %cd%\bazel-out\x64_windows-opt\bin\src\python\binding\pyovms.pyd dist\windows\ovms if !errorlevel! neq 0 exit /b !errorlevel! From 013982b3e6ed2eb55a71142aed21e6ea3aa02b35 Mon Sep 17 00:00:00 2001 From: mzegla Date: Fri, 20 Dec 2024 16:50:04 +0100 Subject: [PATCH 2/9] style --- demos/continuous_batching/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index d9b56dbc4c..e3486a374c 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -25,7 +25,7 @@ It will create an image called `openvino/model_server:latest`. :::{tab-item} Windows :sync: prepare-windows ## Get model server package -Download `ovms.zip` package and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependecies and is ready to run. +Download `ovms.zip` package and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependencies and is ready to run. ::: :::: From 8a08fafd5403253a8ebcd4e0217255139d59bc3b Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Tue, 7 Jan 2025 15:41:36 +0100 Subject: [PATCH 3/9] revert embeded python --- windows_create_package.bat | 5 ----- 1 file changed, 5 deletions(-) diff --git a/windows_create_package.bat b/windows_create_package.bat index b9c9a6fa3d..cfaf93442f 100644 --- a/windows_create_package.bat +++ b/windows_create_package.bat @@ -26,11 +26,6 @@ md dist\windows\ovms copy bazel-bin\src\ovms.exe dist\windows\ovms if !errorlevel! neq 0 exit /b !errorlevel! -xcopy C:\opt\ovms-python-3.9.6-embed dist\windows\ovms\python /E /I /H -if %errorlevel% neq 0 ( - echo Error copying python into the distribution location. The package will not contain self-contained python. -) - copy %cd%\bazel-out\x64_windows-opt\bin\src\python\binding\pyovms.pyd dist\windows\ovms if !errorlevel! neq 0 exit /b !errorlevel! From 5c484a201f53d3f96ff0651b76009b4002b68bb0 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Wed, 8 Jan 2025 15:16:01 +0100 Subject: [PATCH 4/9] remove tabs --- demos/continuous_batching/README.md | 100 +++++++++++++++++----------- 1 file changed, 60 insertions(+), 40 deletions(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index e3486a374c..f9ceea8a80 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -5,30 +5,6 @@ That makes it easy to use and efficient especially on on Intel® Xeon® processo > **Note:** This demo was tested on Intel® Xeon® processors Gen4 and Gen5 and Intel dGPU ARC and Flex models on Ubuntu22/24 and RedHat8/9. -::::{tab-set} -:::{tab-item} Linux -:sync: prepare-linux -## Get the docker image - -Build the image from source to try the latest enhancements in this feature. 
-```bash -git clone https://github.com/openvinotoolkit/model_server.git -cd model_server -make release_image GPU=1 -``` -It will create an image called `openvino/model_server:latest`. -> **Note:** This operation might take 40min or more depending on your build host. -> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device. -> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image. - -::: -:::{tab-item} Windows -:sync: prepare-windows -## Get model server package -Download `ovms.zip` package and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependencies and is ready to run. -::: -:::: - ## Model preparation > **Note** Python 3.9 or higher is need for that step Here, the original Pytorch LLM model and the tokenizer will be converted to IR format and optionally quantized. @@ -36,12 +12,12 @@ That ensures faster initialization time, better performance and lower memory con LLM engine parameters will be defined inside the `graph.pbtxt` file. Install python dependencies for the conversion script: -```bash +```console pip3 install -U -r demos/common/export_models/requirements.txt ``` Run optimum-cli to download and quantize the model: -```bash +```console mkdir models python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format fp16 --kv_cache_precision u8 --config_file_path models/config.json --model_repository_path models ``` @@ -75,11 +51,7 @@ Note that the `models_path` parameter in the graph file can be an absolute path Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn about configuration options. -## Start-up - -::::{tab-set} -:::{tab-item} Linux -:sync: run-linux +## Deploying with Docker ### CPU @@ -97,26 +69,74 @@ python demos/common/export_models/export_model.py text_generation --source_model docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json ``` -::: -:::{tab-item} Windows -:sync: run-windows -Running this command the model server in the current shell: +### Build Image From Source (Linux Host) + +In case you want to try out features that have not been released yet, you can build the image from source code yourself. +```bash +git clone https://github.com/openvinotoolkit/model_server.git +cd model_server +make release_image GPU=1 +``` +It will create an image called `openvino/model_server:latest`. +> **Note:** This operation might take 40min or more depending on your build host. +> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device. +> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image. + +## Deploying on Bare Metal + +Download model server archive and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependencies. 
+ +```console +curl https://github.com/openvinotoolkit/model_server/releases/download// +tar -xf +``` +where: + +- `` - model server version: `v2024.4`, `v2024.5` etc. +- `` - package for desired OS, one of: `ovms_redhat.tar.gz`, `ovms_ubuntu22.tar.gz`, `ovms_win.zip` + +For correct Python initialization also set `PYTHONHOME` environment variable in the shell that will be used to launch model server: + +**Linux** + ```bash -.\ovms\ovms.exe --rest_port 8000 --config_path .\models\config.json +export PYTHONHOME=$PWD/ovms/python +``` + +**Windows Command Line**: +```bat +set PYTHONHOME=$pwd\ovms\python ``` +**Windows PowerShell**: +```powershell +$env:PYTHONHOME=$pwd\ovms\python +``` + +Once it's set, you can launch the model server. + +### CPU + +In model preparation section, configuration is set to load models on CPU, so you can simply run the binary pointing to the configuration file and selecting port for the HTTP server to expose inference endpoint. + +```console +.\ovms\ovms --rest_port 8000 --config_path .\models\config.json +``` + + ### GPU In case you want to use GPU device to run the generation, export the models with precision matching the GPU capacity and adjust pipeline configuration. It can be applied using the commands below: -```bash +```console python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int4 --target_device GPU --cache_size 2 --config_file_path models/config.json --model_repository_path models --overwrite_models ``` -Then rerun above command as configuration file has already been adjusted to deploy model on GPU. +Then rerun above command as configuration file has already been adjusted to deploy model on GPU: -::: -:::: +```console +.\ovms\ovms --rest_port 8000 --config_path .\models\config.json +``` ### Check readiness From 23b967de805f36c898d060419030da50c8991c16 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Thu, 9 Jan 2025 12:16:28 +0100 Subject: [PATCH 5/9] bash -> console --- demos/continuous_batching/README.md | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index f9ceea8a80..755294c7d7 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -96,22 +96,26 @@ where: - `` - model server version: `v2024.4`, `v2024.5` etc. - `` - package for desired OS, one of: `ovms_redhat.tar.gz`, `ovms_ubuntu22.tar.gz`, `ovms_win.zip` -For correct Python initialization also set `PYTHONHOME` environment variable in the shell that will be used to launch model server: +For correct Python initialization also set `PYTHONHOME` environment variable in the shell that will be used to launch model server. +It may also be required to add this location to `PATH` in case there are already other Python installation on the system so that model server picks the right one. **Linux** ```bash export PYTHONHOME=$PWD/ovms/python +export PATH=$PWD/ovms/python;$PATH ``` **Windows Command Line**: ```bat -set PYTHONHOME=$pwd\ovms\python +set PYTHONHOME="$pwd\ovms\python" +set PATH="$pwd\ovms\python;%PATH%" ``` **Windows PowerShell**: ```powershell -$env:PYTHONHOME=$pwd\ovms\python +$env:PYTHONHOME="$pwd\ovms\python" +$env:PATH="$pwd\ovms\python;$env:PATH" ``` Once it's set, you can launch the model server. @@ -121,7 +125,7 @@ Once it's set, you can launch the model server. 
In model preparation section, configuration is set to load models on CPU, so you can simply run the binary pointing to the configuration file and selecting port for the HTTP server to expose inference endpoint. ```console -.\ovms\ovms --rest_port 8000 --config_path .\models\config.json +./ovms/ovms --rest_port 8000 --config_path ./models/config.json ``` @@ -135,13 +139,13 @@ python demos/common/export_models/export_model.py text_generation --source_model Then rerun above command as configuration file has already been adjusted to deploy model on GPU: ```console -.\ovms\ovms --rest_port 8000 --config_path .\models\config.json +./ovms/ovms --rest_port 8000 --config_path ./models/config.json ``` ### Check readiness Wait for the model to load. You can check the status with a simple command: -```bash +```console curl http://localhost:8000/v1/config ``` ```json @@ -168,7 +172,7 @@ Chat endpoint is expected to be used for scenarios where conversation context sh Completion endpoint should be used to pass the prompt directly by the client and for models without the jinja template. ### Unary: -```bash +```console curl http://localhost:8000/v3/chat/completions \ -H "Content-Type: application/json" \ -d '{ @@ -212,7 +216,7 @@ curl http://localhost:8000/v3/chat/completions \ ``` A similar call can be made with a `completion` endpoint: -```bash +```console curl http://localhost:8000/v3/completions \ -H "Content-Type: application/json" \ -d '{ @@ -248,7 +252,7 @@ curl http://localhost:8000/v3/completions \ The endpoints `chat/completions` are compatible with OpenAI client so it can be easily used to generate code also in streaming mode: Install the client library: -```bash +```console pip3 install openai ``` ```python @@ -275,7 +279,7 @@ It looks like you're testing me! ``` A similar code can be applied for the completion endpoint: -```bash +```console pip3 install openai ``` ```python @@ -306,7 +310,7 @@ It looks like you're testing me! OpenVINO Model Server employs efficient parallelization for text generation. It can be used to generate text also in high concurrency in the environment shared by multiple clients. 
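As a minimal illustration of several clients sharing one endpoint, here is a hedged sketch built on the same `openai` package used above; the endpoint, model name and prompts are assumptions matching the earlier steps:
```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

def ask(prompt):
    # each worker issues an independent chat completion request;
    # the server schedules them together via continuous batching
    completion = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=50,
    )
    return completion.choices[0].message.content

prompts = ["Say this is a test", "Name three cities in Poland", "Write a haiku about servers"]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```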
It can be demonstrated using benchmarking app from vLLM repository: -```bash +```console git clone --branch v0.6.0 --depth 1 https://github.com/vllm-project/vllm cd vllm pip3 install -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu From 31352e7cbe35d10f372892cb8810cc387c1f3254 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Thu, 9 Jan 2025 17:41:45 +0100 Subject: [PATCH 6/9] introduce setupvars --- demos/continuous_batching/README.md | 2 +- docs/deploying_server.md | 282 +--------------------------- docs/deploying_server_baremetal.md | 208 ++++++++++++++++++++ docs/deploying_server_docker.md | 75 ++++++++ docs/deploying_server_kubernetes.md | 21 +++ setupvars.bat | 21 +++ setupvars.ps1 | 19 ++ windows_create_package.bat | 3 + windows_prepare_python.bat | 2 +- 9 files changed, 353 insertions(+), 280 deletions(-) create mode 100644 docs/deploying_server_baremetal.md create mode 100644 docs/deploying_server_docker.md create mode 100644 docs/deploying_server_kubernetes.md create mode 100644 setupvars.bat create mode 100644 setupvars.ps1 diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index 755294c7d7..11b257db88 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -97,7 +97,7 @@ where: - `` - package for desired OS, one of: `ovms_redhat.tar.gz`, `ovms_ubuntu22.tar.gz`, `ovms_win.zip` For correct Python initialization also set `PYTHONHOME` environment variable in the shell that will be used to launch model server. -It may also be required to add this location to `PATH` in case there are already other Python installation on the system so that model server picks the right one. +It may also be required to add OVMS-provided Python catalog to `PATH` to make it a primary choice for the serving during startup. **Linux** diff --git a/docs/deploying_server.md b/docs/deploying_server.md index 6b33c7752b..b61d1fd800 100644 --- a/docs/deploying_server.md +++ b/docs/deploying_server.md @@ -1,281 +1,7 @@ # Deploy Model Server {#ovms_docs_deploying_server} -1. Docker is the recommended way to deploy OpenVINO Model Server. Pre-built container images are available on Docker Hub and Red Hat Ecosystem Catalog. -2. Host Model Server on baremetal. -3. Deploy OpenVINO Model Server in Kubernetes via helm chart, Kubernetes Operator or OpenShift Operator. +There are multiple options for deploying OpenVINO Model Server -## Deploying Model Server in Docker Container - -This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Linux, using a pre-build Docker Container. - -**Before you start, make sure you have:** - -- [Docker Engine](https://docs.docker.com/engine/) installed -- Intel® Core™ processor (6-13th gen.) or Intel® Xeon® processor (1st to 4th gen.) -- Linux, macOS or Windows via [WSL](https://docs.microsoft.com/en-us/windows/wsl/) -- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts. - -### Launch Model Server Container - -This example shows how to launch the model server with a ResNet50 image classification model from a cloud storage: - -#### Step 1. 
Pull Model Server Image - -Pull an image from Docker: - -```bash -docker pull openvino/model_server:latest -``` - -or [RedHat Ecosystem Catalog](https://catalog.redhat.com/software/containers/intel/openvino-model-server/607833052937385fc98515de): - -``` -docker pull registry.connect.redhat.com/intel/openvino-model-server:latest -``` - -#### Step 2. Prepare Data for Serving - -##### 2.1 Start the container with the model - -```bash -wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1 -docker run -u $(id -u) -v $(pwd)/models:/models -p 9000:9000 openvino/model_server:latest \ ---model_name resnet --model_path /models/resnet50 \ ---layout NHWC:NCHW --port 9000 -``` - -##### 2.2 Download input files: an image and a label mapping file - -```bash -wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/zebra.jpeg -wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/python/classes.py -``` - -##### 2.3 Install the Python-based ovmsclient package - -```bash -pip3 install ovmsclient -``` - - -#### Step 3. Run Prediction - - -```bash -echo 'import numpy as np -from classes import imagenet_classes -from ovmsclient import make_grpc_client - -client = make_grpc_client("localhost:9000") - -with open("zebra.jpeg", "rb") as f: - img = f.read() - -output = client.predict({"0": img}, "resnet") -result_index = np.argmax(output[0]) -print(imagenet_classes[result_index])' >> predict.py - -python predict.py -zebra -``` -If everything is set up correctly, you will see 'zebra' prediction in the output. - -## Deploying Model Server on Baremetal (without container) -It is possible to deploy Model Server outside of container. -To deploy Model Server on baremetal, use pre-compiled binaries for Ubuntu20, Ubuntu22 or RHEL8. 
- -::::{tab-set} -:::{tab-item} Ubuntu 20.04 -:sync: ubuntu-20-04 -Build the binary: - -```{code} sh -# Clone the model server repository -git clone https://github.com/openvinotoolkit/model_server -cd model_server -# Build docker images (the binary is one of the artifacts) -make docker_build BASE_OS=ubuntu20 PYTHON_DISABLE=1 RUN_TESTS=0 -# Unpack the package -tar -xzvf dist/ubuntu20/ovms.tar.gz -``` -Install required libraries: -```{code} sh -sudo apt update -y && apt install -y liblibxml2 curl -``` -Set path to the libraries -```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib -``` -In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: -```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python -sudo apt -y install libpython3.8 -``` -::: -:::{tab-item} Ubuntu 22.04 -:sync: ubuntu-22-04 -Download precompiled package: -```{code} sh -wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_ubuntu22.tar.gz -tar -xzvf ovms_ubuntu22.tar.gz -``` -or build it yourself: -```{code} sh -# Clone the model server repository -git clone https://github.com/openvinotoolkit/model_server -cd model_server -# Build docker images (the binary is one of the artifacts) -make docker_build PYTHON_DISABLE=1 RUN_TESTS=0 -# Unpack the package -tar -xzvf dist/ubuntu22/ovms.tar.gz -``` -Install required libraries: -```{code} sh -sudo apt update -y && apt install -y libxml2 curl -``` -Set path to the libraries -```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib -``` -In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: -```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python -sudo apt -y install libpython3.10 -``` -::: -:::{tab-item} Ubuntu 24.04 -:sync: ubuntu-24-04 -Download precompiled package: -```{code} sh -wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_ubuntu22.tar.gz -tar -xzvf ovms_ubuntu22.tar.gz -``` -or build it yourself: -```{code} sh -# Clone the model server repository -git clone https://github.com/openvinotoolkit/model_server -cd model_server -# Build docker images (the binary is one of the artifacts) -make docker_build PYTHON_DISABLE=1 RUN_TESTS=0 -# Unpack the package -tar -xzvf dist/ubuntu22/ovms.tar.gz -``` -Install required libraries: -```{code} sh -sudo apt update -y && apt install -y libxml2 curl -``` -Set path to the libraries -```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib -``` -In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: -```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python -sudo apt -y install libpython3.10 -``` -::: -:::{tab-item} RHEL 8.10 -:sync: rhel-8-10 -Download precompiled package: -```{code} sh -wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_redhat.tar.gz -tar -xzvf ovms_redhat.tar.gz -``` -or build it yourself: -```{code} sh -# Clone the model server repository -git clone https://github.com/openvinotoolkit/model_server -cd model_server -# Build docker images (the binary is one of the artifacts) -make docker_build BASE_OS=redhat PYTHON_DISABLE=1 RUN_TESTS=0 -# Unpack the package -tar -xzvf dist/redhat/ovms.tar.gz -``` -Set path to the libraries -```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib -``` -In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: -```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python -sudo yum install -y python39-libs -``` -::: -:::{tab-item} RHEL 9.4 -:sync: rhel-9.4 -Download 
precompiled package: -```{code} sh -wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_redhat.tar.gz -tar -xzvf ovms_redhat.tar.gz -``` -or build it yourself: -```{code} sh -# Clone the model server repository -git clone https://github.com/openvinotoolkit/model_server -cd model_server -# Build docker images (the binary is one of the artifacts) -make docker_build BASE_OS=redhat PYTHON_DISABLE=1 RUN_TESTS=0 -# Unpack the package -tar -xzvf dist/redhat/ovms.tar.gz -``` -Install required libraries: -```{code} sh -sudo yum install compat-openssl11.x86_64 -``` -Set path to the libraries -```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib -``` -In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: -```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python -sudo yum install -y python39-libs -``` -::: -:::: - -Start the server: - -```bash -wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1 - -./ovms/bin/ovms --model_name resnet --model_path models/resnet50 -``` - -or start as a background process or a daemon initiated by ```systemctl/initd``` depending on the Linux distribution and specific hosting requirements. - -Most of the Model Server documentation demonstrate containers usage, but the same can be achieved with just the binary package. -Learn more about model server [starting parameters](parameters.md). - -> **NOTE**: -> When serving models on [AI accelerators](accelerators.md), some additional steps may be required to install device drivers and dependencies. -> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2024/get-started/configurations.html) documentation. - - -## Deploying Model Server in Kubernetes - -There are three recommended methods for deploying OpenVINO Model Server in Kubernetes: -1. [helm chart](https://github.com/openvinotoolkit/operator/tree/main/helm-charts/ovms) - deploys Model Server instances using the [helm](https://helm.sh) package manager for Kubernetes -2. [Kubernetes Operator](https://operatorhub.io/operator/ovms-operator) - manages Model Server using a Kubernetes Operator -3. [OpenShift Operator](https://github.com/openvinotoolkit/operator/blob/main/docs/operator_installation.md#openshift) - manages Model Server instances in Red Hat OpenShift - -For operators mentioned in 2. and 3. see the [description of the deployment process](https://github.com/openvinotoolkit/operator/blob/main/docs/modelserver.md) - -## Next Steps - -- [Start the server](starting_server.md) -- Try the model server [features](features.md) -- Explore the model server [demos](../demos/README.md) - -## Additional Resources - -- [Preparing Model Repository](models_repository.md) -- [Using Cloud Storage](using_cloud_storage.md) -- [Troubleshooting](troubleshooting.md) -- [Model server parameters](parameters.md) - -## Deploying ovms.exe on Windows - -Once you have built the ovms.exe following the [Developer Guide for Windows](windows_developer_guide.md) -Follow the experimental/alpha windows deployment instructions to start the ovms server as a standalone binary on a Windows 11 system. -[Deployment Guide for Windows](windows_binary_guide.md) +1. [With Docker](docs/deploying_server_docker.md) - use pre-built container images available on Docker Hub and Red Hat Ecosystem Catalog or build your own image from source. +2. 
[On baremetal Linux or Windows](docs/deploying_server_baremetal.md) - download packaged binary and run it directly on your system. +3. [In Kubernetes](docs/deploying_server_kubernetes.md) - use helm chart, Kubernetes Operator or OpenShift Operator. diff --git a/docs/deploying_server_baremetal.md b/docs/deploying_server_baremetal.md new file mode 100644 index 0000000000..6dbdbd1926 --- /dev/null +++ b/docs/deploying_server_baremetal.md @@ -0,0 +1,208 @@ +## Deploying Model Server on Baremetal + +It is possible to deploy Model Server outside of container. +To deploy Model Server on baremetal, use pre-compiled binaries for Ubuntu20, Ubuntu22, RHEL8 or Windows 11. + +### Linux + +::::{tab-set} +:::{tab-item} Ubuntu 20.04 +:sync: ubuntu-20-04 +Build the binary: + +```{code} sh +# Clone the model server repository +git clone https://github.com/openvinotoolkit/model_server +cd model_server +# Build docker images (the binary is one of the artifacts) +make docker_build BASE_OS=ubuntu20 PYTHON_DISABLE=1 RUN_TESTS=0 +# Unpack the package +tar -xzvf dist/ubuntu20/ovms.tar.gz +``` +Install required libraries: +```{code} sh +sudo apt update -y && apt install -y liblibxml2 curl +``` +Set path to the libraries +```{code} sh +export LD_LIBRARY_PATH=${pwd}/ovms/lib +``` +In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: +```{code} sh +export PYTHONPATH=${pwd}/ovms/lib/python +sudo apt -y install libpython3.8 +``` +::: +:::{tab-item} Ubuntu 22.04 +:sync: ubuntu-22-04 +Download precompiled package: +```{code} sh +wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_ubuntu22.tar.gz +tar -xzvf ovms_ubuntu22.tar.gz +``` +or build it yourself: +```{code} sh +# Clone the model server repository +git clone https://github.com/openvinotoolkit/model_server +cd model_server +# Build docker images (the binary is one of the artifacts) +make docker_build PYTHON_DISABLE=1 RUN_TESTS=0 +# Unpack the package +tar -xzvf dist/ubuntu22/ovms.tar.gz +``` +Install required libraries: +```{code} sh +sudo apt update -y && apt install -y libxml2 curl +``` +Set path to the libraries +```{code} sh +export LD_LIBRARY_PATH=${pwd}/ovms/lib +``` +In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: +```{code} sh +export PYTHONPATH=${pwd}/ovms/lib/python +sudo apt -y install libpython3.10 +``` +::: +:::{tab-item} Ubuntu 24.04 +:sync: ubuntu-24-04 +Download precompiled package: +```{code} sh +wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_ubuntu22.tar.gz +tar -xzvf ovms_ubuntu22.tar.gz +``` +or build it yourself: +```{code} sh +# Clone the model server repository +git clone https://github.com/openvinotoolkit/model_server +cd model_server +# Build docker images (the binary is one of the artifacts) +make docker_build PYTHON_DISABLE=1 RUN_TESTS=0 +# Unpack the package +tar -xzvf dist/ubuntu22/ovms.tar.gz +``` +Install required libraries: +```{code} sh +sudo apt update -y && apt install -y libxml2 curl +``` +Set path to the libraries +```{code} sh +export LD_LIBRARY_PATH=${pwd}/ovms/lib +``` +In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: +```{code} sh +export PYTHONPATH=${pwd}/ovms/lib/python +sudo apt -y install libpython3.10 +``` +::: +:::{tab-item} RHEL 8.10 +:sync: rhel-8-10 +Download precompiled package: +```{code} sh +wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_redhat.tar.gz +tar -xzvf 
ovms_redhat.tar.gz +``` +or build it yourself: +```{code} sh +# Clone the model server repository +git clone https://github.com/openvinotoolkit/model_server +cd model_server +# Build docker images (the binary is one of the artifacts) +make docker_build BASE_OS=redhat PYTHON_DISABLE=1 RUN_TESTS=0 +# Unpack the package +tar -xzvf dist/redhat/ovms.tar.gz +``` +Set path to the libraries +```{code} sh +export LD_LIBRARY_PATH=${pwd}/ovms/lib +``` +In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: +```{code} sh +export PYTHONPATH=${pwd}/ovms/lib/python +sudo yum install -y python39-libs +``` +::: +:::{tab-item} RHEL 9.4 +:sync: rhel-9.4 +Download precompiled package: +```{code} sh +wget https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_redhat.tar.gz +tar -xzvf ovms_redhat.tar.gz +``` +or build it yourself: +```{code} sh +# Clone the model server repository +git clone https://github.com/openvinotoolkit/model_server +cd model_server +# Build docker images (the binary is one of the artifacts) +make docker_build BASE_OS=redhat PYTHON_DISABLE=1 RUN_TESTS=0 +# Unpack the package +tar -xzvf dist/redhat/ovms.tar.gz +``` +Install required libraries: +```{code} sh +sudo yum install compat-openssl11.x86_64 +``` +Set path to the libraries +```{code} sh +export LD_LIBRARY_PATH=${pwd}/ovms/lib +``` +In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: +```{code} sh +export PYTHONPATH=${pwd}/ovms/lib/python +sudo yum install -y python39-libs +``` +::: +:::: + +Start the server: + +```bash +wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1 + +./ovms/bin/ovms --model_name resnet --model_path models/resnet50 +``` + +or start as a background process or a daemon initiated by ```systemctl/initd``` depending on the Linux distribution and specific hosting requirements. + + +### Windows + +Download and unpack model server archive for Windows: +```bat +curl https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_win11.zip +tar -xf ovms_win11.zip +``` + +Run `setupvars` script to set required environment variables. Note that running this script changes Python settings for the shell that runs it. + +**Windows Command Line** +```bat +./ovms/setupvars.bat +``` + +**Windows PowerShell** +```powershell +./ovms/setupvars.ps1 +``` + +Most of the Model Server documentation demonstrate containers usage, but the same can be achieved with just the binary package. +Learn more about model server [starting parameters](parameters.md). + +> **NOTE**: +> When serving models on [AI accelerators](accelerators.md), some additional steps may be required to install device drivers and dependencies. +> Learn more in the [Additional Configurations for Hardware](https://docs.openvino.ai/2024/get-started/configurations.html) documentation. 
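As a quick sanity check of a bare metal deployment, the sketch below starts the binary in the background and queries the configuration endpoint; it assumes the ResNet model repository prepared above and adds `--rest_port 8000` to enable the REST interface:
```bash
# launch the server in the background and capture its logs
nohup ./ovms/bin/ovms --model_name resnet --model_path models/resnet50 --rest_port 8000 > ovms.log 2>&1 &

# once the model is loaded, the configuration endpoint reports its status
curl http://localhost:8000/v1/config
```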
+ + +## Next Steps + +- [Start the server](starting_server.md) +- Try the model server [features](features.md) +- Explore the model server [demos](../demos/README.md) + +## Additional Resources + +- [Preparing Model Repository](models_repository.md) +- [Using Cloud Storage](using_cloud_storage.md) +- [Troubleshooting](troubleshooting.md) +- [Model server parameters](parameters.md) diff --git a/docs/deploying_server_docker.md b/docs/deploying_server_docker.md new file mode 100644 index 0000000000..bc4996713d --- /dev/null +++ b/docs/deploying_server_docker.md @@ -0,0 +1,75 @@ +## Deploying Model Server in Docker Container + +This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Linux, using a pre-build Docker Container. + +**Before you start, make sure you have:** + +- [Docker Engine](https://docs.docker.com/engine/) installed +- Intel® Core™ processor (6-13th gen.) or Intel® Xeon® processor (1st to 4th gen.) +- Linux, macOS or Windows via [WSL](https://docs.microsoft.com/en-us/windows/wsl/) +- (optional) AI accelerators [supported by OpenVINO](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html). Accelerators are tested only on bare-metal Linux hosts. + +### Launch Model Server Container + +This example shows how to launch the model server with a ResNet50 image classification model from a cloud storage: + +#### Step 1. Pull Model Server Image + +Pull an image from Docker: + +```bash +docker pull openvino/model_server:latest +``` + +or [RedHat Ecosystem Catalog](https://catalog.redhat.com/software/containers/intel/openvino-model-server/607833052937385fc98515de): + +``` +docker pull registry.connect.redhat.com/intel/openvino-model-server:latest +``` + +#### Step 2. Prepare Data for Serving + +##### 2.1 Start the container with the model + +```bash +wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1 +docker run -u $(id -u) -v $(pwd)/models:/models -p 9000:9000 openvino/model_server:latest \ +--model_name resnet --model_path /models/resnet50 \ +--layout NHWC:NCHW --port 9000 +``` + +##### 2.2 Download input files: an image and a label mapping file + +```bash +wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/zebra.jpeg +wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/python/classes.py +``` + +##### 2.3 Install the Python-based ovmsclient package + +```bash +pip3 install ovmsclient +``` + + +#### Step 3. Run Prediction + + +```bash +echo 'import numpy as np +from classes import imagenet_classes +from ovmsclient import make_grpc_client + +client = make_grpc_client("localhost:9000") + +with open("zebra.jpeg", "rb") as f: + img = f.read() + +output = client.predict({"0": img}, "resnet") +result_index = np.argmax(output[0]) +print(imagenet_classes[result_index])' >> predict.py + +python predict.py +zebra +``` +If everything is set up correctly, you will see 'zebra' prediction in the output. \ No newline at end of file diff --git a/docs/deploying_server_kubernetes.md b/docs/deploying_server_kubernetes.md new file mode 100644 index 0000000000..8e7e6e126b --- /dev/null +++ b/docs/deploying_server_kubernetes.md @@ -0,0 +1,21 @@ +## Deploying Model Server in Kubernetes + +There are three recommended methods for deploying OpenVINO Model Server in Kubernetes: +1. 
[helm chart](https://github.com/openvinotoolkit/operator/tree/main/helm-charts/ovms) - deploys Model Server instances using the [helm](https://helm.sh) package manager for Kubernetes +2. [Kubernetes Operator](https://operatorhub.io/operator/ovms-operator) - manages Model Server using a Kubernetes Operator +3. [OpenShift Operator](https://github.com/openvinotoolkit/operator/blob/main/docs/operator_installation.md#openshift) - manages Model Server instances in Red Hat OpenShift + +For operators mentioned in 2. and 3. see the [description of the deployment process](https://github.com/openvinotoolkit/operator/blob/main/docs/modelserver.md) + +## Next Steps + +- [Start the server](starting_server.md) +- Try the model server [features](features.md) +- Explore the model server [demos](../demos/README.md) + +## Additional Resources + +- [Preparing Model Repository](models_repository.md) +- [Using Cloud Storage](using_cloud_storage.md) +- [Troubleshooting](troubleshooting.md) +- [Model server parameters](parameters.md) diff --git a/setupvars.bat b/setupvars.bat new file mode 100644 index 0000000000..76347d3ac5 --- /dev/null +++ b/setupvars.bat @@ -0,0 +1,21 @@ +:: +:: Copyright (c) 2024 Intel Corporation +:: +:: Licensed under the Apache License, Version 2.0 (the "License"); +:: you may not use this file except in compliance with the License. +:: You may obtain a copy of the License at +:: +:: http:::www.apache.org/licenses/LICENSE-2.0 +:: +:: Unless required by applicable law or agreed to in writing, software +:: distributed under the License is distributed on an "AS IS" BASIS, +:: WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +:: See the License for the specific language governing permissions and +:: limitations under the License. +:: +@echo off +setlocal EnableExtensions EnableDelayedExpansion +set "OVMS_DIR=%~dp0" +set "PYTHONHOME=%OVMS_DIR%\python" +set "PATH=%OVMS_DIR%;%PYTHONHOME%;%PATH%" +endlocal diff --git a/setupvars.ps1 b/setupvars.ps1 new file mode 100644 index 0000000000..8fe5d19217 --- /dev/null +++ b/setupvars.ps1 @@ -0,0 +1,19 @@ +# +# Copyright (c) 2024 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http//:www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +$env:OVMS_DIR=$PSScriptRoot +$env:PYTHONHOME="$env:OVMS_DIR\python" +$env:PATH="$env:OVMS_DIR;$env:PYTHONHOME;$env:PATH" diff --git a/windows_create_package.bat b/windows_create_package.bat index be6de15073..52c295b098 100644 --- a/windows_create_package.bat +++ b/windows_create_package.bat @@ -60,6 +60,9 @@ if !errorlevel! neq 0 exit /b !errorlevel! copy %cd%\bazel-out\x64_windows-opt\bin\src\opencv_world4100.dll dist\windows\ovms if !errorlevel! neq 0 exit /b !errorlevel! +copy %cd%\setupvars.* dist\windows\ovms +if !errorlevel! neq 0 exit /b !errorlevel! + dist\windows\ovms\ovms.exe --version if !errorlevel! neq 0 exit /b !errorlevel! 
diff --git a/windows_prepare_python.bat b/windows_prepare_python.bat index bc2b93744a..826b3b4627 100644 --- a/windows_prepare_python.bat +++ b/windows_prepare_python.bat @@ -70,7 +70,7 @@ echo .\Lib\site-packages if !errorlevel! neq 0 exit /b !errorlevel! :: Install pip -curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py +curl -k https://bootstrap.pypa.io/get-pip.py -o get-pip.py if !errorlevel! neq 0 exit /b !errorlevel! .\python.exe get-pip.py if !errorlevel! neq 0 exit /b !errorlevel! \ No newline at end of file From f2577c5f911a105e1bfdaceefceb89c022307548 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Thu, 9 Jan 2025 18:12:27 +0100 Subject: [PATCH 7/9] reorg CB demo --- demos/continuous_batching/README.md | 62 +++++++---------------------- docs/deploying_server_docker.md | 17 +++++++- setupvars.bat | 1 + setupvars.ps1 | 1 + 4 files changed, 31 insertions(+), 50 deletions(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index 11b257db88..ed045dcd39 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -5,6 +5,10 @@ That makes it easy to use and efficient especially on on Intel® Xeon® processo > **Note:** This demo was tested on Intel® Xeon® processors Gen4 and Gen5 and Intel dGPU ARC and Flex models on Ubuntu22/24 and RedHat8/9. +## Prerequisites +- **For Linux users**: Installed Docker Engine +- **For Windows users**: Installed OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md) + ## Model preparation > **Note** Python 3.9 or higher is need for that step Here, the original Pytorch LLM model and the tokenizer will be converted to IR format and optionally quantized. @@ -46,9 +50,7 @@ models └── tokenizer.json ``` -The default configuration of the `LLMExecutor` should work in most cases but the parameters can be tuned inside the `node_options` section in the `graph.pbtxt` file. -Note that the `models_path` parameter in the graph file can be an absolute path or relative to the `base_path` from `config.json`. -Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn about configuration options. +The default configuration should work in most cases but the parameters can be tuned via `export_model.py` script arguments. Run the script with `--help` argument to check available parameters and see the [LLM calculator documentation](../../docs/llm/reference.md) to learn more about configuration options. ## Deploying with Docker @@ -70,62 +72,26 @@ python demos/common/export_models/export_model.py text_generation --source_model docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json ``` -### Build Image From Source (Linux Host) - -In case you want to try out features that have not been released yet, you can build the image from source code yourself. -```bash -git clone https://github.com/openvinotoolkit/model_server.git -cd model_server -make release_image GPU=1 -``` -It will create an image called `openvino/model_server:latest`. -> **Note:** This operation might take 40min or more depending on your build host. -> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device. -> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. 
Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image. - ## Deploying on Bare Metal -Download model server archive and unpack it to `model_server` directory. The package contains OVMS binary and all of its dependencies. - -```console -curl https://github.com/openvinotoolkit/model_server/releases/download// -tar -xf -``` -where: - -- `` - model server version: `v2024.4`, `v2024.5` etc. -- `` - package for desired OS, one of: `ovms_redhat.tar.gz`, `ovms_ubuntu22.tar.gz`, `ovms_win.zip` - -For correct Python initialization also set `PYTHONHOME` environment variable in the shell that will be used to launch model server. -It may also be required to add OVMS-provided Python catalog to `PATH` to make it a primary choice for the serving during startup. +Assuming you have unpacked model server package to your current working directory run `setupvars` script for environment setup: -**Linux** - -```bash -export PYTHONHOME=$PWD/ovms/python -export PATH=$PWD/ovms/python;$PATH -``` - -**Windows Command Line**: +**Windows Command Line** ```bat -set PYTHONHOME="$pwd\ovms\python" -set PATH="$pwd\ovms\python;%PATH%" +./ovms/setupvars.bat ``` -**Windows PowerShell**: +**Windows PowerShell** ```powershell -$env:PYTHONHOME="$pwd\ovms\python" -$env:PATH="$pwd\ovms\python;$env:PATH" +./ovms/setupvars.ps1 ``` -Once it's set, you can launch the model server. - ### CPU In model preparation section, configuration is set to load models on CPU, so you can simply run the binary pointing to the configuration file and selecting port for the HTTP server to expose inference endpoint. -```console -./ovms/ovms --rest_port 8000 --config_path ./models/config.json +```bat +ovms --rest_port 8000 --config_path ./models/config.json ``` @@ -138,8 +104,8 @@ python demos/common/export_models/export_model.py text_generation --source_model ``` Then rerun above command as configuration file has already been adjusted to deploy model on GPU: -```console -./ovms/ovms --rest_port 8000 --config_path ./models/config.json +```bat +ovms --rest_port 8000 --config_path ./models/config.json ``` ### Check readiness diff --git a/docs/deploying_server_docker.md b/docs/deploying_server_docker.md index bc4996713d..38ca969e96 100644 --- a/docs/deploying_server_docker.md +++ b/docs/deploying_server_docker.md @@ -1,6 +1,6 @@ ## Deploying Model Server in Docker Container -This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Linux, using a pre-build Docker Container. +This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Linux, using Docker. **Before you start, make sure you have:** @@ -72,4 +72,17 @@ print(imagenet_classes[result_index])' >> predict.py python predict.py zebra ``` -If everything is set up correctly, you will see 'zebra' prediction in the output. \ No newline at end of file +If everything is set up correctly, you will see 'zebra' prediction in the output. + +### Build Image From Source + +In case you want to try out features that have not been released yet, you can build the image from source code yourself. +```bash +git clone https://github.com/openvinotoolkit/model_server.git +cd model_server +make release_image GPU=1 +``` +It will create an image called `openvino/model_server:latest`. +> **Note:** This operation might take 40min or more depending on your build host. +> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device. 
+> **Note:** The public image from the last release might be not compatible with models exported using the the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image. \ No newline at end of file diff --git a/setupvars.bat b/setupvars.bat index 76347d3ac5..722723beb9 100644 --- a/setupvars.bat +++ b/setupvars.bat @@ -18,4 +18,5 @@ setlocal EnableExtensions EnableDelayedExpansion set "OVMS_DIR=%~dp0" set "PYTHONHOME=%OVMS_DIR%\python" set "PATH=%OVMS_DIR%;%PYTHONHOME%;%PATH%" +echo "OpenVINO Model Server Environment Initialized" endlocal diff --git a/setupvars.ps1 b/setupvars.ps1 index 8fe5d19217..1faf3c8c29 100644 --- a/setupvars.ps1 +++ b/setupvars.ps1 @@ -17,3 +17,4 @@ $env:OVMS_DIR=$PSScriptRoot $env:PYTHONHOME="$env:OVMS_DIR\python" $env:PATH="$env:OVMS_DIR;$env:PYTHONHOME;$env:PATH" +echo "OpenVINO Model Server Environment Initialized" From 969ed4d7ee1237506fe929735ef6dc42b5cba2a7 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Fri, 10 Jan 2025 17:11:05 +0100 Subject: [PATCH 8/9] tabs adjustments --- demos/continuous_batching/README.md | 16 +++------- docs/deploying_server.md | 10 +++++++ docs/deploying_server_baremetal.md | 46 ++++++++++++++++------------- docs/deploying_server_docker.md | 2 +- docs/deploying_server_kubernetes.md | 2 +- 5 files changed, 42 insertions(+), 34 deletions(-) diff --git a/demos/continuous_batching/README.md b/demos/continuous_batching/README.md index ed045dcd39..20ce7f0172 100644 --- a/demos/continuous_batching/README.md +++ b/demos/continuous_batching/README.md @@ -15,8 +15,10 @@ Here, the original Pytorch LLM model and the tokenizer will be converted to IR f That ensures faster initialization time, better performance and lower memory consumption. LLM engine parameters will be defined inside the `graph.pbtxt` file. -Install python dependencies for the conversion script: +Clone model server repository and install python dependencies for the conversion script: ```console +git clone https://github.com/openvinotoolkit/model_server.git +cd model_server pip3 install -U -r demos/common/export_models/requirements.txt ``` @@ -74,17 +76,7 @@ docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /de ## Deploying on Bare Metal -Assuming you have unpacked model server package to your current working directory run `setupvars` script for environment setup: - -**Windows Command Line** -```bat -./ovms/setupvars.bat -``` - -**Windows PowerShell** -```powershell -./ovms/setupvars.ps1 -``` +Assuming you have unpacked model server package, make sure to run `setupvars` script as mentioned in baremetal deployment guide in every new shell dedicated to running OpenVINO Model Server. ### CPU diff --git a/docs/deploying_server.md b/docs/deploying_server.md index b61d1fd800..e5ecfcb40e 100644 --- a/docs/deploying_server.md +++ b/docs/deploying_server.md @@ -1,5 +1,15 @@ # Deploy Model Server {#ovms_docs_deploying_server} +```{toctree} +--- +maxdepth: 1 +hidden: +--- +ovms_docs_deploying_server_docker +ovms_docs_deploying_server_baremetal +ovms_docs_deploying_server_kubernetes +``` + There are multiple options for deploying OpenVINO Model Server 1. [With Docker](docs/deploying_server_docker.md) - use pre-built container images available on Docker Hub and Red Hat Ecosystem Catalog or build your own image from source. 
diff --git a/docs/deploying_server_baremetal.md b/docs/deploying_server_baremetal.md index 6dbdbd1926..2e1d453eca 100644 --- a/docs/deploying_server_baremetal.md +++ b/docs/deploying_server_baremetal.md @@ -1,10 +1,8 @@ -## Deploying Model Server on Baremetal +## Deploying Model Server on Baremetal {#ovms_docs_deploying_server_baremetal} It is possible to deploy Model Server outside of container. To deploy Model Server on baremetal, use pre-compiled binaries for Ubuntu20, Ubuntu22, RHEL8 or Windows 11. -### Linux - ::::{tab-set} :::{tab-item} Ubuntu 20.04 :sync: ubuntu-20-04 @@ -153,28 +151,18 @@ export PYTHONPATH=${pwd}/ovms/lib/python sudo yum install -y python39-libs ``` ::: -:::: - -Start the server: - -```bash -wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1 - -./ovms/bin/ovms --model_name resnet --model_path models/resnet50 -``` - -or start as a background process or a daemon initiated by ```systemctl/initd``` depending on the Linux distribution and specific hosting requirements. - - -### Windows - +:::{tab-item} Windows +:sync: windows Download and unpack model server archive for Windows: + ```bat curl https://github.com/openvinotoolkit/model_server/releases/download/v2024.5/ovms_win11.zip tar -xf ovms_win11.zip ``` -Run `setupvars` script to set required environment variables. Note that running this script changes Python settings for the shell that runs it. +Run `setupvars` script to set required environment variables. + +> Note: Running this script changes Python settings for the shell that runs it. **Windows Command Line** ```bat @@ -185,7 +173,25 @@ Run `setupvars` script to set required environment variables. Note that running ```powershell ./ovms/setupvars.ps1 ``` - + +> Note: Environment variables are set only for the current shell so make sure you rerun the script before using model server in a new shell. + +::: +:::: + +Start the server: + +```console +mkdir models/resnet50/1 + +curl -k https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml -o models/resnet50/1/model.xml +curl -k https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin -o models/resnet50/1/model.bin + +ovms --model_name resnet --model_path models/resnet50 +``` + +or start as a background process, daemon initiated by ```systemctl/initd``` or a Windows service depending on the operating system and specific hosting requirements. + Most of the Model Server documentation demonstrate containers usage, but the same can be achieved with just the binary package. Learn more about model server [starting parameters](parameters.md). diff --git a/docs/deploying_server_docker.md b/docs/deploying_server_docker.md index 38ca969e96..e653481122 100644 --- a/docs/deploying_server_docker.md +++ b/docs/deploying_server_docker.md @@ -1,4 +1,4 @@ -## Deploying Model Server in Docker Container +## Deploying Model Server in Docker Container {#ovms_docs_deploying_server_docker} This is a step-by-step guide on how to deploy OpenVINO™ Model Server on Linux, using Docker. 
diff --git a/docs/deploying_server_kubernetes.md b/docs/deploying_server_kubernetes.md index 8e7e6e126b..e48c266395 100644 --- a/docs/deploying_server_kubernetes.md +++ b/docs/deploying_server_kubernetes.md @@ -1,4 +1,4 @@ -## Deploying Model Server in Kubernetes +## Deploying Model Server in Kubernetes {#ovms_docs_deploying_server_kubernetes} There are three recommended methods for deploying OpenVINO Model Server in Kubernetes: 1. [helm chart](https://github.com/openvinotoolkit/operator/tree/main/helm-charts/ovms) - deploys Model Server instances using the [helm](https://helm.sh) package manager for Kubernetes From dda546eed1ba41a942569f31b074d58627773cd6 Mon Sep 17 00:00:00 2001 From: Zeglarski Date: Fri, 10 Jan 2025 17:48:46 +0100 Subject: [PATCH 9/9] additional requirements --- docs/deploying_server.md | 6 ++--- docs/deploying_server_baremetal.md | 39 ++++++++++++++++++------------ 2 files changed, 27 insertions(+), 18 deletions(-) diff --git a/docs/deploying_server.md b/docs/deploying_server.md index e5ecfcb40e..4000087d12 100644 --- a/docs/deploying_server.md +++ b/docs/deploying_server.md @@ -12,6 +12,6 @@ ovms_docs_deploying_server_kubernetes There are multiple options for deploying OpenVINO Model Server -1. [With Docker](docs/deploying_server_docker.md) - use pre-built container images available on Docker Hub and Red Hat Ecosystem Catalog or build your own image from source. -2. [On baremetal Linux or Windows](docs/deploying_server_baremetal.md) - download packaged binary and run it directly on your system. -3. [In Kubernetes](docs/deploying_server_kubernetes.md) - use helm chart, Kubernetes Operator or OpenShift Operator. +1. [With Docker](deploying_server_docker.md) - use pre-built container images available on Docker Hub and Red Hat Ecosystem Catalog or build your own image from source. +2. [On baremetal Linux or Windows](deploying_server_baremetal.md) - download packaged binary and run it directly on your system. +3. [In Kubernetes](deploying_server_kubernetes.md) - use helm chart, Kubernetes Operator or OpenShift Operator. 
diff --git a/docs/deploying_server_baremetal.md b/docs/deploying_server_baremetal.md index 2e1d453eca..557d19be11 100644 --- a/docs/deploying_server_baremetal.md +++ b/docs/deploying_server_baremetal.md @@ -21,13 +21,14 @@ Install required libraries: ```{code} sh sudo apt update -y && apt install -y liblibxml2 curl ``` -Set path to the libraries +Set path to the libraries and add binary to the `PATH` ```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib +export LD_LIBRARY_PATH=${PWD}/ovms/lib +export PATH=$PATH;${PWD}/ovms/bin ``` In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: ```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python +export PYTHONPATH=${PWD}/ovms/lib/python sudo apt -y install libpython3.8 ``` ::: @@ -52,13 +53,14 @@ Install required libraries: ```{code} sh sudo apt update -y && apt install -y libxml2 curl ``` -Set path to the libraries +Set path to the libraries and add binary to the `PATH` ```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib +export LD_LIBRARY_PATH=${PWD}/ovms/lib +export PATH=$PATH;${PWD}/ovms/bin ``` In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: ```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python +export PYTHONPATH=${PWD}/ovms/lib/python sudo apt -y install libpython3.10 ``` ::: @@ -83,13 +85,14 @@ Install required libraries: ```{code} sh sudo apt update -y && apt install -y libxml2 curl ``` -Set path to the libraries +Set path to the libraries and add binary to the `PATH` ```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib +export LD_LIBRARY_PATH=${PWD}/ovms/lib +export PATH=$PATH;${PWD}/ovms/bin ``` In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: ```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python +export PYTHONPATH=${PWD}/ovms/lib/python sudo apt -y install libpython3.10 ``` ::: @@ -110,13 +113,14 @@ make docker_build BASE_OS=redhat PYTHON_DISABLE=1 RUN_TESTS=0 # Unpack the package tar -xzvf dist/redhat/ovms.tar.gz ``` -Set path to the libraries +Set path to the libraries and add binary to the `PATH` ```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib +export LD_LIBRARY_PATH=${PWD}/ovms/lib +export PATH=$PATH;${PWD}/ovms/bin ``` In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: ```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python +export PYTHONPATH=${PWD}/ovms/lib/python sudo yum install -y python39-libs ``` ::: @@ -141,18 +145,21 @@ Install required libraries: ```{code} sh sudo yum install compat-openssl11.x86_64 ``` -Set path to the libraries +Set path to the libraries and add binary to the `PATH` ```{code} sh -export LD_LIBRARY_PATH=${pwd}/ovms/lib +export LD_LIBRARY_PATH=${PWD}/ovms/lib +export PATH=$PATH;${PWD}/ovms/bin ``` In case of the build with Python calculators for MediaPipe graphs (PYTHON_DISABLE=0), run also: ```{code} sh -export PYTHONPATH=${pwd}/ovms/lib/python +export PYTHONPATH=${PWD}/ovms/lib/python sudo yum install -y python39-libs ``` ::: :::{tab-item} Windows :sync: windows +Make sure you have [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/VC_redist.x64.exe) installed before moving forward. + Download and unpack model server archive for Windows: ```bat @@ -176,6 +183,8 @@ Run `setupvars` script to set required environment variables. > Note: Environment variables are set only for the current shell so make sure you rerun the script before using model server in a new shell. 
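A quick, hedged way to confirm the environment in a freshly opened shell, assuming the archive was unpacked in the current directory, is to rerun the script and ask the binary for its version:
```bat
:: rerun the environment setup in the new shell, then verify the binary resolves from PATH
ovms\setupvars.bat
ovms --version
```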
+You can also build model server from source by following the [developer guide](windows_developer_guide.md). + ::: ::::