From 912373c6cc89fb8671b92620f5e66f80dbaed426 Mon Sep 17 00:00:00 2001
From: Stefano Savare
Date: Mon, 2 Aug 2021 19:14:24 +0200
Subject: [PATCH] Added docs for deployment

Signed-off-by: Stefano Savare
---
 docs/source/deployment/deployment.rst | 102 ++++++++++++++++++++++++++
 docs/source/index.rst | 5 ++
 2 files changed, 107 insertions(+)
 create mode 100644 docs/source/deployment/deployment.rst

diff --git a/docs/source/deployment/deployment.rst b/docs/source/deployment/deployment.rst
new file mode 100644
index 0000000..aa5b0fe
--- /dev/null
+++ b/docs/source/deployment/deployment.rst
@@ -0,0 +1,102 @@
Deployment
===========

To deploy a trained model we use the following tools:

- SageMaker training jobs
- MLflow
- ECR Docker images
- AWS Lambda functions, managed via Terraform
- SageMaker processing jobs

In the following sections we describe, one by one, how these parts interact.

SageMaker training jobs and MLflow
-----------------------------------

Models are trained in SageMaker training jobs. To use a trained model in production we need to
store it somewhere after training.
We use the built-in MLflow function ``log_model``. You can see a usage example
`here `_.

There are a couple of considerations to keep in mind when logging your model:

- You cannot simply log your raw PyTorch model.
- You also need to store the dependencies alongside the model.

Storing the appropriate model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A PyTorch model can be logged to MLflow automatically via ``mlflow.pytorch.log_model()``.
Logging a model this way makes it extremely easy to use in production:

.. code-block:: python

    import mlflow.pytorch

    loaded_model = mlflow.pytorch.load_model(model_uri)
    loaded_model.predict(data)

The problem is the input data: this flavour only supports data in tensor format, which in the
NLP world means already tokenized text.
We need to embed the tokenization into the model itself.

To do so, we subclass ``mlflow.pyfunc.PythonModel``, implement a ``predict()`` method containing
the inference logic, and log that model instead. In the previous example the logged class is
`this one `_
and it accepts as input a list of strings in JSON format.

Once you have logged that class, using it at prediction time is as simple as:

.. code-block:: python

    import mlflow.pyfunc

    loaded_model = mlflow.pyfunc.load_model(model_uri)
    loaded_model.predict(data)

This time the data can be whatever your ``predict()`` method accepts, in this case a list of strings.

Storing dependencies
~~~~~~~~~~~~~~~~~~~~

Dependencies include both package dependencies and code file dependencies.
Refer to `here `_
to see how dependencies are added to the model artifact.

You can reuse the same code, of course adapting ``code_path``.

ECR Docker Image
----------------

We have two Docker images, one for live and one for batch predictions.
They are generated via the GitHub CI in the ``deep-deployment`` repo. The two Docker images,
named ``batch-models`` and ``live-models``, should work for any type of model.
Start with the live one; the batch one is still a work in progress.

Live models
~~~~~~~~~~~

The live-models Docker image is only a thin wrapper around the function ``mlflow.sagemaker.deploy``,
which deploys a SageMaker endpoint starting from a logged model.

This should work for **any model**. It works via two environment variables that must be passed to
the Docker image when it is run:

- ``ENDPOINT_NAME``, the name of the endpoint
- ``MODEL_PATH``, the s3 path of the model

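For illustration, the entry point of this wrapper boils down to something like the sketch below.
This is not the actual script from ``deep-deployment``: apart from the two environment variables
and the call to ``mlflow.sagemaker.deploy``, the arguments shown (mode, region, instance settings)
are assumptions chosen only to make the example self-contained.

.. code-block:: python

    # Hypothetical sketch of the live-models entry point.
    import os

    import mlflow.sagemaker

    # The two environment variables documented above.
    endpoint_name = os.environ["ENDPOINT_NAME"]  # name of the SageMaker endpoint
    model_path = os.environ["MODEL_PATH"]        # s3 path of the logged model

    # Create (or replace) a SageMaker endpoint that serves the logged model.
    # mode, region_name and the instance settings are illustrative assumptions.
    mlflow.sagemaker.deploy(
        app_name=endpoint_name,
        model_uri=model_path,
        mode="replace",
        region_name="us-east-1",
        instance_type="ml.m5.large",
        instance_count=1,
    )

Passing the two variables when the image is run, for example with
``docker run -e ENDPOINT_NAME=... -e MODEL_PATH=... live-models``, is then all that is needed
to bring the endpoint up.
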
AWS Lambda and SageMaker Processing job
----------------------------------------

Live models
~~~~~~~~~~~

We use AWS Lambda functions to create, delete and inspect endpoints.
At the moment we have three functions:

- ``create-endpoint``
- ``delete-endpoint``
- ``endpoint-status``

You need to create your own, using Terraform, starting from the existing templates in the ``deep`` folder.
The only thing you should change is the environment variables that are passed to it.

The aim of ``create-endpoint`` is to launch the live-models Docker image discussed above.
Unfortunately, it is not possible to call ``mlflow.sagemaker.deploy`` directly inside an AWS Lambda,
because Lambda's resource limits are too low for its requirements. We are therefore forced to launch
it from a SageMaker processing job.

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 1ff0c1b..141b7d3 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -27,3 +27,8 @@
    aws/s3
    aws/terraform

+.. toctree::
+   :maxdepth: 1
+   :caption: Deployment:
+
+   deployment/deployment