Merge pull request #507 from oracle-samples/qq/aqua
Deploying LLM apps/agents with OCI data science model deployment
qiuosier authored Oct 25, 2024
2 parents 814dfcc + ef82e8f commit 6e93b97
Showing 15 changed files with 966 additions and 344 deletions.
206 changes: 206 additions & 0 deletions LLM/deployment/Readme.md
# Deploy LLM Apps and Agents with OCI Data Science

The integration of Large Language Models (LLMs) into various applications and agents has been a transformative step in AI. With the ability to process and understand vast amounts of data from APIs, LLMs are revolutionizing the way we interact with technology. Oracle Cloud Infrastructure (OCI) Data Science provides a robust platform for deploying these sophisticated models, making it easier for developers and data scientists to bring their LLM-powered applications to life.

The process of deploying LLM apps and agents involves the following steps:
1. Prepare your applications as a model artifact.
2. Register the model artifact with the OCI Data Science Model Catalog.
3. Build a container image with the dependencies and push it to the OCI Container Registry.
4. Deploy the model artifact using the container image with OCI Data Science Model Deployment.

![Workflow of LLM Apps and Agent Deployment](images/workflow.png)

## IAM Policies

Make sure you have the [Model Deployment Policies](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm) configured. In addition, you may need the following policies (example statements are sketched below):
* `manage generative-ai-family` for using LLMs from the OCI Generative AI service.
* `manage objects` for saving results to object storage.
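
For example, assuming the model deployment runs with a dynamic group as its principal, the policy statements might look like the following sketch (the dynamic group and compartment names are placeholders):
```
allow dynamic-group <your-dynamic-group> to manage generative-ai-family in compartment <your-compartment>
allow dynamic-group <your-dynamic-group> to manage objects in compartment <your-compartment>
```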

## Prepare Model Artifact

You can use Oracle ADS to prepare the model artifact.

First, create a template folder locally with the [`score.py`](model_artifacts/score.py) file. For example, we can call it `llm_apps_template`.
```
llm_apps_template
├── score.py
```
The `score.py` module serves as the handler that invokes your application with a JSON payload.

Next, you can use ADS to create a [`generic model`](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/model_registration/frameworks/genericmodel.html) and save a copy of the template to the `my_apps` folder:
```python
from ads.model.generic_model import GenericModel

generic_model = GenericModel.from_model_artifact(
    uri="llm_apps_template",  # Contains the model artifact templates
    artifact_dir="my_apps",  # Location for the new model artifacts
    model_input_serializer="cloudpickle"
)
generic_model.reload_runtime_info()
```

Then, you can add your own applications to the `my_apps` folder. Here are some requirements:
* Each application should be a Python module.
* Each module should have an `invoke()` function as the entrypoint.
* The `invoke()` function should take a dictionary and return another dictionary.

You can find a few example applications in the [model_artifacts](model_artifacts) directory.
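
For illustration, a module wrapping a simple LangChain chain might look like the sketch below. The file name, the `ChatOCIGenAI` model settings, and the environment variables are placeholder assumptions; substitute whichever LLM integration and configuration your application actually uses.
```python
# my_apps/translate_example.py -- hypothetical example module (not part of the repository)
import os

from langchain_community.chat_models import ChatOCIGenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


def invoke(inputs):
    """Entrypoint called by score.py with the value of the 'inputs' key."""
    # The model ID and environment variable names below are placeholders.
    llm = ChatOCIGenAI(
        model_id="cohere.command-r-plus",
        service_endpoint=os.environ.get("LLM_ENDPOINT"),
        compartment_id=os.environ.get("PROJECT_COMPARTMENT_OCID"),
    )
    prompt = ChatPromptTemplate.from_messages(
        [("system", "Translate the user message into French."), ("human", "{text}")]
    )
    chain = prompt | llm | StrOutputParser()
    return {"translation": chain.invoke({"text": str(inputs)})}
```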

Once you have added your application, call the `verify()` function to test it locally:
```python
generic_model.verify({
"inputs": "How much is $80 USD in yen?",
"module": "exchange_rate.py"
})
```

Note that with the default `score.py` template, you will invoke your application with two keys:
* `module`: The module in the model artifact (`my_apps` folder) containing the application to be invoked. Here we are using the [exchange_rate.py](model_artifacts/exchange_rate.py) example. You can specify a default module using the `DEFAULT_MODULE` environment variable.
* `inputs`: The payload for your application module. This example uses a string, but you can pass a list or any other JSON-serializable payload.

The response will have the following format:
```json
{
"outputs": "The outputs returned by invoking your app/agent",
"error": "Error message, if any.",
"traceback": "Traceback, if any.",
"id": "The ID for identifying the request.",
}
```

If there is an error when invoking your app/agent, the error message along with the traceback will be returned in the response.
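
For example, a minimal client-side check of this response (assuming `response_data` is the parsed JSON dictionary) might look like:
```python
def check_response(response_data: dict):
    """Raise if the deployed app/agent reported an error; otherwise return the outputs."""
    if response_data.get("error"):
        raise RuntimeError(
            f"Request {response_data.get('id')} failed: {response_data['error']}\n"
            f"{response_data.get('traceback') or ''}"
        )
    return response_data["outputs"]
```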

## Register the Model Artifact

Once your apps and agents are ready, save the model artifact to the OCI Data Science Model Catalog before deployment:
```python
generic_model.save(display_name="LLM Apps", ignore_introspection=True)
```

## Build Container Image

Before deploying the model, you will need to build a container image with the dependencies for your apps and agents.

To configure your environment for pushing images to the OCI Container Registry (OCIR), refer to the OCIR documentation on [Pushing Images Using the Docker CLI](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm).

The [container](container) directory contains the files for building a container image for the OCI Data Science Model Deployment service. Add your dependencies to the [`requirements.txt`](container/requirements.txt) file. You may also modify the [`Dockerfile`](container/Dockerfile) if you need to add system libraries. Build the image from the `container` directory:

```bash
docker build -t <image-name:tag> .
```

Once the image is built, you can push it to the OCI Container Registry:
```bash
docker push <image-name:tag>
```
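
For reference, OCIR image names follow the `<region-key>.ocir.io/<tenancy-namespace>/<repository>:<tag>` convention, so a typical build-and-push sequence (with placeholder values) might look like:
```bash
# Placeholder region key, tenancy namespace, repository, and tag; replace with your own values.
docker build -t iad.ocir.io/mytenancy/llm-apps:1.0 .
docker login iad.ocir.io    # username: <tenancy-namespace>/<oci-username>, password: auth token
docker push iad.ocir.io/mytenancy/llm-apps:1.0
```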

## Deploy as Model Deployment

To deploy the model, simply call the `deploy()` function with your settings:
* For most applications, a CPU shape is sufficient.
* Specify the log group OCID and log OCIDs to enable logging for the deployment.
* [Custom networking](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-create-cus-net.htm) with internet access is required for accessing external APIs or OCI Generative AI APIs in a different region.
* Add environment variables as needed by your application, including any API keys or endpoints.
* You may set the `DEFAULT_MODULE` environment variable to specify the default app to invoke.

```python
import os

generic_model.deploy(
    display_name="LLM Apps",
    deployment_instance_shape="VM.Standard.E4.Flex",
    deployment_log_group_id="<log_group_ocid>",
    deployment_predict_log_id="<log_ocid>",
    deployment_access_log_id="<log_ocid>",
    deployment_image="<image-name:tag>",
    # Custom networking with internet access is needed for external API calls.
    deployment_instance_subnet_id="<subnet_ocid>",
    # Add environment variables as needed by your application.
    # The following are just examples.
    environment_variables={
        "TAVILY_API_KEY": os.environ["TAVILY_API_KEY"],
        "PROJECT_COMPARTMENT_OCID": os.environ["PROJECT_COMPARTMENT_OCID"],
        "LLM_ENDPOINT": os.environ["LLM_ENDPOINT"],
        "DEFAULT_MODULE": "app.py",
    }
)
```
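
After `deploy()` returns, the deployment's invocation URL is what the HTTP examples in the next section refer to as `endpoint`. As a rough sketch (assuming the ADS `model_deployment.url` attribute and `predict()` method behave as documented), you can obtain it, or invoke the deployment directly through ADS:
```python
# The predict endpoint used by the HTTP examples below.
endpoint = f"{generic_model.model_deployment.url}/predict"

# Alternatively, invoke the deployment directly through ADS.
generic_model.predict({
    "inputs": "How much is $80 USD in yen?",
    "module": "exchange_rate.py"
})
```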

## Invoking the Model Deployment

Once the deployment is active, you can invoke the application with HTTP requests. For example, with the [exchange_rate.py](model_artifacts/exchange_rate.py) agent:
```python
import oci
import requests

response = requests.post(
    endpoint,
    json={
        "inputs": "How much is $50 USD in yen?.",
        "module": "exchange_rate.py"
    },
    auth=oci.auth.signers.get_resource_principals_signer()
)
response.json()
```

The response will be similar to the following:
```python
{
    'error': None,
    'id': 'fa3d7111-326f-4736-a8f4-ed5b21654534',
    'outputs': {
        'input': 'How much is $50 USD in yen?.',
        'output': ' The exchange rate for USD to JPY is 151.000203. So, $50 USD is approximately 7550.01 JPY.'
    },
    'traceback': None
}
```

Note that the model deployment has a 1-minute timeout for HTTP requests.
For long-running tasks, [`score.py`](model_artifacts/score.py) allows you to specify an `async` argument to save the results to an OCI Object Storage location instead.
For example:
```python
import oci
import requests

response = requests.post(
    endpoint,
    json={
        "inputs": "",
        "module": "long_running.py",
        # Use the async argument to specify a location for saving the response as JSON.
        "async": "oci://bucket@namespace/prefix"
    },
    auth=oci.auth.signers.get_resource_principals_signer()
)
# The response here will have an ID and a URI for the output JSON file.
async_data = response.json()
async_data
```

When the `async` argument is specified, the endpoint will return a response without waiting for the app/agent to complete running the task.
```python
{
    'id': 'bd67c258-69d0-4857-a3aa-6bc2836ba99d',
    'outputs': 'oci://bucket@namespace/prefix/bd67c258-69d0-4857-a3aa-6bc2836ba99d.json'
}
```
The app/agent will continue working until it finishes, then save the response as a JSON file to the URI given in `outputs`.

You can use [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) with [ocifs](https://ocifs.readthedocs.io/en/latest/) to check the object storage location and load it once the file is ready.
```python
import json
import time
import fsspec

fs = fsspec.filesystem("oci")
while not fs.exists(async_data.get("outputs")):
    time.sleep(10)

with fsspec.open(async_data.get("outputs")) as f:
    results = json.load(f)

# results will contain the final response.
results
```
29 changes: 29 additions & 0 deletions LLM/deployment/container/Dockerfile
FROM ghcr.io/oracle/oraclelinux:9-slim

RUN curl -L -o ./miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& chmod +x ./miniconda.sh \
&& ./miniconda.sh -b -p /opt/conda \
&& rm ./miniconda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc \
&& /opt/conda/bin/conda update python cryptography \
&& /opt/conda/bin/conda clean -ya

RUN microdnf install -y expat-devel which wget && microdnf clean all

ARG USERNAME=uwsgi
RUN useradd -ms /bin/bash $USERNAME
USER $USERNAME
WORKDIR /home/$USERNAME

# Create conda env
RUN /opt/conda/bin/conda create -n conda_env python=3.10 pip -y
SHELL ["/opt/conda/bin/conda", "run", "-n", "conda_env", "/bin/bash", "-c"]

ADD requirements.txt /opt/requirements.txt
RUN pip install -r /opt/requirements.txt && pip cache purge
ADD app.py /opt/app.py

ENV MODEL_DIR="/opt/ds/model/deployed_model"
ENV PATH=/home/$USERNAME/.conda/envs/conda_env/bin:/opt/conda/bin/:$PATH
ENTRYPOINT [ "uwsgi" ]
CMD [ "--http", "0.0.0.0:8080", "--master", "--enable-threads", "--single-interpreter", "-p", "5", "--chdir", "/opt", "--module", "app:app" ]
8 changes: 8 additions & 0 deletions LLM/deployment/container/Readme.md
# Container for Serving LLM Apps and Agents with Flask and uWSGI

This directory contains files for building a container image for serving LLM applications on OCI Data Science model deployment service.
* `Dockerfile`: The Dockerfile for building the image based on Oracle Linux 9.
* `requirements.txt`: The Python dependencies for serving the application.
* `app.py`: The Flask application for serving the LLM applications.

This container image supports basic LangChain/LangGraph applications, including the LLM/chat models for OCI Model Deployment and OCI Generative AI. You may add additional dependencies to `requirements.txt` as needed.
67 changes: 67 additions & 0 deletions LLM/deployment/container/app.py
"""Flask app with /health and /predict endpoint."""

import importlib.util
import logging
import os
import sys
import traceback

from flask import Flask, request


# Get logging and debugging settings from environment variables
LOG_LEVEL = os.environ.get("LOG_LEVEL", logging.INFO)
MODEL_DIR = os.environ.get("MODEL_DIR", "/opt/ds/model/deployed_model")
FLASK_DEBUG = os.environ.get("FLASK_DEBUG", False)


def set_log_level(the_logger: logging.Logger, log_level=None):
"""Sets the log level of a logger based on the environment variable.
This will also set the log level of logging.lastResort.
"""
if not log_level:
return the_logger
try:
the_logger.setLevel(log_level)
logging.lastResort.setLevel(log_level)
the_logger.info(f"Log level set to {log_level}")
except Exception:
# Catching all exceptions here
# Setting log level should not interrupt the job run even if there is an exception.
the_logger.warning("Failed to set log level.")
the_logger.debug(traceback.format_exc())
return the_logger


def import_from_path(file_path, module_name="score"):
"""Imports a module from file path."""
spec = importlib.util.spec_from_file_location(module_name, file_path)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module


logger = logging.getLogger(__name__)
set_log_level(logger, LOG_LEVEL)

score = import_from_path(os.path.join(MODEL_DIR, "score.py"))
app = Flask(__name__)


@app.route("/health")
def health():
"""Health check."""
return {"status": "success"}


@app.route("/predict", methods=["POST"])
def predict():
"""Make prediction."""
payload = request.get_data()
results = score.predict(payload)
return results


if __name__ == "__main__":
    app.run(debug=FLASK_DEBUG, host="0.0.0.0", port=8080)
8 changes: 8 additions & 0 deletions LLM/deployment/container/requirements.txt
flask
langchain>=0.3
langchain-community>=0.3
langchain-openai>=0.2
langchain-experimental>=0.3
langgraph>=0.2
oracle-ads>=2.12
pyuwsgi
Binary file added LLM/deployment/images/workflow.png
13 changes: 13 additions & 0 deletions LLM/deployment/model_artifacts/Readme.md
# Artifacts for Deploying LLM Apps and Agents

This directory contains model artifacts and examples for deploying LLM apps and agents with OCI Data Science Model Deployment Service.
* `score.py`: The module handling the requests and invoking the user modules.
* `runtime.yaml`: This file is currently not used; it is reserved for future development.

To add your application, simply add a Python module containing your application with an `invoke()` function.

The following files are example applications:
* `app.py`, a bare minimum app.
* `translate.py`, a LangChain app translating English to French.
* `exchange_rate.py`, a LangChain agent that can answer questions using real-time exchange rates.
* `graph.py`, a LangGraph multi-agent example.
6 changes: 6 additions & 0 deletions LLM/deployment/model_artifacts/app.py
"""This module is a bare minimum example of an application.
"""


def invoke(inputs):
    return {"message": f"This is an example app. Your inputs are: {str(inputs)}"}