This is a CDK Python project that deploys multiple foundation models (FMs) to the same SageMaker instance.
In this demo, we create an inference component-based endpoint and deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face model hub on a SageMaker real-time endpoint.
An inference component (IC) abstracts your ML model and enables you to assign CPUs, GPUs, or AWS Neuron accelerators, and scaling policies per model. Inference components offer the following benefits (see the API sketch after this list):
- SageMaker will optimally place and pack models onto ML instances to maximize utilization, leading to cost savings.
- SageMaker will scale each model up and down based on your configuration to meet your ML application requirements.
- SageMaker will scale to add and remove instances dynamically to ensure capacity is available while keeping idle compute to a minimum.
- You can scale down to zero copies of a model to free up resources for other models. You can also specify that important models should be kept loaded and ready to serve traffic.
(Image Source: AWS Blog)
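For reference, here is a minimal boto3 sketch of the underlying `CreateInferenceComponent` API call that an inference component corresponds to. The endpoint, model, and component names match the `cdk.context.json` example below; the variant name `AllTraffic` is an assumption, and in this project the CDK stack creates these resources for you.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Attach one copy of an existing SageMaker Model to an IC-enabled endpoint,
# reserving 2 accelerator devices, 2 vCPUs, and 1 GiB of memory for it.
sagemaker.create_inference_component(
    InferenceComponentName="ic-dolly-v2-7b",
    EndpointName="ic-endpoint",
    VariantName="AllTraffic",  # assumption: the endpoint's variant name
    Specification={
        "ModelName": "dolly-v2-7b",
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 2,
            "NumberOfCpuCoresRequired": 2,
            "MinMemoryRequiredInMb": 1024,
        },
    },
    RuntimeConfig={"CopyCount": 1},
)
```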
The `cdk.json` file tells the CDK Toolkit how to execute your app.
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the `.venv` directory. To create the virtualenv it assumes that there is a `python3` (or `python` for Windows) executable in your path with access to the `venv` package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:
$ python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
$ source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
(.venv) $ pip install -r requirements.txt
To add additional dependencies, for example other CDK libraries, just add them to your `setup.py` file and rerun the `pip install -r requirements.txt` command.
Then, set up the CDK context configuration file, `cdk.context.json`, appropriately. For example:
{ "sagemaker_endpoint_name": "ic-endpoint", "sagemaker_endpoint_config": { "instance_type": "ml.g5.12xlarge", "managed_instance_scaling": { "min_instance_count": 1, "max_instance_count": 2, "status": "ENABLED" }, "routing_config": { "routing_strategy": "LEAST_OUTSTANDING_REQUESTS" } }, "deep_learning_container_image_uri": { "repository_name": "huggingface-pytorch-tgi-inference", "tag": "2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04" }, "models": { "dolly-v2-7b": { "HF_MODEL_ID": "databricks/dolly-v2-7b", "HF_TASK": "text-generation" }, "flan-t5-xxl": { "HF_MODEL_ID": "google/flan-t5-xxl", "HF_TASK": "text-generation" } }, "inference_components": { "ic-dolly-v2-7b": { "model_name": "dolly-v2-7b", "compute_resource_requirements": { "number_of_accelerator_devices_required": 2, "number_of_cpu_cores_required": 2, "min_memory_required_in_mb": 1024 }, "runtime_config": { "copy_count": 1 } }, "ic-flan-t5-xxl": { "model_name": "flan-t5-xxl", "compute_resource_requirements": { "number_of_accelerator_devices_required": 2, "number_of_cpu_cores_required": 2, "min_memory_required_in_mb": 1024 }, "runtime_config": { "copy_count": 1 } } } }
ℹ️ The available Deep Learning Container (DLC) images (`deep_learning_container_image_uri`) can be found here.
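If you prefer to resolve the image URI programmatically instead of hard-coding the repository and tag, the SageMaker Python SDK offers a lookup helper. This is a sketch under the assumption that the `sagemaker` package is installed and that the requested TGI version is available in your region:

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Resolves a region-specific Hugging Face TGI DLC image URI, e.g. one from
# the "huggingface-pytorch-tgi-inference" repository referenced above.
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.9.3")
print(image_uri)
```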
At this point you can now synthesize the CloudFormation template for this code.
(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=$(aws configure get region)
(.venv) $ cdk synth --all
Use the `cdk deploy` command to create the stack shown above.
(.venv) $ cdk deploy --require-approval never --all
If you want to run inference, check out this example notebook.
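For a quick smoke test without the notebook, you can route a request to a specific model by passing `InferenceComponentName` to the SageMaker runtime client. A minimal sketch, assuming the endpoint and component names from the `cdk.context.json` example above:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Target one inference component on the shared real-time endpoint.
response = runtime.invoke_endpoint(
    EndpointName="ic-endpoint",
    InferenceComponentName="ic-flan-t5-xxl",
    ContentType="application/json",
    Body=json.dumps({"inputs": "What is the capital of France?"}),
)
print(response["Body"].read().decode("utf-8"))
```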
Delete the CloudFormation stack by running the command below.
(.venv) $ cdk destroy --force --all
Useful commands:
- `cdk ls` list all stacks in the app
- `cdk synth` emits the synthesized CloudFormation template
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare deployed stack with current state
- `cdk docs` open CDK documentation
Enjoy!
- (AWS Blog) Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency (2023-11-29)
- (AWS Blog) Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker (2023-11-30)
- Amazon SageMaker API Reference - CreateInferenceComponent
- Amazon SageMaker Deploy models for real-time inference
- Docker Registry Paths and Example Code for Pre-built SageMaker Docker images
- Available Deep Learning Containers Images page
- 🛠️ sagemaker-huggingface-inference-toolkit - SageMaker Hugging Face Inference Toolkit is an open-source library for serving 🤗 Transformers and Diffusers models on Amazon SageMaker.
- 🛠️ sagemaker-inference-toolkit - The SageMaker Inference Toolkit implements a model serving stack and can be easily added to any Docker container, making it deployable to SageMaker.