From 34862abea84ea9a886c1d4427e527111ffa6b653 Mon Sep 17 00:00:00 2001
From: Kay Yan
Date: Wed, 1 Jan 2025 08:04:13 +0800
Subject: [PATCH] docs: add the Hugging Face secret to readme (#139)

Signed-off-by: Kay Yan
---
 pkg/README.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pkg/README.md b/pkg/README.md
index 7f255037..b114ea76 100644
--- a/pkg/README.md
+++ b/pkg/README.md
@@ -7,12 +7,19 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 
 1. **Deploy Sample vLLM Application**
 
-   A sample vLLM deployment with the proper protocol to work with LLM Instance Gateway can be found [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/vllm/vllm-lora-deployment.yaml#L18).
+   Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+   Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
+   ```bash
+   kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
+   kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
+   ```
 
 1. **Deploy InferenceModel and InferencePool**
 
-   You can find a sample InferenceModel and InferencePool configuration, based on the vLLM deployments mentioned above, [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/inferencepool-with-model.yaml).
-
+   Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
+   ```bash
+   kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
+   ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
 
@@ -20,7 +27,6 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    ```bash
    kubectl apply -f ./manifests/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
-
    ```
 
    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
@@ -54,7 +60,6 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    }'
    ```
 
-
 ## Scheduling Package in Ext Proc
 The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
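
The `## Scheduling Package in Ext Proc` section carried as context in the final hunk describes the scheduler as a chain of metric-driven filters that narrows the candidate pods for each request. Below is a minimal Go sketch of that filter-chain pattern; the type and filter names (`PodMetrics`, `lowKVCache`, `leastQueued`) and the 0.8 KV-cache threshold are hypothetical illustrations, not the actual identifiers or values from the scheduling package.

```go
package main

import "fmt"

// PodMetrics is a hypothetical snapshot of the per-pod metrics the
// scheduler filters on (queue depth, KV cache utilization).
type PodMetrics struct {
	Name         string
	QueueLength  int
	KVCacheUsage float64 // fraction of KV cache in use, 0.0 to 1.0
}

// filter narrows a candidate set of pods.
type filter func(pods []PodMetrics) []PodMetrics

// applyFilters runs each filter in order; if a filter would eliminate
// every candidate, it is skipped so the chain always yields a pod.
func applyFilters(pods []PodMetrics, filters []filter) []PodMetrics {
	for _, f := range filters {
		if next := f(pods); len(next) > 0 {
			pods = next
		}
	}
	return pods
}

// lowKVCache keeps pods whose KV cache utilization is below a threshold.
func lowKVCache(pods []PodMetrics) []PodMetrics {
	var out []PodMetrics
	for _, p := range pods {
		if p.KVCacheUsage < 0.8 { // illustrative threshold, not from the source
			out = append(out, p)
		}
	}
	return out
}

// leastQueued keeps only the pods with the shortest request queue.
func leastQueued(pods []PodMetrics) []PodMetrics {
	var best []PodMetrics
	minQueue := -1
	for _, p := range pods {
		switch {
		case minQueue == -1 || p.QueueLength < minQueue:
			minQueue, best = p.QueueLength, []PodMetrics{p}
		case p.QueueLength == minQueue:
			best = append(best, p)
		}
	}
	return best
}

func main() {
	pods := []PodMetrics{
		{Name: "vllm-0", QueueLength: 3, KVCacheUsage: 0.9},
		{Name: "vllm-1", QueueLength: 1, KVCacheUsage: 0.4},
		{Name: "vllm-2", QueueLength: 2, KVCacheUsage: 0.7},
	}
	candidates := applyFilters(pods, []filter{lowKVCache, leastQueued})
	fmt.Println("selected:", candidates[0].Name) // vllm-1
}
```

The fallback in `applyFilters` ensures a pod is always selected even when a filter would empty the candidate set, which fits the stated goal of keeping latency low rather than rejecting requests outright.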