docs: add the Hugging Face secret to readme (#139)
Signed-off-by: Kay Yan <[email protected]>
yankay authored Jan 1, 2025 · commit 34862ab · 1 parent eefcaf7
1 changed file: pkg/README.md (10 additions, 5 deletions)
The current manifests rely on Envoy Gateway v1.2.1.

1. **Deploy Sample vLLM Application**

Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
```

1. **Deploy InferenceModel and InferencePool**


Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
```bash
kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
```

1. **Update Envoy Gateway Config to enable Patch Policy**

Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, run:
```bash
kubectl apply -f ./manifests/enable_patch_policy.yaml
kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
```
Additionally, if you would like to enable the admin interface, uncomment the admin lines in `enable_patch_policy.yaml` and apply it again.

## Scheduling Package in Ext Proc
The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
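As a concrete illustration, the filter chain described above might be sketched as follows. This is a minimal sketch, not the project's actual API: the `PodMetrics` fields, the `chain` helper, and the KV-cache threshold are all hypothetical, chosen only to show how successive metric-based filters narrow the candidate pod set.

```go
package main

import (
	"errors"
	"fmt"
)

// PodMetrics holds the per-pod metrics a filter might consult.
// The field names here are illustrative, not the project's real types.
type PodMetrics struct {
	Name         string
	QueueLength  int
	KVCacheUsage float64 // fraction of KV cache in use, 0.0-1.0
}

// Filter narrows a candidate pod list. Returning an empty slice means
// no pod passed this stage, and the chain falls back to the prior set.
type Filter func(pods []PodMetrics) []PodMetrics

// chain applies filters in order, keeping the last non-empty result,
// then breaks ties by picking the least-queued surviving pod.
func chain(pods []PodMetrics, filters ...Filter) (PodMetrics, error) {
	candidates := pods
	for _, f := range filters {
		next := f(candidates)
		if len(next) == 0 {
			break // fall back to the previous stage's candidates
		}
		candidates = next
	}
	if len(candidates) == 0 {
		return PodMetrics{}, errors.New("no pod available")
	}
	best := candidates[0]
	for _, p := range candidates[1:] {
		if p.QueueLength < best.QueueLength {
			best = p
		}
	}
	return best, nil
}

func main() {
	pods := []PodMetrics{
		{Name: "pod-a", QueueLength: 5, KVCacheUsage: 0.9},
		{Name: "pod-b", QueueLength: 2, KVCacheUsage: 0.4},
		{Name: "pod-c", QueueLength: 7, KVCacheUsage: 0.2},
	}
	// Hypothetical filter: drop pods whose KV cache is nearly full.
	lowKVCache := func(ps []PodMetrics) []PodMetrics {
		var out []PodMetrics
		for _, p := range ps {
			if p.KVCacheUsage < 0.8 {
				out = append(out, p)
			}
		}
		return out
	}
	best, err := chain(pods, lowKVCache)
	if err == nil {
		fmt.Println(best.Name) // pod-b: lowest queue among low-KV-cache pods
	}
}
```

The fallback-on-empty behavior is the key design point: a strict filter (for example, one requiring a specific LoRA adapter to be loaded) can fail without leaving the request unroutable, since the scheduler then chooses among the candidates that survived the earlier, looser stages.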

