Adds EP-10411: Gateway API Inference Extension Support
Adds Enhancement Proposal 10411 that proposes adding Gateway API Inference Extension support. Signed-off-by: Daneyon Hansen <[email protected]>
Showing 1 changed file with 71 additions and 0 deletions.
@@ -0,0 +1,71 @@
# EP-10411: Gateway API Inference Extension Support

* Issue: [#10411](https://github.com/k8sgateway/k8sgateway/issues/10411)

## Background

This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of LLM network traffic.

## Goals

The following list defines goals for this EP.

* Provide initial GIE support that allows easy experimentation with advanced LLM traffic routing via the Endpoint Selector (ES), GIE's reference extension implementation.
* Allow users to enable/disable this feature.
* Implement GIE as a k8sgateway plugin.
* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference.
* Provide the ability to manage the GIE deployment.
* Provide e2e testing of this feature.
* Provide initial user documentation, e.g. a quick start guide.

## Non-Goals

The following list defines non-goals for this EP.

* Run production traffic using this feature.
* Provide k8sgateway-specific GIE extensions.
* Support non-GIE traffic routing functionality that may be achieved through integration with k8sgateway-specific APIs.
* Provide stats for the gRPC connection between Gateway and GIE implementations.
* Secure the gRPC connection between Gateway and GIE implementations.
* Support k8sgateway upgrades when this feature is enabled.

## Implementation Details

The following sections describe implementation details for this EP.

### Configuration

* Update the [configuration](https://github.com/k8sgateway/k8sgateway/blob/main/install/helm/gloo/generate/values.go) API to enable/disable this feature.
* Update Helm charts to install/uninstall k8sgateway with this feature based on user-provided configuration.

__Note:__ Existing Gateway API support, e.g. CRDs and controllers, is required.
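
As a rough illustration of the enable/disable switch, the values API could gain an opt-in block. The sketch below is hypothetical: the `InferenceExtension` type, its `enabled` key, and the helper functions are assumed names, not part of the current `values.go`. It only shows the intended semantics, i.e. the feature stays off unless it is explicitly enabled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferenceExtension is a hypothetical values block for this feature.
type InferenceExtension struct {
	// Enabled is a pointer so that "unset" can be told apart from "false".
	Enabled *bool `json:"enabled,omitempty"`
}

// Values is a hypothetical, heavily trimmed stand-in for the chart values.
type Values struct {
	InferenceExtension *InferenceExtension `json:"inferenceExtension,omitempty"`
}

// inferenceExtensionEnabled reports whether the feature is switched on.
// A missing block or a missing flag is treated as disabled.
func inferenceExtensionEnabled(v Values) bool {
	return v.InferenceExtension != nil &&
		v.InferenceExtension.Enabled != nil &&
		*v.InferenceExtension.Enabled
}

// parseValues decodes JSON-encoded values into the Values struct.
func parseValues(raw string) (Values, error) {
	var v Values
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

func main() {
	inputs := []string{
		`{}`,
		`{"inferenceExtension":{"enabled":false}}`,
		`{"inferenceExtension":{"enabled":true}}`,
	}
	for _, raw := range inputs {
		v, err := parseValues(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> enabled=%t\n", raw, inferenceExtensionEnabled(v))
	}
}
```

Defaulting to disabled keeps existing installations unchanged until a user opts in.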

### Plugin

* Add GIE as a supported [plugin](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/extensions2/plugins). The plugin will manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the InferencePool resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) to the ES deployment. The ES is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing.

### Controllers

* Add a controller to reconcile InferencePool custom resources.
* Controllers should run only if the feature is enabled and GIE CRDs exist.
* Update RBAC rules to allow controllers to access GIE custom resources.
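
The startup gate described in the list above can be sketched as a single predicate. Everything here is illustrative: `crdLister`, `staticLister`, and `shouldStartInferenceControllers` are hypothetical names, and the API group string is an assumption based on the `v1alpha1` GIE types. A real implementation would query the API server's discovery information instead of an in-memory map.

```go
package main

import "fmt"

// inferenceGroup is the assumed API group of the GIE CRDs.
const inferenceGroup = "inference.networking.x-k8s.io"

// crdLister is a hypothetical abstraction over API discovery.
type crdLister interface {
	HasKind(group, kind string) bool
}

// staticLister is an in-memory crdLister used only for this illustration.
type staticLister map[string]bool

// HasKind reports whether the "group/kind" key is present and true.
func (s staticLister) HasKind(group, kind string) bool {
	return s[group+"/"+kind]
}

// shouldStartInferenceControllers returns true only when the feature is
// enabled and the InferencePool kind is served by the cluster.
func shouldStartInferenceControllers(enabled bool, l crdLister) bool {
	return enabled && l.HasKind(inferenceGroup, "InferencePool")
}

func main() {
	withCRDs := staticLister{inferenceGroup + "/InferencePool": true}
	withoutCRDs := staticLister{}

	fmt.Println("enabled, CRDs installed: ", shouldStartInferenceControllers(true, withCRDs))
	fmt.Println("enabled, CRDs missing:   ", shouldStartInferenceControllers(true, withoutCRDs))
	fmt.Println("disabled, CRDs installed:", shouldStartInferenceControllers(false, withCRDs))
}
```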

### Deployer

* Update the [deployer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/deployer) to manage the required ES resources, e.g. Deployment.

### Translator and Proxy Syncer

* Add InferencePool as a supported HTTPRoute backend reference.
* Update the [translator](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/translator) package to handle InferencePool references from the HTTPRoute type.
* Enhance the [proxy_syncer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into a Gloo Upstream and sync with the proxy client. When an HTTPRoute references an InferencePool, ensure the Envoy ext_proc filter is attached or the cluster references the ES cluster.
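
One way the translator might recognize an InferencePool backend reference is by its group and kind. The sketch below uses local stand-in types rather than the real Gateway API or k8sgateway packages. `backendRef`, `backendType`, and `classifyBackend` are hypothetical names, and the API group string is an assumption based on the `v1alpha1` GIE types. The only behavior borrowed from Gateway API is that an omitted kind defaults to `Service` and an omitted group to the core group.

```go
package main

import "fmt"

// inferenceGroup is the assumed API group of the GIE CRDs.
const inferenceGroup = "inference.networking.x-k8s.io"

// backendRef is a simplified stand-in for an HTTPRoute backend reference.
type backendRef struct {
	Group string // empty means the core API group
	Kind  string // empty defaults to "Service"
	Name  string
}

// backendType is the hypothetical result of classifying a reference.
type backendType string

const (
	backendService       backendType = "Service"
	backendInferencePool backendType = "InferencePool"
	backendUnsupported   backendType = "Unsupported"
)

// classifyBackend decides how a backend reference should be translated.
func classifyBackend(ref backendRef) backendType {
	kind := ref.Kind
	if kind == "" {
		kind = "Service" // Gateway API default for an omitted kind
	}
	switch {
	case ref.Group == "" && kind == "Service":
		return backendService
	case ref.Group == inferenceGroup && kind == "InferencePool":
		return backendInferencePool
	default:
		return backendUnsupported
	}
}

func main() {
	refs := []backendRef{
		{Name: "web"},
		{Group: inferenceGroup, Kind: "InferencePool", Name: "llm-pool"},
		{Group: "example.com", Kind: "Widget", Name: "other"},
	}
	for _, ref := range refs {
		fmt.Printf("%s -> %s\n", ref.Name, classifyBackend(ref))
	}
}
```

References classified as `backendInferencePool` are the ones that would need the ext_proc filter or ES cluster wiring described above.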

### Reporting

* Update the [reporter](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when an HTTPRoute references an InferencePool.

__Note:__ InferencePool status is currently undefined.
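
For the HTTPRoute side of status reporting, the following sketch shows how a `ResolvedRefs` condition might be derived for an InferencePool reference. It is not the reporter package's API: `condition` and `resolvedRefsCondition` are hypothetical, and the reason strings are local copies of the Gateway API route reasons `ResolvedRefs`, `InvalidKind`, and `BackendNotFound`.

```go
package main

import "fmt"

// condition is a simplified stand-in for a Kubernetes status condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// resolvedRefsCondition builds the ResolvedRefs condition for one backend
// reference. kindSupported says whether the reference kind is accepted
// (e.g. InferencePool with the feature enabled); found says whether the
// referenced object exists.
func resolvedRefsCondition(kindSupported, found bool, name string) condition {
	c := condition{Type: "ResolvedRefs"}
	switch {
	case !kindSupported:
		c.Status = "False"
		c.Reason = "InvalidKind"
		c.Message = fmt.Sprintf("backend %q has an unsupported kind", name)
	case !found:
		c.Status = "False"
		c.Reason = "BackendNotFound"
		c.Message = fmt.Sprintf("backend %q not found", name)
	default:
		c.Status = "True"
		c.Reason = "ResolvedRefs"
		c.Message = "all references resolved"
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", resolvedRefsCondition(true, true, "llm-pool"))
	fmt.Printf("%+v\n", resolvedRefsCondition(true, false, "llm-pool"))
	fmt.Printf("%+v\n", resolvedRefsCondition(false, false, "llm-pool"))
}
```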

## Open Questions

1. Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?