Adds EP-10411: Gateway API Inference Extension Support

Adds Enhancement Proposal 10411 that proposes adding Gateway API Inference Extension support. Signed-off-by: Daneyon Hansen <[email protected]>
kgateway-dev · Jan 8, 2025 · c848387 · c848387
1 parent c3e1651
commit c848387
Showing 1 changed file with 71 additions and 0 deletions.
diff --git a/docs/enhancements/10411.md b/docs/enhancements/10411.md
@@ -0,0 +1,71 @@
+# EP-10411: Gateway API Inference Extension Support
+
+* Issue: [#10411](https://github.com/k8sgateway/k8sgateway/issues/10411)
+
+## Background
+
+This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of LLM network traffic.
+
+## Goals
+
+The following list defines goals for this EP.
+
+* Provide initial GIE support allowing for easy experimentation of advanced LLM traffic routing via the Endpoint Selector (ES), GIE's reference extension implementation.
+* Allow users to enable/disable this feature.
+* Implement GIE as a k8sgateway plugin.
+* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference.
+* Provide the ability to manage the GIE deployment.
+* Provide e2e testing of this feature.
+* Provide initial user documentation, e.g. quick start guide.
+
+## Non-Goals
+
+The following list defines non-goals for this EP.
+
+* Run production traffic using this feature.
+* Provide k8sgateway-specific GIE extensions.
+* Support non-GIE traffic routing functionality that may be achieved through integration with k8sgateway-specific APIs.
+* Provide stats for the gRPC connection between Gateway and GIE implementations.
+* Secure the gRPC connection between Gateway and GIE implementations.
+* Support k8sgateway upgrades when this feature is enabled.
+
+## Implementation Details
+
+The following sections describe implementation details for this EP.
+
+### Configuration
+
+* Update the [configuration](https://github.com/k8sgateway/k8sgateway/blob/main/install/helm/gloo/generate/values.go) API to enable/disable this feature.
+* Update Helm charts to install/uninstall k8sgateway with this feature based on user-provided configuration.
+
+  __Note:__ Existing Gateway API support, e.g. CRDs, controllers, etc. is required.
+
+### Plugin
+
+* Add GIE as a supported [plugin](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/extensions2/plugins). The plugin will manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the InferencePool resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) to the ES deployment. The ES is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing.
+
+### Controllers
+
+* Add a controller to reconcile InferencePool custom resources.
+* Controllers should run only if the feature is enabled and GIE CRDs exist.
+* Update RBAC rules to allow controllers to access GIE custom resources.
+
+### Deployer
+
+* Update the [deployer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/deployer) to manage the required ES resources, e.g. Deployment.
+
+### Translator and Proxy Syncer
+
+* Add InferencePool as a supported HTTPRoute backend reference.
+* Update the [translator](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/translator) package to handle InferencePool references from the HTTPRoute type.
+* Enhance the [proxy_syncer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into a Gloo Upstream and sync with the proxy client. When an HTTPRoute references an InferencePool, ensure the Envoy ext_proc filter is attached or the cluster references the ES cluster.
+
+### Reporting
+
+* Update the [reporter](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when HTTPRoute references an InferencePool.
+
+  __Note:__ InferencePool status is currently undefined.
+
+## Open Questions
+
+1. Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?