diff --git a/docs/enhancements/10411.md b/docs/enhancements/10411.md new file mode 100644 index 00000000000..c3e9bd47610 --- /dev/null +++ b/docs/enhancements/10411.md @@ -0,0 +1,71 @@ +# EP-10411: Gateway API Inference Extension Support + +* Issue: [#10411](https://github.com/k8sgateway/k8sgateway/issues/10411) + +## Background + +This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of LLM network traffic. + +## Goals + +The following list defines goals for this EP. + +* Provide initial GIE support allowing for easy experimentation of advanced LLM traffic routing via the Endpoint Selector (ES), GIE's reference extension implementation. +* Allow users to enable/disable this feature. +* Implement GIE as a k8sgateway plugin. +* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference. +* Provide the ability to manage the GIE deployment. +* Provide e2e testing of this feature. +* Provide initial user documentation, e.g. quick start guide. + +## Non-Goals + +The following list defines non-goals for this EP. + +* Run production traffic using this feature. +* Provide k8sgateway-specific GIE extensions. +* Support non-GIE traffic routing functionality that may be achieved through integration with k8sgateway-specific APIs. +* Provide stats for the gRPC connection between Gateway and GIE implementations. +* Secure the gRPC connection between Gateway and GIE implementations. +* Support k8sgateway upgrades when this feature is enabled. + +## Implementation Details + +The following sections describe implementation details for this EP. + +### Configuration + +* Update the [configuration](https://github.com/k8sgateway/k8sgateway/blob/main/install/helm/gloo/generate/values.go) API to enable/disable this feature. +* Update Helm charts to install/uninstall k8sgateway with this feature based on user-provided configuration. + + __Note:__ Existing Gateway API support, e.g. CRDs, controllers, etc. is required. + +### Plugin + +* Add GIE as a supported [plugin](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/extensions2/plugins). The plugin will manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the InferencePool resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) to the ES deployment. The ES is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing. + +### Controllers + +* Add a controller to reconcile InferencePool custom resources. +* Controllers should run only if the feature is enabled and GIE CRDs exist. +* Update RBAC rules to allow controllers to access GIE custom resources. + +### Deployer + +* Update the [deployer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/deployer) to manage the required ES resources, e.g. Deployment. + +### Translator and Proxy Syncer + +* Add InferencePool as a supported HTTPRoute backend reference. +* Update the [translator](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/translator) package to handle InferencePool references from the HTTPRoute type. +* Enhance the [proxy_syncer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into a Gloo Upstream and sync with the proxy client. When an HTTPRoute references an InferencePool, ensure the Envoy ext_proc filter is attached or the cluster references the ES cluster. + +### Reporting + +* Update the [reporter](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when HTTPRoute references an InferencePool. + + __Note:__ InferencePool status is currently undefined. + +## Open Questions + +1. Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?