Adds EP-10411: Gateway API Inference Extension Support
Adds Enhancement Proposal 10411 that proposes adding Gateway API
Inference Extension support.

Signed-off-by: Daneyon Hansen <[email protected]>
danehans committed Jan 8, 2025
1 parent c3e1651 commit c848387
Showing 1 changed file with 71 additions and 0 deletions.
71 changes: 71 additions & 0 deletions docs/enhancements/10411.md
@@ -0,0 +1,71 @@
# EP-10411: Gateway API Inference Extension Support

* Issue: [#10411](https://github.com/k8sgateway/k8sgateway/issues/10411)

## Background

This EP proposes adding [Gateway API Inference Extension](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main) (GIE) support. GIE is an open source project that originated from [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). It provides APIs, a scheduling algorithm, a reference extension implementation, and controllers to support advanced routing of LLM network traffic.

## Goals

The following list defines goals for this EP.

* Provide initial GIE support that makes it easy to experiment with advanced LLM traffic routing via the Endpoint Selector (ES), GIE's reference extension implementation.
* Allow users to enable/disable this feature.
* Implement GIE as a k8sgateway plugin.
* Add [InferencePool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/api/v1alpha1/inferencepool_types.go) as a supported HTTPRoute backend reference.
* Provide the ability to manage the GIE deployment.
* Provide e2e testing of this feature.
* Provide initial user documentation, e.g. a quick start guide.

## Non-Goals

The following list defines non-goals for this EP.

* Run production traffic using this feature.
* Provide k8sgateway-specific GIE extensions.
* Support non-GIE traffic routing functionality that may be achieved through integration with k8sgateway-specific APIs.
* Provide stats for the gRPC connection between Gateway and GIE implementations.
* Secure the gRPC connection between Gateway and GIE implementations.
* Support k8sgateway upgrades when this feature is enabled.

## Implementation Details

The following sections describe implementation details for this EP.

### Configuration

* Update the [configuration](https://github.com/k8sgateway/k8sgateway/blob/main/install/helm/gloo/generate/values.go) API to enable/disable this feature.
* Update Helm charts to install/uninstall k8sgateway with this feature based on user-provided configuration.

__Note:__ Existing Gateway API support (CRDs, controllers, etc.) is required.

### Plugin

* Add GIE as a supported [plugin](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/extensions2/plugins). The plugin will manage [Endpoints](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/endpoint.html) based on the InferencePool resource specification. The Gateway implementation, e.g. Envoy proxy, will forward matching requests using the [External Processing Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#external-processing-filter-proto) to the ES deployment. The ES is responsible for processing the request, selecting an Endpoint, and returning the selected Endpoint to Envoy for routing.
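
The selection step in the flow above can be sketched as follows. This is a minimal illustration, not the ES scheduling algorithm: the `Endpoint` type, the `PendingQueue` load signal, and the `selectEndpoint` function are all hypothetical names introduced here, and the "fewest pending requests" policy is an assumed stand-in for whatever the ES actually uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a hypothetical view of one model server backing an InferencePool.
type Endpoint struct {
	Address      string // host:port of the model server
	PendingQueue int    // assumed load signal; not taken from the GIE API
}

var errNoEndpoints = errors.New("no endpoints available")

// selectEndpoint returns the endpoint with the smallest pending queue.
// Ties are broken by slice order (the earlier endpoint wins).
func selectEndpoint(endpoints []Endpoint) (Endpoint, error) {
	if len(endpoints) == 0 {
		return Endpoint{}, errNoEndpoints
	}
	best := endpoints[0]
	for _, ep := range endpoints[1:] {
		if ep.PendingQueue < best.PendingQueue {
			best = ep
		}
	}
	return best, nil
}

func main() {
	pool := []Endpoint{
		{Address: "10.0.0.1:8000", PendingQueue: 4},
		{Address: "10.0.0.2:8000", PendingQueue: 1},
	}
	ep, err := selectEndpoint(pool)
	if err != nil {
		fmt.Println("selection failed:", err)
		return
	}
	// In the proposed design, the chosen endpoint is returned to Envoy for routing.
	fmt.Println("route to", ep.Address)
}
```

In the proposed design this decision lives in the ES, outside the proxy, so the selection policy can change without changing the Envoy configuration.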

### Controllers

* Add a controller to reconcile InferencePool custom resources.
* Controllers should run only if the feature is enabled and GIE CRDs exist.
* Update RBAC rules to allow controllers to access GIE custom resources.
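
A minimal sketch of the gating rule in the second bullet, assuming the CRD check can be abstracted behind a lookup function. The `crdLookup` type, the `shouldRunControllers` function, and the CRD name in the constant are illustrative assumptions, not k8sgateway code; a real implementation would likely query the API server's discovery endpoint instead.

```go
package main

import "fmt"

// crdLookup is a hypothetical stand-in for a discovery call that reports
// whether a CRD with the given name is installed in the cluster.
type crdLookup func(name string) bool

// inferencePoolCRD is the assumed CRD name for InferencePool (v1alpha1).
const inferencePoolCRD = "inferencepools.inference.networking.x-k8s.io"

// shouldRunControllers reports whether the GIE controllers should start:
// the feature must be enabled and every required CRD must exist.
func shouldRunControllers(featureEnabled bool, requiredCRDs []string, exists crdLookup) bool {
	if !featureEnabled {
		return false
	}
	for _, name := range requiredCRDs {
		if !exists(name) {
			return false
		}
	}
	return true
}

func main() {
	// Simulated cluster state for illustration only.
	installed := map[string]bool{inferencePoolCRD: true}
	lookup := func(name string) bool { return installed[name] }

	run := shouldRunControllers(true, []string{inferencePoolCRD}, lookup)
	fmt.Println("start GIE controllers:", run)
}
```

Checking the feature flag first avoids any API server traffic when the feature is disabled.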

### Deployer

* Update the [deployer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/deployer) to manage the required ES resources, e.g. Deployment.

### Translator and Proxy Syncer

* Add InferencePool as a supported HTTPRoute backend reference.
* Update the [translator](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/translator) package to handle InferencePool references from the HTTPRoute type.
* Enhance the [proxy_syncer](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/proxy_syncer) to translate the InferencePool custom resource into a Gloo Upstream and sync it with the proxy client. When an HTTPRoute references an InferencePool, ensure that either the Envoy ext_proc filter is attached or the cluster references the ES cluster.
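
One way the translator could recognize an InferencePool backend reference is sketched below. The `BackendRef` type is a simplified local mirror of the group/kind/name fields of a Gateway API backend reference, and the group constant is an assumption based on the v1alpha1 API; `isInferencePoolRef` and `strPtr` are hypothetical helpers, not existing translator functions.

```go
package main

import "fmt"

// BackendRef is a simplified local mirror of a Gateway API backend reference.
// In Gateway API, an unset Group defaults to "" (core) and an unset Kind
// defaults to "Service".
type BackendRef struct {
	Group *string
	Kind  *string
	Name  string
}

const (
	inferenceGroup    = "inference.networking.x-k8s.io" // assumed API group
	inferencePoolKind = "InferencePool"
)

// isInferencePoolRef reports whether ref points at an InferencePool,
// applying the Gateway API defaults for unset group and kind.
func isInferencePoolRef(ref BackendRef) bool {
	group := ""
	if ref.Group != nil {
		group = *ref.Group
	}
	kind := "Service"
	if ref.Kind != nil {
		kind = *ref.Kind
	}
	return group == inferenceGroup && kind == inferencePoolKind
}

// strPtr returns a pointer to s, for building optional fields.
func strPtr(s string) *string { return &s }

func main() {
	pool := BackendRef{Group: strPtr(inferenceGroup), Kind: strPtr(inferencePoolKind), Name: "my-pool"}
	svc := BackendRef{Name: "my-service"} // defaults to a core Service

	fmt.Println(pool.Name, "is InferencePool:", isInferencePoolRef(pool))
	fmt.Println(svc.Name, "is InferencePool:", isInferencePoolRef(svc))
}
```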

### Reporting

* Update the [reporter](https://github.com/k8sgateway/k8sgateway/tree/main/projects/gateway2/reports) package to support status reporting, e.g. `ResolvedRefs=true` when an HTTPRoute references an existing InferencePool.

__Note:__ InferencePool status is currently undefined.
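
As a rough sketch of the HTTPRoute side of this reporting, the function below builds a `ResolvedRefs` condition for an InferencePool reference. The `Condition` type is a simplified local stand-in for the Kubernetes condition type, and `resolvedRefsCondition` is a hypothetical helper; the `ResolvedRefs` and `BackendNotFound` strings follow the Gateway API route condition conventions.

```go
package main

import "fmt"

// Condition is a simplified local stand-in for a Kubernetes status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// resolvedRefsCondition returns the ResolvedRefs condition for an HTTPRoute
// that references the named InferencePool.
func resolvedRefsCondition(poolName string, poolExists bool) Condition {
	if poolExists {
		return Condition{
			Type:    "ResolvedRefs",
			Status:  "True",
			Reason:  "ResolvedRefs",
			Message: fmt.Sprintf("InferencePool %q resolved", poolName),
		}
	}
	return Condition{
		Type:    "ResolvedRefs",
		Status:  "False",
		Reason:  "BackendNotFound",
		Message: fmt.Sprintf("InferencePool %q not found", poolName),
	}
}

func main() {
	fmt.Printf("%+v\n", resolvedRefsCondition("my-pool", true))
	fmt.Printf("%+v\n", resolvedRefsCondition("missing-pool", false))
}
```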

## Open Questions

1. Is a new plugin type required or can an existing type be utilized, e.g. UpstreamPlugin?
