Rework Gateway API, HTTPRoute, and GRPCRoute docs #1909

Merged
merged 14 commits into from
Jan 17, 2025
100 changes: 100 additions & 0 deletions linkerd.io/content/2-edge/features/gateway-api.md
@@ -0,0 +1,100 @@
---
title: Gateway API support
description: Linkerd uses Gateway API resource types for much of its configuration.
---

The Gateway API is a set of CRDs in the `gateway.networking.k8s.io` API group
which describe types of traffic in a way that is independent of a specific mesh
or ingress implementation. Recent versions of Linkerd fully support the
[Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) as a core
configuration mechanism, and many Linkerd features, including [authorization
policies][auth-policy], [dynamic traffic routing][dyn-routing], and [request
timeouts][timeouts], rely on resource types from the Gateway API for
configuration.

The two primary Gateway API types used to configure Linkerd are:

- [HTTPRoute], which parameterizes HTTP requests
- [GRPCRoute], which parameterizes gRPC requests

Both of these types are used in a variety of ways when configuring Linkerd.
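As a rough illustration of what these two resource types look like, here is a minimal sketch of each. All resource names, the namespace, the port, the path, and the gRPC service and method names are hypothetical placeholders. The API versions shown assume the Gateway API version that Linkerd 2.15 - 2.17 installs by default and may differ on your cluster.

```yaml
# Sketch of an HTTPRoute: selects GET requests under /api that are sent
# to a (hypothetical) Service named example-svc on port 8080.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: example-http-route   # hypothetical name
  namespace: example-ns      # hypothetical namespace
spec:
  parentRefs:
    - group: core
      kind: Service
      name: example-svc      # hypothetical Service
      port: 8080
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
          method: GET
---
# Sketch of a GRPCRoute: selects calls to one gRPC method on the same
# (hypothetical) Service.
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: GRPCRoute
metadata:
  name: example-grpc-route   # hypothetical name
  namespace: example-ns
spec:
  parentRefs:
    - group: core
      kind: Service
      name: example-svc
      port: 8080
  rules:
    - matches:
        - method:
            type: Exact
            service: example.Greeter   # hypothetical gRPC service
            method: SayHello           # hypothetical gRPC method
```

On their own, these resources only describe a subset of traffic; other Linkerd configuration then attaches behavior to that subset.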

## Managing the Gateway API

One complication with using the Gateway API in practice is that many different
packages, not just Linkerd, may provide the Gateway API on your cluster, but
only some Gateway API *versions* are compatible with Linkerd.

In practice, there are two basic approaches to managing the Gateway API with
Linkerd. You can let Linkerd manage the Gateway API resources, or you can let a
different tool manage them.

### Option 1: Linkerd manages the Gateway API

This is the default behavior for Linkerd, which will create, update, and delete
Gateway API resources as required. In this approach, any other tools on your
system that use Gateway API resources will need to be compatible with the
version of the Gateway API that Linkerd installs:

| Linkerd versions | Gateway API version installed | HTTPRoute version | GRPCRoute version |
| ---------------- | ----------------------------- | ----------------- | ----------------- |
| 2.15 - 2.17      | 0.7                           | v1beta1           | v1alpha2          |

### Option 2: A different tool manages the Gateway API

Alternatively, you may prefer to have something other than Linkerd manage the
Gateway API resources on your cluster. To do this, you will need to instruct
Linkerd *not* to install, update, or delete the Gateway API resources, by
passing the `--set enableHttpRoutes=false` flag during the `linkerd install
--crds` step, or setting the `enableHttpRoutes=false` Helm value when installing
the `linkerd-crds` Helm chart.
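
For the Helm path, the same setting can be kept in a values file rather than on the command line. This is a minimal sketch; the file name is a hypothetical choice, and only the `enableHttpRoutes` value comes from the text above.

```yaml
# values-crds.yaml (hypothetical file name) for the linkerd-crds chart.
# Tells Linkerd not to install, update, or delete Gateway API resources,
# so that another tool can manage them.
enableHttpRoutes: false
```

A file like this would be passed to Helm with its standard `-f` / `--values` option when installing the `linkerd-crds` chart.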

You will also need to ensure that the version of the Gateway API installed is
compatible with Linkerd:

| Linkerd versions | Compatible Gateway API versions | Recommended Gateway API version |
| ---------------- | ------------------------------- | ------------------------------- |
| 2.15 - 2.17 | 0.7, 0.7.1, 1.1.1-experimental | 1.1.1-experimental |

If possible, you should install the *recommended* Gateway API version in the
table above. (Note that the use of *experimental* Gateway API versions is
sometimes necessary to allow for full functionality; despite the name, these
versions are production capable.)

{{< warning >}}
Running Linkerd with an incompatible version of the Gateway API
on the cluster can lead to hard-to-debug issues with your Linkerd installation.
{{< /warning >}}

## Precursors to Gateway API-based configuration

Before complete support for the Gateway API was introduced in Linkerd 2.14,
Linkerd provided two earlier configuration mechanisms:

- A Linkerd-specific `HTTPRoute` CRD provided by Linkerd in the
`policy.linkerd.io` API group
- [ServiceProfiles], which allowed for configuration of per-route metrics,
retries, and timeouts prior to the introduction of the Gateway API types.

Both of these earlier configuration mechanisms continue to be supported;
however, newer feature development is focused on the standard Gateway API
types.

## Learn More

To get started with the Gateway API types, you can:

- [Configure fault injection](../../tasks/fault-injection/)
- [Configure timeouts][timeouts]
- [Configure dynamic request routing][dyn-routing]
- [Configure per-route authorization policy][auth-policy]

[HTTPRoute]: ../../reference/httproute/
[GRPCRoute]: ../../reference/grpcroute/
[Gateway API]: https://gateway-api.sigs.k8s.io/
[Service]: https://kubernetes.io/docs/concepts/services-networking/service/
[Server]: ../../reference/authorization-policy/#server
[auth-policy]: ../../tasks/configuring-per-route-policy/
[dyn-routing]: ../../tasks/configuring-dynamic-request-routing/
[timeouts]: ../../features/retries-and-timeouts/
[ServiceProfiles]: ../../features/service-profiles/
81 changes: 0 additions & 81 deletions linkerd.io/content/2-edge/features/httproute.md

This file was deleted.

103 changes: 88 additions & 15 deletions linkerd.io/content/2-edge/features/retries-and-timeouts.md
@@ -1,27 +1,100 @@
---
title: Retries and Timeouts
description: Linkerd can perform service-specific retries and timeouts.
description: Linkerd can retry and time out HTTP and gRPC requests.
weight: 3
---

Timeouts and automatic retries are two of the most powerful and useful
mechanisms a service mesh has for gracefully handling partial or transient
application failures.
Timeouts and automatic retries are two of the most powerful mechanisms a service
mesh has for gracefully handling partial or transient application failures.

Timeouts and retries can be configured using [HTTPRoute], GRPCRoute, or Service
resources. Retries and timeouts are always performed on the *outbound* (client)
side.
* **Timeouts** allow Linkerd to cancel a request that exceeds a time
limit.
* **Retries** allow Linkerd to automatically retry failed requests, potentially
sending them to a different endpoint.

Timeouts and retries are configured with a set of annotations, e.g.
`retry.linkerd.io/http` and `timeout.linkerd.io/request`. These annotations are
placed on [HTTPRoute] or [GRPCRoute] resources to configure behavior on HTTP or
gRPC requests that match those resources. Alternatively, they can be placed on
`Service` resources to configure retries and timeouts for all traffic to that
service.
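
The two placements described above might look roughly like the following sketch. The annotation keys are the ones named in this page; the annotation values (`5xx`, `"2"`, `5s`, `10s`), the resource names, the namespace, the port, and the path are illustrative assumptions, so check the retries and timeouts reference for the accepted value syntax.

```yaml
# Annotations on an HTTPRoute: apply only to requests matching this route.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: example-route          # hypothetical name
  namespace: example-ns        # hypothetical namespace
  annotations:
    retry.linkerd.io/http: 5xx          # assumed value: retry on 5xx responses
    retry.linkerd.io/limit: "2"         # assumed value: at most two retries
    timeout.linkerd.io/request: 5s      # assumed value: overall request timeout
spec:
  parentRefs:
    - group: core
      kind: Service
      name: example-svc        # hypothetical Service
      port: 8080
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
---
# Annotations on a Service: apply to all traffic sent to that Service.
apiVersion: v1
kind: Service
metadata:
  name: example-svc
  namespace: example-ns
  annotations:
    timeout.linkerd.io/request: 10s     # assumed value
spec:
  selector:
    app: example               # hypothetical pod label
  ports:
    - port: 8080
```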

As of Linkerd 2.16, timeouts and retries *compose*: requests that time out are
eligible for being retried.

{{< note >}}
Note that retries and timeouts are performed on the *outbound* (client) side.
This means that regardless of where the annotations are placed, the source of
the traffic must be meshed.
{{< /note >}}

{{< note >}}
If working with headless services, outbound policy cannot be retrieved. Linkerd
reads service discovery information based off the target IP address, and if that
happens to be a pod IP address then it cannot tell which service the pod belongs
to.
Retries and timeouts do not work with headless services. This is because
Linkerd reads service discovery information based off the target IP address, and
if that happens to be a pod IP address then it cannot tell which service the pod
belongs to.
{{< /note >}}

These can be setup by following the guides:
{{< warning >}}
Prior to Linkerd 2.16, retries and timeouts were configured with
[ServiceProfile](../../reference/service-profiles/)s. While service profiles are
still supported, retries configured with HTTPRoute or GRPCRoute are
**incompatible with ServiceProfiles**. If a ServiceProfile is defined for a
Service, proxies will use the ServiceProfile retry configuration and ignore any
retry annotations.
{{< /warning >}}

## Using retries safely

Retries are an opt-in behavior that requires some thought and planning. Misuse
can be dangerous. First, automatically retrying a request that changes system
state each time it is called can be disastrous. Thus, retries should only be
used on _idempotent_ methods, i.e. methods that have the same effect even if
called multiple times.

Second, retries by definition will increase the load on your system. A set of
services that have requests being constantly retried could potentially get taken
down by the retries instead of being allowed time to recover.

The exact configuration of retry behavior to improve overall reliability
without significantly increasing risk will require some care on the part of the
user.

## Per-request policies

In addition to the annotation approach outlined above, retries and timeouts
can be set on a per-request basis by setting specific HTTP headers.

In order to enable this per-request policy, Linkerd must be installed with the
`--set policyController.additionalArgs="--allow-l5d-request-headers"` flag or
the corresponding Helm value.

{{< warning >}}
Per-request policies should **not** be enabled if your application accepts
unfiltered requests from untrusted sources. For example, if you mesh an ingress
controller which takes unfiltered Internet traffic (and you do not use
`skip-inbound-ports` to instruct Linkerd to skip handling inbound traffic to the
pod), untrusted clients will be able to specify Linkerd retry and timeout policy
on their requests.
{{< /warning >}}

Once per-request policy is enabled, you can set timeout and retry policy on
individual requests by setting these headers:

* `l5d-retry-http`: Overrides the `retry.linkerd.io/http` annotation
* `l5d-retry-grpc`: Overrides the `retry.linkerd.io/grpc` annotation
* `l5d-retry-limit`: Overrides the `retry.linkerd.io/limit` annotation
* `l5d-retry-timeout`: Overrides the `retry.linkerd.io/timeout` annotation
* `l5d-timeout`: Overrides the `timeout.linkerd.io/request` annotation
* `l5d-response-timeout`: Overrides the `timeout.linkerd.io/response` annotation

## Further reading

- [Configuring Retries](../../tasks/configuring-retries/)
- [Configuring Timeouts](../../tasks/configuring-timeouts/)
* [Retries reference](../../reference/retries/)
* [Timeout reference](../../reference/timeouts/)
* The [Debugging HTTP applications with per-route
metrics](../../tasks/books/) guide contains examples of retry and timeout
annotations.

[HTTPRoute]: ../httproute/
[HTTPRoute]: ../../reference/httproute/
[GRPCRoute]: ../../reference/grpcroute/
26 changes: 14 additions & 12 deletions linkerd.io/content/2-edge/features/server-policy.md
@@ -70,11 +70,12 @@ Linkerd uses a set of CRDs. In contrast to default policy annotations, these
policy CRDs can be changed dynamically and policy behavior will be updated on
the fly.

Two policy CRDs represent "targets" for policy: subsets of traffic over which
Three policy CRDs represent "targets" for policy: subsets of traffic over which
policy can be applied.

- [`Server`]: all traffic to a port, for a set of pods in a namespace
- [`HTTPRoute`]: a subset of HTTP requests for a [`Server`]
- [`HTTPRoute`]: a subset of HTTP requests for a `Server`
- [`GRPCRoute`]: a subset of gRPC requests for a `Server`

Two policy CRDs represent authentication rules that must be satisfied as part of
a policy rule:
@@ -90,11 +91,11 @@ authentication rules to targets.
unless an authentication rule is met

- `ServerAuthorization`: an earlier form of policy that restricts access to
[`Server`]s only (i.e. not [`HTTPRoute`]s)
`Server`s only (i.e. not `HTTPRoute`s or `GRPCRoute`s)

The general pattern for Linkerd's dynamic, fine-grained policy is to define the
traffic target that must be protected (via a combination of `Server` and
[`HTTPRoute`] CRs); define the types of authentication that are required before
`HTTPRoute` CRs); define the types of authentication that are required before
access to that traffic is permitted (via `MeshTLSAuthentication` and
`NetworkAuthentication`); and then define the policy that maps authentication to
target (via an `AuthorizationPolicy`).
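
The pattern above can be sketched as a set of four resources. This is an illustration under stated assumptions, not a reference configuration: every name, label, namespace, path, and identity string is hypothetical, and the API versions shown are assumptions that may not match the CRD versions installed on your cluster.

```yaml
# 1. Target: a Server selecting a port on a set of pods.
apiVersion: policy.linkerd.io/v1beta3      # assumed version
kind: Server
metadata:
  name: example-server         # hypothetical name
  namespace: example-ns        # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: example             # hypothetical pod label
  port: 8080
  proxyProtocol: HTTP/1
---
# 2. Target: an HTTPRoute selecting a subset of requests for that Server.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: example-get-route
  namespace: example-ns
spec:
  parentRefs:
    - group: policy.linkerd.io
      kind: Server
      name: example-server
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
          method: GET
---
# 3. Authentication: which mesh identities are acceptable.
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: example-clients
  namespace: example-ns
spec:
  identities:
    # hypothetical client identity
    - "example-client.example-ns.serviceaccount.identity.linkerd.cluster.local"
---
# 4. Policy: maps the authentication rule to the route target.
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: example-get-policy
  namespace: example-ns
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example-get-route
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: MeshTLSAuthentication
      name: example-clients
```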
@@ -105,7 +106,7 @@ details on how these resources work.
## ServerAuthorization vs AuthorizationPolicy

Linkerd 2.12 introduced `AuthorizationPolicy` as a more flexible alternative to
`ServerAuthorization` that can target [`HTTPRoute`]s as well as `Server`s. Use of
`ServerAuthorization` that can target `HTTPRoute`s as well as `Server`s. Use of
`AuthorizationPolicy` is preferred, and `ServerAuthorization` will be deprecated
in future releases.

@@ -116,11 +117,11 @@ from Kubernetes, meaning that the pod would not be able to start. Thus, any
default-deny setup must, in practice, still authorize these probes.

In order to simplify default-deny setups, Linkerd automatically authorizes
probes to pods. These default authorizations apply only when no [`Server`] is
configured for a port, or when a [`Server`] is configured but no [`HTTPRoute`]s are
configured for that [`Server`]. If any [`HTTPRoute`] matches the `Server`, these
automatic authorizations are not created and you must explicitly create them for
health and readiness probes.
probes to pods. These default authorizations apply only when no `Server` is
configured for a port, or when a `Server` is configured but no `HTTPRoute`s or
`GRPCRoute`s are configured for that `Server`. If any `HTTPRoute` or `GRPCRoute`
matches the `Server`, these automatic authorizations are not created and you
must explicitly create them for health and readiness probes.

## Policy rejections

@@ -133,7 +134,7 @@ result in an abrupt termination of those connections.

## Audit mode

A [`Server`]'s default policy is defined in its `accessPolicy` field, which
A `Server`'s default policy is defined in its `accessPolicy` field, which
defaults to `deny`. That means that, by default, traffic that doesn't conform to
the rules associated with that `Server` is denied (the same applies to `Server`s
that don't have associated rules yet). This can inadvertently prevent traffic if
@@ -166,5 +167,6 @@ be allowed but it would be logged and surfaced in metrics as detailed above.
- [Authorization policy reference](../../reference/authorization-policy/)
- [Guide to configuring per-route policy](../../tasks/configuring-per-route-policy/)

[`HTTPRoute`]: ../httproute/
[`HTTPRoute`]: ../../reference/httproute/
[`GRPCRoute`]: ../../reference/grpcroute/
[`Server`]: ../../reference/authorization-policy/#server
11 changes: 6 additions & 5 deletions linkerd.io/content/2-edge/features/service-profiles.md
@@ -4,11 +4,12 @@ description: Linkerd's service profiles enable per-route metrics as well as retr
and timeouts.
---

{{< note >}}
[HTTPRoutes](../httproute/) are the recommended method for getting per-route
metrics, specifying timeouts, and specifying retries. Service profiles continue
to be supported for backwards compatibility.
{{< /note >}}
{{< warning >}}
As of Linkerd 2.16, ServiceProfiles have been fully supplanted by [Gateway API
types](../gateway-api/), including for getting per-route metrics, specifying
timeouts, and specifying retries. Service profiles continue to be supported for
backwards compatibility, but will not receive further feature development.
{{< /warning >}}

A service profile is a custom Kubernetes resource ([CRD][crd]) that can provide
Linkerd additional information about a service. In particular, it allows you to