The Gardener Controller Manager (often referred to as "GCM") is a component that runs next to the Gardener API server, similar to the Kubernetes Controller Manager. It runs several control loops that do not require talking to any seed or shoot cluster. As of today, it also exposes an HTTPS server that serves several webhook endpoints for certain resources.
This document explains the various functionalities of the Gardener Controller Manager and their purpose.
The Project controller consists of three reconciliation loops:
The main loop reconciles Project resources, the second loop handles the necessary actions for stale projects, and the third loop (the Project Activity Reconciler) keeps track of project activity.
This reconciler creates a dedicated Namespace, prefixed with garden-, for each Project resource.
The name of the namespace can either be specified in .spec.namespace, or it is auto-generated by the reconciler.
If .spec.namespace is set, the reconciler creates the namespace if it does not exist yet.
Otherwise, it tries to adopt it.
Adoption only succeeds if the Namespace was previously labeled with gardener.cloud/role=project and project.gardener.cloud/name=<project-name>.
This prevents end-users from adopting arbitrary namespaces, e.g. the kube-system namespace, and escalating their privileges.
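The adoption rule above can be sketched as a small predicate. This is a minimal illustration and not the controller's actual code; the function name and the plain-dict representation of labels are assumptions made for the example.

```python
# Label keys and values as named in the text above.
ROLE_LABEL = "gardener.cloud/role"
PROJECT_NAME_LABEL = "project.gardener.cloud/name"


def can_adopt_namespace(namespace_labels: dict, project_name: str) -> bool:
    """Hypothetical helper: True only if the namespace carries both labels."""
    return (
        namespace_labels.get(ROLE_LABEL) == "project"
        and namespace_labels.get(PROJECT_NAME_LABEL) == project_name
    )
```

An unlabeled namespace such as kube-system fails the check, so a project referencing it would not be able to adopt it.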
After the namespace has been created or adopted, the reconciler creates several ClusterRoles and ClusterRoleBindings that allow the project members to access related resources based on their roles.
These RBAC resources are prefixed with gardener.cloud:system:project{-member,-viewer}:<project-name>.
Gardener administrators and extension developers can define their own roles, see this document for more information.
In addition, operators can configure the Project controller to maintain a default ResourceQuota for project namespaces.
Quotas can in particular limit the creation of user-facing resources, e.g. Shoots, SecretBindings, and Secrets, and thus protect the Garden cluster from massive resource exhaustion, while also enabling operators to align quotas with their enterprise policies.
⚠️ Gardener itself is not exempt from configured quotas. For example, Gardener creates Secrets for every shoot cluster in the project namespace, and these count against the configured quota. Please mind this additional resource consumption.
The GCM configuration provides a template section controllers.project.quotas where such a ResourceQuota (see example below) can be specified.
controllers:
  project:
    quotas:
    - config:
        apiVersion: v1
        kind: ResourceQuota
        spec:
          hard:
            count/shoots.core.gardener.cloud: "100"
            count/secretbindings.core.gardener.cloud: "10"
            count/secrets: "800"
      projectSelector: {}
The Project controller takes the shown config and creates a ResourceQuota with the name gardener in the project namespace.
If a ResourceQuota resource with the name gardener already exists, the controller only sets those fields in spec.hard that are not yet present on it.
Labels and annotations on the ResourceQuota config get merged with the respective fields on existing ResourceQuotas.
An optional projectSelector narrows down the set of projects that are equipped with the given config.
If multiple configs match a project, only the first match in the list is applied to the project namespace.
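The selection and merge behaviour described above might look roughly like the following sketch. It is not the controller's implementation. The helper names are hypothetical, and the selector handling is simplified to matchLabels only. The sketch also assumes that a missing selector behaves like an empty one, and that "merging" spec.hard means adding only the keys the existing object lacks.

```python
def matches(selector: dict, project_labels: dict) -> bool:
    """Simplified matchLabels-only check; an empty selector matches every project."""
    wanted = selector.get("matchLabels", {})
    return all(project_labels.get(key) == value for key, value in wanted.items())


def select_quota_config(quota_configs: list, project_labels: dict):
    """Return the config of the first list entry whose selector matches, else None."""
    for entry in quota_configs:
        # Assumption: a missing projectSelector is treated like an empty one.
        if matches(entry.get("projectSelector", {}), project_labels):
            return entry["config"]
    return None


def merge_hard_limits(existing_hard: dict, config_hard: dict) -> dict:
    """Keep existing limits and add only the resources that are not set yet."""
    merged = dict(existing_hard)
    for resource, limit in config_hard.items():
        merged.setdefault(resource, limit)
    return merged
```

Because only the first match is applied, more specific selectors would need to be listed before a catch-all entry with an empty selector.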
The reconciler sets the .status.phase of the Project resource to Ready or Failed to indicate whether the reconciliation loop was performed successfully.
It also generates Events to provide further information about its operations.
As a large-scale Kubernetes-as-a-Service offering, Gardener is designed to be used by a large number of end-users.
Over time, some of the hundreds or thousands of Project resources are likely to no longer be actively used.
Gardener offers the "stale projects" reconciler, which identifies such stale projects, marks them with a "warning", and eventually deletes them after a certain time period. This reconciler is enabled by default and works as follows:
- Projects are considered "stale"/not actively used when all of the following conditions apply:
  - The namespace associated with the Project does not have any...
    - Shoot resources.
    - Plant resources.
    - BackupEntry resources.
    - Secret resources that are referenced by a SecretBinding that is in use by a Shoot (not necessarily in the same namespace).
    - Quota resources that are referenced by a SecretBinding that is in use by a Shoot (not necessarily in the same namespace).
  - The time since the project was last used (status.lastActivityTimestamp) is longer than the configured minimumLifetimeDays.
If a project is considered "stale", its .status.staleSinceTimestamp is set to the time when it was first detected to be stale.
If it gets actively used again, this timestamp is removed.
After some time, the .status.staleAutoDeleteTimestamp is set to a timestamp after which Gardener auto-deletes the Project resource if it is still not actively used.
The component configuration of the Gardener Controller Manager offers the following options:
- minimumLifetimeDays: Don't consider newly created Projects as "stale" too early, to give people/end-users some time to onboard and get familiar with the system. The "stale project" reconciler won't set any timestamp for Projects younger than minimumLifetimeDays. When you change this value, projects marked as "stale" may no longer be marked as "stale" in case they are young enough, or vice versa.
- staleGracePeriodDays: Don't compute auto-delete timestamps for stale Projects that have been unused for less than staleGracePeriodDays. This is to not unnecessarily make people/end-users nervous "just because" they haven't actively used their Project for a given amount of time. When you change this value, already assigned auto-delete timestamps may be removed again if the new grace period is not yet exceeded.
- staleExpirationTimeDays: Expiration time after which stale Projects are finally auto-deleted (after .status.staleSinceTimestamp). If this value is changed and an auto-delete timestamp was already assigned to a project, the new value only takes effect if it is increased. Hence, decreasing staleExpirationTimeDays will not decrease already assigned auto-delete timestamps.
Gardener administrators/operators can exclude specific Projects from the stale check by annotating the related Namespace resource with project.gardener.cloud/skip-stale-check=true.
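To make the interplay of the three options easier to follow, here is a minimal sketch of how the two status timestamps could be derived. It is an interpretation of the rules above, not the reconciler's code. The function name and parameters are hypothetical, and `in_use` stands for "the namespace still contains one of the listed resources".

```python
from datetime import datetime, timedelta


def evaluate_stale(*, now, last_activity, in_use, stale_since, auto_delete,
                   minimum_lifetime_days, stale_grace_period_days,
                   stale_expiration_time_days, skip_stale_check=False):
    """Return the new (staleSinceTimestamp, staleAutoDeleteTimestamp) pair."""
    recently_active = now - last_activity <= timedelta(days=minimum_lifetime_days)
    if skip_stale_check or in_use or recently_active:
        # Not stale (or excluded from the check): both timestamps are cleared.
        return None, None

    # Keep the time of first detection if the project was already stale.
    stale_since = stale_since or now

    if now - stale_since < timedelta(days=stale_grace_period_days):
        # Still within the grace period: no auto-delete timestamp.
        return stale_since, None

    candidate = stale_since + timedelta(days=stale_expiration_time_days)
    if auto_delete is not None and candidate < auto_delete:
        # An already assigned auto-delete timestamp is never decreased.
        candidate = auto_delete
    return stale_since, candidate
```

In this sketch, a project that became stale 20 days ago with a 14-day grace period and a 90-day expiration time gets an auto-delete timestamp 90 days after it was first detected as stale.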
Since the other two reconcilers are unable to actively monitor the relevant objects that are used in a Project (Shoot, Plant, etc.), a user might create and delete objects within a short period of time. In that case, the Stale Project Reconciler would not see any activity on that project and would still mark it as stale, even though it is actively used.
The Project Activity Reconciler is implemented to take care of such cases. An event handler notifies the reconciler of any activity, and the reconciler then updates status.lastActivityTimestamp. This update also triggers the Stale Project Reconciler.
With the Gardener Event Controller you can prolong the lifespan of events related to Shoot clusters. This is an optional controller which becomes active once you provide the configuration mentioned below.
All events in K8s are deleted after a configurable time-to-live, controlled via the kube-apiserver argument --event-ttl (defaulting to 1 hour).
The need to prolong the time-to-live for Shoot cluster events frequently arises when debugging customer issues on live systems.
This controller leaves events involving Shoots untouched while deleting all other events after a configured time.
In order to activate it, provide the following configuration:
- concurrentSyncs: The number of goroutines scheduled for reconciling events.
- ttlNonShootEvents: When an event reaches this time-to-live, it gets deleted unless it is a Shoot-related event (defaults to 1h, equivalent to the event-ttl default).
⚠️ In addition, you should also configure the --event-ttl for the kube-apiserver to define an upper limit for how long Shoot-related events should be stored. The --event-ttl should be larger than ttlNonShootEvents, or this controller will have no effect.
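The deletion rule can be sketched as follows. This is an illustration only; it assumes that "Shoot-related" means the event's involved object is of kind Shoot, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def should_delete_event(involved_kind: str, last_seen: datetime, now: datetime,
                        ttl_non_shoot_events: timedelta = timedelta(hours=1)) -> bool:
    """Decide whether the controller would delete an event in this sketch."""
    if involved_kind == "Shoot":
        # Shoot events are left untouched; only the kube-apiserver's
        # --event-ttl eventually removes them.
        return False
    return now - last_seen >= ttl_non_shoot_events
```

This also shows why --event-ttl needs to be larger than ttlNonShootEvents: otherwise the API server removes all events, including Shoot events, before this rule makes any difference.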
Shoot objects may specify references to further objects in the Garden cluster which are required for certain features.
For example, users can configure various DNS providers via .spec.dns.providers and usually need to refer to a corresponding secret containing valid DNS provider credentials.
Such objects need special protection against deletion requests as long as they are still referenced by one or more shoots.
Therefore, the Shoot Reference Controller scans shoot clusters for referenced objects and adds the finalizer gardener.cloud/reference-protection to their .metadata.finalizers list.
The scanned shoot also gets this finalizer to enable proper garbage collection in case the Gardener Controller Manager is offline at the moment of an incoming deletion request.
When an object is no longer actively referenced because the shoot specification has changed or all related shoots were deleted (are in deletion), the controller removes the added finalizer again so that the object can safely be deleted or garbage collected.
The Shoot Reference Controller inspects the following references:
- DNS provider secrets (.spec.dns.providers)
- Audit policy configmaps (.spec.kubernetes.kubeAPIServer.auditConfig.auditPolicy.configMapRef)
Further checks might be added in the future.
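As an illustration of the finalizer handling, the sketch below computes the desired finalizers for a set of secrets from the shoots that reference them. The flattened shoot representation (`dnsProviders`, `secretName`, `deleting`) is a simplification invented for this example and does not mirror the real API objects.

```python
FINALIZER = "gardener.cloud/reference-protection"


def referenced_secret_names(shoots: list) -> set:
    """Collect secret names referenced by shoots that are not in deletion."""
    names = set()
    for shoot in shoots:
        if shoot.get("deleting"):
            continue  # shoots in deletion no longer protect their references
        for provider in shoot.get("dnsProviders", []):
            if provider.get("secretName"):
                names.add(provider["secretName"])
    return names


def reconcile_finalizers(secret_finalizers: dict, shoots: list) -> dict:
    """Add the finalizer to referenced secrets and remove it from all others."""
    referenced = referenced_secret_names(shoots)
    result = {}
    for name, finalizers in secret_finalizers.items():
        others = [f for f in finalizers if f != FINALIZER]
        result[name] = others + [FINALIZER] if name in referenced else others
    return result
```

Finalizers owned by other components are preserved in both cases; only the reference-protection finalizer is added or removed.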
The Shoot Retry Controller is responsible for retrying certain failed Shoots. Currently, the controller only retries failed Shoots with the error code ERR_INFRA_RATE_LIMITS_EXCEEDED.
The Seed controller in the Gardener Controller Manager reconciles Seed objects with the help of the following reconcilers.
This reconciliation loop takes care of seed-related operations in the Garden cluster. When a new Seed object is created, the reconciler creates a new Namespace named seed-<seed-name> in the garden cluster. Namespaces dedicated to single seed clusters allow us to segregate access permissions, i.e., a gardenlet must not have permissions to access objects in all Namespaces in the Garden cluster.
Some objects in a Garden environment are created once by the operator, e.g., the default domain secret and alerting credentials, and are required for operations happening in the gardenlet. Therefore, we not only need a seed-specific Namespace but also a copy of these "shared" objects.
The "main" reconciler takes care of this replication:
Kind | Namespace | Label Selector |
---|---|---|
Secret | garden | gardener.cloud/role |
Every time a BackupBucket object is created or updated, the referenced Seed object is enqueued for reconciliation.
It is the reconciler's task to check the status subresource of all existing BackupBuckets that belong to this seed.
If at least one BackupBucket has .status.lastError set, the seed condition BackupBucketsReady turns false and consequently the seed is considered NotReady. Once the BackupBucket is healthy again, the seed is re-queued and the condition turns true.
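The condition logic amounts to an "any bucket has an error" check. The sketch below assumes a simplified bucket representation with hypothetical `seedName` and `lastError` keys.

```python
def backup_buckets_ready(seed_name: str, backup_buckets: list) -> bool:
    """False as soon as one bucket of this seed reports a last error."""
    return not any(
        bucket.get("seedName") == seed_name and bucket.get("lastError")
        for bucket in backup_buckets
    )
```

Buckets that belong to other seeds do not influence the result, and a seed without any buckets is considered ready in this sketch.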
The "Lifecycle" reconciler processes Seed objects, which are enqueued every 10 seconds, in order to check whether the responsible gardenlet is still responding and operable. To do so, it checks the Lease objects of the seed in the garden cluster, which are renewed regularly by the gardenlet.
In case a Lease is not renewed within the duration configured in config.controllers.seed.monitorPeriod.duration:
- The reconciler assumes that the gardenlet stopped operating and updates the GardenletReady condition to Unknown.
- Additionally, conditions and constraints of all Shoot resources scheduled on the affected seed are set to Unknown as well, because an unresponsive gardenlet won't be able to maintain these conditions any more.
- If the gardenlet's client certificate has expired (identified based on the .status.clientCertificateExpirationTimestamp field in the Seed resource) and the seed is managed by a ManagedSeed, then the ManagedSeed is triggered for a reconciliation. This triggers the bootstrapping process again and allows the gardenlet to obtain a fresh client certificate.
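The first two bullet points can be sketched as a function over the lease renewal time. The data shapes are simplified assumptions (conditions as plain dicts of type to status), and the certificate-expiry handling of the third bullet point is left out.

```python
from datetime import datetime, timedelta


def reconcile_lifecycle(lease_renew_time: datetime, now: datetime,
                        monitor_period: timedelta, gardenlet_ready: str,
                        shoot_conditions: dict):
    """Return (GardenletReady status, shoot conditions) after the lease check.

    shoot_conditions maps a shoot name to a dict of condition type -> status.
    """
    if now - lease_renew_time <= monitor_period:
        # The lease is fresh: nothing changes.
        return gardenlet_ready, shoot_conditions

    # The lease was not renewed in time: everything the gardenlet maintains
    # becomes Unknown.
    unknown = {
        shoot: {condition: "Unknown" for condition in conditions}
        for shoot, conditions in shoot_conditions.items()
    }
    return "Unknown", unknown
```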
The ControllerRegistration controller makes sure that the required Gardener extensions specified by the ControllerRegistration resources are present in the seed clusters. It also takes care of the creation and deletion of ControllerInstallation objects for a given seed cluster.
The controller has three reconciliation loops.
This reconciliation loop watches the Seed objects, determines which ControllerRegistrations are required for them, and creates/deletes the corresponding extension controllers to reach the determined state. To begin with, it computes the kind/type combinations of extensions required for the seed. For this, the controller examines a live list of ControllerRegistrations, ControllerInstallations, BackupBuckets, BackupEntries, Shoots, and Secrets from the garden cluster. For example, it examines the shoots running on the seed and deduces kind/type combinations like Infrastructure/gcp. It also decides whether they should always be deployed based on the .spec.deployment.policy.
For the configuration options, please see this section.
Based on these required combinations, each of them is mapped to ControllerRegistration objects and then to their corresponding ControllerInstallation objects (if existing). The controller then creates or updates the required ControllerInstallation objects for the given seed. It also deletes every existing ControllerInstallation whose referenced ControllerRegistration is not part of the required list. For example, if the shoots in the seed are no longer using the DNS provider aws-route53, then the controller proceeds to delete the respective ControllerInstallation object.
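The mapping from required kind/type combinations to installations can be sketched as a set computation. The data shapes below are invented for the example, and the policy value "Always" is used here as an assumption for "should always be deployed".

```python
def plan_installations(required_kind_types: set, registrations: dict,
                       existing_installations: set):
    """Return (required registrations, installations to delete), both sorted.

    registrations maps a registration name to
    {"resources": set of (kind, type) tuples, "policy": str}.
    existing_installations holds the registration names that already have a
    ControllerInstallation on the seed.
    """
    required = {
        name
        for name, registration in registrations.items()
        if registration.get("policy") == "Always"
        or registration["resources"] & required_kind_types
    }
    # Required installations get created or updated; all others are deleted.
    return sorted(required), sorted(existing_installations - required)
```

If the shoots on a seed only need ("Infrastructure", "gcp"), a registration serving a DNS provider type that is no longer used ends up in the second list and its installation would be deleted.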
This reconciliation loop watches the ControllerRegistration resources and adds finalizers to them when they are created. In case a deletion request comes in for a resource, i.e., if a .metadata.deletionTimestamp is set, it actively scans for ControllerInstallation resources using this ControllerRegistration and decides whether the deletion can be allowed. In case no related ControllerInstallation is present, it removes the finalizer so that the resource can be deleted.
This loop also watches the Seed objects and adds finalizers to them at creation. If a .metadata.deletionTimestamp is set for the seed, the controller checks for existing ControllerInstallation objects which reference this seed. If no such objects exist, it removes the finalizer and allows the deletion.
After the gardenlet gets deployed on the Seed cluster, it needs to establish itself as a trusted party to communicate with the Gardener API server. It runs through a bootstrap flow similar to the kubelet bootstrap process.
On startup, the gardenlet uses a kubeconfig with a bootstrap token which authenticates it as being part of the system:bootstrappers group. This kubeconfig is used to create a CertificateSigningRequest (CSR) against the Gardener API server.
The controller in gardener-controller-manager checks whether the CertificateSigningRequest has the expected organisation, common name, and usages which the gardenlet would request.
It only auto-approves the CSR if the client making the request is allowed to "create" the certificatesigningrequests/seedclient subresource. Clients in the system:bootstrappers group are bound to the gardener.cloud:system:seed-bootstrapper ClusterRole and hence have such privileges. As the bootstrap kubeconfig for the gardenlet contains a bootstrap token which is authenticated as being part of the system:bootstrappers group, its created CSR gets auto-approved.
Bastion resources have a limited lifetime, which can be extended up to a certain amount by performing a heartbeat on them. The Bastion controller is responsible for deleting expired or rotten Bastions.
- "expired" means a Bastion has exceeded its status.expirationTimestamp.
- "rotten" means a Bastion is older than the configured maxLifetime.
The maxLifetime is an option on the Bastion controller and defaults to 24 hours.
The deletion triggers the gardenlet to perform the necessary cleanups in the Seed cluster, so some time can pass between deletion and the Bastion actually disappearing. Clients like gardenctl are advised not to re-use Bastions whose deletion timestamp has already been set.
Refer to GEP-15 for more information on the lifecycle of Bastion resources.
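The "expired" and "rotten" checks described above can be sketched as a single predicate. The function and parameter names are hypothetical; only the 24-hour default for maxLifetime is taken from the text.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_delete_bastion(created: datetime, expiration: Optional[datetime],
                          now: datetime,
                          max_lifetime: timedelta = timedelta(hours=24)) -> bool:
    """True if the Bastion is expired or rotten."""
    expired = expiration is not None and now > expiration
    rotten = now - created > max_lifetime
    return expired or rotten
```

Heartbeats can move the expiration timestamp forward, but in this sketch they cannot keep a Bastion alive beyond max_lifetime, because the rotten check only looks at the creation time.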
Using the Plant resource, an external Kubernetes cluster (not managed by Gardener) can be registered with Gardener. The Gardener Controller Manager is the component responsible for reconciling Plant resources. As part of the reconciliation loop, the Gardener Controller Manager performs health checks on the external Kubernetes cluster and gathers more information about it. All of this information serves monitoring purposes for the external Kubernetes cluster.
The component configuration of the Gardener Controller Manager offers the following options for the plant controller:
- syncPeriod: How often the Plant resource is reconciled, i.e., how often health checks are performed. The default value is 30s.
- concurrentSyncs: The number of goroutines scheduled for reconciling events, i.e., the number of possible parallel reconciliations. The default value is 5.
The Plant resource reports the following information for the external Kubernetes cluster:
- Cluster information
  - Cloud provider information - the cloud provider type and region are maintained in the Plant status (.status.clusterInfo.cloud).
  - Kubernetes version - the Kubernetes version is maintained in the Plant status (.status.clusterInfo.kubernetes.version).
- Cluster status
  - API Server availability - maintained as condition with type APIServerAvailable.
  - Cluster Nodes healthiness - maintained as condition with type EveryNodeReady.