
Plan with k8s/helm provider doesn't wait for upstream K8s cluster. #2512

Closed
chrisbecke opened this issue Jun 4, 2024 · 2 comments

chrisbecke commented Jun 4, 2024

Existing resources that use the kubernetes provider (and, by implication, the helm provider) do not pick up the upstream cluster's connection details when the upstream cluster is referenced only indirectly.

Which is to say:

  1. The initial deployment always succeeds.
  2. Subsequent applies work as long as "endpoint", "cluster_ca_certificate" and "token" remain known.
  3. Using "data" sources to look up the cluster details does NOT work, even with depends_on directives.
  4. However, using the upstream resources directly does work (see the sketch below).
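
The difference can be summarized with this minimal sketch (it only restates the two cases above; the full reproduction configuration is in the section below):

// Fails once the cluster has pending changes: the provider configuration
// falls back to localhost.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

// Works: the provider is configured from the managed resource directly,
// with exec used for authentication.
provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", local.name]
  }
}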

Terraform Version, Provider Version and Kubernetes Version

Terraform v1.8.4
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v5.52.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.30.0

Terraform Configuration Files

provider "aws" {
}

locals {
  name = replace(basename(path.cwd), "_", "-")
}

////////////////////////////////////////////////////////////////////////////////
// A minimal EKS cluster to reproduce the issue
////////////////////////////////////////////////////////////////////////////////

data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
  filter {
    name   = "default-for-az"
    values = ["true"]
  }
}

resource "aws_iam_role" "cluster" {
  name               = local.name
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy_attachment" "amazon-eks-cluster-policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role_policy_attachment" "amazon-eks-vpc-resource-controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.cluster.name
}

variable "authentication_mode" {
  default = null
}

resource "aws_eks_cluster" "cluster" {
  name     = local.name
  role_arn = aws_iam_role.cluster.arn
  vpc_config {
    subnet_ids = data.aws_subnets.default.ids
  }
  access_config {
    authentication_mode = var.authentication_mode
  }

  depends_on = [
    aws_iam_role_policy_attachment.amazon-eks-cluster-policy,
    aws_iam_role_policy_attachment.amazon-eks-vpc-resource-controller
  ]
}

///////////////////////////////////////////////////////////////////////////////
// A random kubernetes resource to trigger the provider
///////////////////////////////////////////////////////////////////////////////

data "aws_eks_cluster" "cluster" {
  name       = local.name
  depends_on = [aws_eks_cluster.cluster]
}

data "aws_eks_cluster_auth" "eks_auth" {
  name       = local.name
  depends_on = [aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

resource "kubernetes_namespace" "ns" {
  metadata {
    name = "test-ns"
  }
}

Steps to Reproduce

Assuming an AWS account:

  1. terraform init
  2. terraform apply
  3. Set authentication_mode = "API_AND_CONFIG_MAP" in terraform.tfvars to trigger a change to the cluster (see the example after this list).
  4. terraform plan and observe the first error.
  5. Remove the "data." prefix from the host argument.
  6. terraform plan and observe the next error.
  7. Remove the "data." prefix from the cluster_ca_certificate argument.
  8. terraform plan and observe the next error.
  9. Delete the token argument and use an "exec" block to generate the cluster authentication.
  10. terraform plan and observe success.
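
For step 3, the variable can be set in terraform.tfvars, for example:

# terraform.tfvars
authentication_mode = "API_AND_CONFIG_MAP"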

Expected Behavior

With this provider configuration the upstream resource is used directly for the endpoint and certificate, and exec retrieves the token rather than relying on either of the data objects; the plan then succeeds:

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", local.name]
  }
}

Actual Behavior

With the given code:

│ Error: Get "http://localhost/api/v1/namespaces/test-ns": dial tcp [::1]:80: connect: connection refused
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 109, in resource "kubernetes_namespace" "ns":
│  109: resource "kubernetes_namespace" "ns" {
│ 

With aws_eks_cluster.cluster.endpoint in place of data.aws_eks_cluster.cluster.endpoint, it now finds the correct endpoint:

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

The following error results:

│ Error: Get "https://5FE1EB3BB63BA7C813D2DDA68E593F88.gr7.eu-west-1.eks.amazonaws.com/api/v1/namespaces/test-ns": tls: failed to verify certificate: x509: “kube-apiserver” certificate is not trusted
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 110, in resource "kubernetes_namespace" "ns":
│  110: resource "kubernetes_namespace" "ns" {

With aws_eks_cluster.cluster.certificate_authority in place of data.aws_eks_cluster.cluster.certificate_authority, it uses the correct endpoint and certificate:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

This error results:

│ Error: namespaces "test-ns" is forbidden: User "system:anonymous" cannot get resource "namespaces" in API group "" in the namespace "test-ns"
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 111, in resource "kubernetes_namespace" "ns":
│  111: resource "kubernetes_namespace" "ns" {

With exec in place of data.aws_eks_cluster_auth.eks_auth.token, it can resolve the correct token.

References

The following tickets reference the "localhost" fallback but don't mention how to fix the certificate or token errors.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
chrisbecke added the bug label on Jun 4, 2024
chrisbecke changed the title from "Doesn't wait for upstream K8s cluster." to "Plan with k8s/helm provider doesn't wait for upstream K8s cluster." on Jun 4, 2024
appilon (Contributor) commented Jun 5, 2024

Hello @chrisbecke,

This is a common problem: the output of one apply is needed to configure another provider. Unfortunately, at this time the prescribed advice is to break your workspace into separate steps and apply "progressively". It is something we are trying to address in the future, but it is a complex problem.
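
For illustration, a minimal sketch of that progressive approach, assuming the EKS cluster is applied first from a separate workspace (the cluster name "example" here is hypothetical). The second workspace can then safely configure the provider from data sources, because the cluster already exists when it is planned:

// Second workspace, applied after the cluster workspace.
data "aws_eks_cluster" "cluster" {
  name = "example" // hypothetical: the name output by the cluster workspace
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.aws_eks_cluster.cluster.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

resource "kubernetes_namespace" "ns" {
  metadata {
    name = "test-ns"
  }
}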

appilon closed this as completed Jun 5, 2024
chrisbecke (Author) commented:

The weird thing is that it actually works, as long as you use the upstream objects directly to initialise the provider. If you use data objects that merely "depends_on" the upstream, then it fails.
The fact that it works at all, but fails with data objects (which should honour depends_on), seemingly indicates this is a bug, not a feature.
