
Plan with k8s/helm provider doesn't wait for upstream K8s cluster. #2512

Closed
chrisbecke opened this issue Jun 4, 2024 · 2 comments

chrisbecke commented Jun 4, 2024

Existing resources that use the kubernetes provider (and, by implication, the helm provider) do not pick up the upstream cluster's connection details when the upstream cluster is referenced only indirectly.

Which is to say:

  1. The initial deployment always succeeds.
  2. Subsequent applies work as long as "endpoint", "cluster_ca_certificate" and "token" remain known.
  3. Using "data" sources to look up the cluster details does NOT work, even with depends_on directives.
  4. However, using the upstream resources directly does work (see the sketch below).
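
The difference can be summarized with this minimal sketch (it only restates the two cases above; the full reproduction configuration is in the section below):

// Fails once the cluster has pending changes: the provider configuration
// falls back to localhost.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

// Works: the provider is configured from the managed resource directly,
// with exec used for authentication.
provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", local.name]
  }
}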

Terraform Version, Provider Version and Kubernetes Version

Terraform v1.8.4
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v5.52.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.30.0

Terraform Configuration Files

provider "aws" {
}

locals {
  name = replace(basename(path.cwd), "_", "-")
}

////////////////////////////////////////////////////////////////////////////////
// A minimal EKS cluster to reproduce the issue
////////////////////////////////////////////////////////////////////////////////

data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
  filter {
    name   = "default-for-az"
    values = ["true"]
  }
}

resource "aws_iam_role" "cluster" {
  name               = local.name
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_role_policy_attachment" "amazon-eks-cluster-policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role_policy_attachment" "amazon-eks-vpc-resource-controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.cluster.name
}

variable "authentication_mode" {
  default = null
}

resource "aws_eks_cluster" "cluster" {
  name     = local.name
  role_arn = aws_iam_role.cluster.arn
  vpc_config {
    subnet_ids = data.aws_subnets.default.ids
  }
  access_config {
    authentication_mode = var.authentication_mode
  }

  depends_on = [
    aws_iam_role_policy_attachment.amazon-eks-cluster-policy,
    aws_iam_role_policy_attachment.amazon-eks-vpc-resource-controller
  ]
}

///////////////////////////////////////////////////////////////////////////////
// A random kubernetes resource to trigger the provider
///////////////////////////////////////////////////////////////////////////////

data "aws_eks_cluster" "cluster" {
  name       = local.name
  depends_on = [aws_eks_cluster.cluster]
}

data "aws_eks_cluster_auth" "eks_auth" {
  name       = local.name
  depends_on = [aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

resource "kubernetes_namespace" "ns" {
  metadata {
    name = "test-ns"
  }
}

Steps to Reproduce

Assuming an AWS account:

  1. terraform init
  2. terraform apply
  3. Set authentication_mode = "API_AND_CONFIG_MAP" in terraform.tfvars to trigger a change to the cluster (see the example after this list).
  4. terraform plan and observe the first error.
  5. Remove the "data." prefix from the host argument.
  6. terraform plan and observe the next error.
  7. Remove the "data." prefix from the cluster_ca_certificate argument.
  8. terraform plan and observe the next error.
  9. Delete the token argument and use an "exec" block to generate the cluster authentication.
  10. terraform plan and observe success.
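
For step 3, the variable can be set in terraform.tfvars, for example:

# terraform.tfvars
authentication_mode = "API_AND_CONFIG_MAP"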

Expected Behavior

With this provider configuration the upstream resource is used directly for the endpoint and certificate, and exec retrieves the token rather than relying on either of the data objects; the plan then succeeds:

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", local.name]
  }
}

Actual Behavior

With the given code:

│ Error: Get "http://localhost/api/v1/namespaces/test-ns": dial tcp [::1]:80: connect: connection refused
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 109, in resource "kubernetes_namespace" "ns":
│  109: resource "kubernetes_namespace" "ns" {
│ 

With aws_eks_cluster.cluster.endpoint in place of data.aws_eks_cluster.cluster.endpoint, it now finds the correct endpoint:

provider "kubernetes" {
  host                   = aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

The following error results:

│ Error: Get "https://5FE1EB3BB63BA7C813D2DDA68E593F88.gr7.eu-west-1.eks.amazonaws.com/api/v1/namespaces/test-ns": tls: failed to verify certificate: x509: “kube-apiserver” certificate is not trusted
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 110, in resource "kubernetes_namespace" "ns":
│  110: resource "kubernetes_namespace" "ns" {

With aws_eks_cluster.cluster.certificate_authority in place of data.aws_eks_cluster.cluster.certificate_authority, it uses the correct endpoint and certificate:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}

This error results:

│ Error: namespaces "test-ns" is forbidden: User "system:anonymous" cannot get resource "namespaces" in API group "" in the namespace "test-ns"
│ 
│   with kubernetes_namespace.ns,
│   on main.tf line 111, in resource "kubernetes_namespace" "ns":
│  111: resource "kubernetes_namespace" "ns" {

With exec in place of data.aws_eks_cluster_auth.eks_auth.token, it can resolve the correct token.

References

The following tickets reference the "localhost" fallback but don't mention how to fix the certificate or token errors.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
chrisbecke added the bug label on Jun 4, 2024
chrisbecke changed the title from "Doesn't wait for upstream K8s cluster." to "Plan with k8s/helm provider doesn't wait for upstream K8s cluster." on Jun 4, 2024
appilon (Contributor) commented Jun 5, 2024

Hello @chrisbecke,

This is a common problem: the output of one apply is needed to configure another provider. Unfortunately, at this time the prescribed advice is to break your workspace into separate steps and apply "progressively". It is something we are trying to address in the future, but it is a complex problem.
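
For illustration, a minimal sketch of that progressive approach, assuming the EKS cluster is applied first from a separate workspace (the cluster name "example" here is hypothetical). The second workspace can then safely configure the provider from data sources, because the cluster already exists when it is planned:

// Second workspace, applied after the cluster workspace.
data "aws_eks_cluster" "cluster" {
  name = "example" // hypothetical: the name output by the cluster workspace
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.aws_eks_cluster.cluster.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

resource "kubernetes_namespace" "ns" {
  metadata {
    name = "test-ns"
  }
}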

appilon closed this as completed Jun 5, 2024
chrisbecke (Author) commented:

The weird thing is that it actually works, as long as you use the upstream objects directly to initialise the provider. If you use data objects that merely "depends_on" the upstream, then it fails.
The fact that it works at all, but fails with data objects (which should honour depends_on), seemingly indicates this is a bug, not a feature.
