
Authenticate kubernetes provider to AKS using Terraform Cloud Dynamic Credentials for Azure #2603

jeffhuenemann opened this issue Oct 17, 2024 · 5 comments

@jeffhuenemann
Description

One-liner:

  • I want the kubernetes Terraform provider to work with Entra-enabled AKS without managing any secrets (just OIDC federations).

Scenario:

  • An AKS cluster, pre-created in a separate Terraform codebase/run, with managed Entra ID integration.
  • I'm creating a Terraform module that uses both the azurerm and kubernetes providers to onboard new apps/APIs into the AKS cluster. (azurerm_user_assigned_identity, kubernetes_namespace_v1, kubernetes_service_account_v1, etc.)
  • I'm using Terraform Cloud with a workspace configured with Dynamic Credentials for Azure, and it authenticates the azurerm provider perfectly.
  • The Azure identity targeted by the dynamic credentials holds (see the sketch after this list):
    • the Owner role on the resource group where the azurerm resources go
    • the Azure Kubernetes Service RBAC Cluster Admin role, sufficient to make any change through the Kubernetes API of the AKS cluster
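
For reference, those two role assignments might look like this in azurerm terms; a minimal sketch with hypothetical resource names (only the role names and scopes come from our actual setup):

# Hypothetical names for illustration; the identity and resource group live elsewhere in our real codebase
resource "azurerm_role_assignment" "rg_owner" {
  scope                = azurerm_resource_group.apps.id
  role_definition_name = "Owner"
  principal_id         = azurerm_user_assigned_identity.pipeline.principal_id
}

resource "azurerm_role_assignment" "aks_rbac_cluster_admin" {
  scope                = data.azurerm_kubernetes_cluster.aks.id
  role_definition_name = "Azure Kubernetes Service RBAC Cluster Admin"
  principal_id         = azurerm_user_assigned_identity.pipeline.principal_id
}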

Manual version illustrating a similar idea:

# Login to Azure
az login # use whatever details/parameters for your environment

# Convert kubeconfig to inherit the Azure CLI credential you've already established
# This switches kubeconfig to use an `exec` to `kubelogin`
kubelogin convert-kubeconfig -l azurecli

# Now, do stuff with kubectl
kubectl get nodes -o wide

# Each call of kubectl runs `kubelogin get-token` to get a short-lived credential, inheriting the identity already captured for Azure

Goal:

  • The kubernetes Terraform provider can take on the same identity the azurerm provider pulls in, and use it to call the AKS cluster's Kubernetes API when provisioning kubernetes_* resources.
  • There are zero secrets to store/rotate/protect (as the azurerm provider already achieves by federating via OIDC).

Potential Terraform Configuration

I can imagine two ways to do this:

Option 1: kubernetes provider can be told to use the same Azure Dynamic Credentials as the azurerm provider

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dyamic azure credentials
    }
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }

  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  # 
  # ARM_TENANT_ID = <our tenant id>
  # ARM_SUBSCRIPTION_ID = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

provider "kubernetes" {
  host                              = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate            = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
  use_tfc_azure_dynamic_credentials = true # <== this is the thing that would have to be invented, maybe borrowing code from `azurerm` provider
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}

Option 2: kubernetes provider exchanges the TFC-provided OIDC token on its own:

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dyamic azure credentials
    }
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }

  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  # 
  # ARM_TENANT_ID = <our tenant id>
  # ARM_SUBSCRIPTION_ID = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

# https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#required-terraform-variable
# This "magic" variable is populated by the TFC workspace at runtime,
# And is especially required if you have multiple instances of the `azurerm` provider with aliases
variable "tfc_azure_dynamic_credentials" {
  description = "Object containing Azure dynamic credentials configuration"
  type = object({
    default = object({
      client_id_file_path  = string
      oidc_token_file_path = string
    })
    aliases = map(object({
      client_id_file_path  = string
      oidc_token_file_path = string
    }))
  })
}

provider "kubernetes" {
  host                              = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate            = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--environment", "AzurePublicCloud",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # Always the same, https://azure.github.io/kubelogin/concepts/aks.html
      "--client-id", "80faf920-1908-4b52-b5ef-a8e7bedfc67a", # Always the same, https://azure.github.io/kubelogin/concepts/aks.html
      "--tenant-id", data.azurerm_kubernetes_cluster.aks.azure_active_directory_role_based_access_control.0.tenant_id,
      "--authority-host", "https://login.microsoftonline.com/${data.azurerm_kubernetes_cluster.aks.azure_active_directory_role_based_access_control.0.tenant_id}", # or something similar, if it would work
      "--login", "workloadidentity",
      "--federated-token-file", var.tfc_azure_dynamic_credentials.default.oidc_token_file_path
    ]
  }
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}

Notes:

  • 📝 This option requires kubelogin to be available within the context of the Terraform run. We need a self-hosted TFC agent anyway (the cluster is private, so the TFC-provided agents wouldn't have line-of-sight to the Kubernetes API), and we have installed kubelogin on it ourselves.
  • When the Azure Dynamic Credentials are set up, TFC places a valid JWT at the path /home/tfc-agent/.tfc-agent/component/terraform/runs/{run-id-here}/tfc-azure-token, with an issuer of https://app.terraform.io and an audience of api://AzureADTokenExchange (the federated credential that accepts it is sketched after this list), but using that JWT with kubelogin isn't working.
  • If I manually run the kubelogin get-token command as specified in my kubeconfig after kubelogin convert-kubeconfig -l azurecli, I get a JWT with an issuer of https://sts.windows.net/{my-tenant-id-here}/ and an audience of 6dae42f8-4368-4678-94ff-3960e28e3630, the static Entra ID application ID for AKS that is the same for every customer. I believe this JWT is what gets submitted with calls to the Kubernetes API.
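
For completeness, the Entra ID side that accepts that TFC-issued JWT looks roughly like the following. This is a hedged sketch based on HashiCorp's dynamic-credentials docs; the application/resource names and the exact subject string are placeholders, not from our codebase:

resource "azuread_application_federated_identity_credential" "tfc_plan" {
  application_id = azuread_application.pipeline.id # hypothetical application; azuread provider 2.x uses application_object_id instead
  display_name   = "tfc-plan-phase"
  audiences      = ["api://AzureADTokenExchange"]
  issuer         = "https://app.terraform.io"
  subject        = "organization:my-org:project:my-project:workspace:this-workspace:run_phase:plan"
}

# The apply phase presents a different subject (run_phase:apply), so it needs its own federated credential.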

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@alexsomesan
Member

Hi,

I just want to make sure that I understand exactly what the ask is here. Would using a configuration like the one you presented in Option 2, which should work, not meet your expectations?

@jeffhuenemann
Author

@alexsomesan Thanks for the reply. Option 2 would totally meet the need (if it worked), and at that point I would propose that this issue could be solved with just a documentation update showing a workable solution using the exec { } method. As-is, at least when I tried it, I couldn't get the provider to authenticate using the token that TFC presents at that location.

Option 1 would also meet the need, and would rhyme more with how the azurerm and azuread providers are able to infer their authentication using just a couple of environment variables that feed the Azure SDK in the provider.

However it's accomplished, the ultimate goal is that the kubernetes provider (like azurerm and azuread) can authenticate into an AKS cluster using TFC dynamic Azure credentials.

@jtv8

jtv8 commented Nov 26, 2024

I have a somewhat hacky workaround for this (in my case, using a GitHub Actions ID token as the credential):

locals {
  github_id_token_azure_filename = "/tmp/.github-id-token-azure"
}

data "azurerm_client_config" "current" {
}

data "azurerm_kubernetes_cluster" "aks" {
  name                = var.aks_name
  resource_group_name = var.aks_resource_group
}

data "http" "github_id_token_azure" {
  url = "${var.github_id_token_request_url}&audience=api://AzureADTokenExchange"
  request_headers = {
    Authorization = "bearer ${var.github_id_token_request_token}"
    Accept        = "application/json; api-version=2.0"
  }
}

locals {
  kubernetes_credentials = {
    host = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(
      one(data.azurerm_kubernetes_cluster.aks.kube_config)
      .cluster_ca_certificate
    )
    exec_api_version = "client.authentication.k8s.io/v1beta1"
    exec_command     = "/bin/bash"
    exec_args = [
      "-c",
      join(" ", [
        "echo \"${jsondecode(data.http.github_id_token_azure.response_body).value}\"",
        "> ${local.github_id_token_azure_filename}",
        "&&",
        "kubelogin",
        "get-token",
        "--login",
        "workloadidentity",
        "--server-id",
        "6dae42f8-4368-4678-94ff-3960e28e3630", # See https://azure.github.io/kubelogin/concepts/aks.html
      ])
    ]
    exec_env = {
      AZURE_AUTHORITY_HOST       = "https://login.microsoftonline.com/"
      AZURE_TENANT_ID            = data.azurerm_client_config.current.tenant_id
      AZURE_CLIENT_ID            = data.azurerm_client_config.current.client_id
      AZURE_FEDERATED_TOKEN_FILE = local.github_id_token_azure_filename
    }
  }
}

provider "kubernetes" {
  host                   = local.kubernetes_credentials.host
  cluster_ca_certificate = local.kubernetes_credentials.cluster_ca_certificate
  exec {
    api_version = local.kubernetes_credentials.exec_api_version
    command     = local.kubernetes_credentials.exec_command
    args        = local.kubernetes_credentials.exec_args
    env         = local.kubernetes_credentials.exec_env
  }
}

These two Terraform variables are populated by the environment variables (I do this in Terragrunt):

  github_id_token_request_url   = get_env("ACTIONS_ID_TOKEN_REQUEST_URL")
  github_id_token_request_token = get_env("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
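
The matching Terraform variable declarations would be something like the following (types are assumed here; the request token is worth marking sensitive):

variable "github_id_token_request_url" {
  type = string
}

variable "github_id_token_request_token" {
  type      = string
  sensitive = true
}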

The reason it's as ugly as it is: kubelogin get-token requires the federated ID token to already be written to a file, so we need a way to force Terraform to write that file every time it starts the provider, including during the plan phase.

I can't think of a way of doing this that doesn't involve a shell command - feel free to propose better ways if you can think of them!

It would be great if there was a cleaner way to do this. I suppose the question is - do we expect the kubernetes provider to provide a wrapper for this logic for all major cloud platforms, or should this functionality be implemented upstream by the cloud platform's client-go plugin?

For instance, if kubelogin had --federated-token-request-url and --federated-token-request-token as options, that would make this a LOT cleaner - or even better, just --federated-token-provider github.

I can't find any existing issues suggesting this - want me to create one?

@jtv8

jtv8 commented Nov 26, 2024

Rereading your issue, it's weird that there's already a valid JWT with the correct audience - which is the hard part - but it isn't working with kubelogin. That warrants some investigation.

I've had issues before if I use an ID token that was issued before the AKS cluster was created. I have a theory that there's some logic somewhere that checks that the iat claim doesn't pre-date the creation timestamp of the cluster or the managed identity. Could it be that?
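
If it helps to test that theory, here's a hedged sketch for surfacing the iat claim of the TFC token from within Terraform itself. It assumes the tfc_azure_dynamic_credentials variable from Option 2 above; the base64url-to-base64 shuffling is needed because Terraform's base64decode only accepts standard padded base64:

locals {
  # The JWT payload is the second dot-separated segment, base64url-encoded
  jwt_payload_raw = split(".", file(var.tfc_azure_dynamic_credentials.default.oidc_token_file_path))[1]
  # Convert base64url to standard base64 and restore padding
  jwt_payload_b64 = replace(replace(local.jwt_payload_raw, "-", "+"), "_", "/")
  jwt_payload_pad = "${local.jwt_payload_b64}${substr("===", 0, (4 - length(local.jwt_payload_b64) % 4) % 4)}"
  jwt_claims      = jsondecode(base64decode(local.jwt_payload_pad))
}

output "tfc_token_iat" {
  value = local.jwt_claims.iat
}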

@choeffer

I don't know if this is related, but for me it was straightforward. I have seen many (for me) complex setups like https://github.com/neumanndaniel/terraform/blob/master/modules/kubelogin/main.tf or the options proposed here.

In my case, I only had to

# Login to Azure
az login
# Get AKS Creds
az aks get-credentials --resource-group "rg-XYZ" --name "aks-XYZ" --overwrite-existing
# Convert kubeconfig
kubelogin convert-kubeconfig -l azurecli

In Terraform I only had to define the following provider config

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "aks-XYZ"
}

and I was able to apply this example

resource "kubernetes_namespace" "example" {
  metadata {
    name = "my-first-namespace"
  }
}

Hope this might help others, at least for local deployments. But I expect that if you convert the kubeconfig via kubelogin convert-kubeconfig -l to other methods (https://azure.github.io/kubelogin/cli/convert-kubeconfig.html), Terraform should be able to use them as well.
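
For runs where you'd rather not manage a kubeconfig file at all, the same azurecli login should also work through the provider's exec block. A minimal sketch, reusing the aks data source from the earlier examples:

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "azurecli",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # the static AKS server app ID noted earlier
    ]
  }
}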
