dial tcp 127.0.0.1:80: connect: connection refused #2007

Closed
1 task done
ArchiFleKs opened this issue Apr 11, 2022 · 69 comments

Comments

@ArchiFleKs
Contributor

Description

I know there are numerous issues (#817) related to this problem, but since v18.20.1 reintroduced management of the configmap, I thought we could discuss it in a new issue because the old ones are closed.

The behavior is still very weird. I updated my module to use the configmap management feature and the first run went fine (I was using the aws_eks_cluster_auth datasource). When I run the module with no changes, I get no error in either plan or apply.

I then tried to update my cluster from v1.21 to v1.22, and plan and apply began to fail with the following well-known error:

null_resource.node_groups_asg_tags["m5a-xlarge-b-priv"]: Refreshing state... [id=7353592322772826167]
╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with kubernetes_config_map_v1_data.aws_auth[0],
│   on main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  428: resource "kubernetes_config_map_v1_data" "aws_auth" {
│
╵

I then moved to the exec plugin, as recommended by the documentation, and removed the old datasource from the state. I still got the same error.

What I don't understand is that when I set the variable with export KUBE_CONFIG_PATH=$PWD/kubeconfig, as suggested in #817, things work as expected.

I'm sad to see things are still unusable (not related to this module but on the Kubernetes provider side). The load_config_file option was removed from the Kubernetes provider a while ago, and I don't see why this variable needs to be set or how it could be set beforehand.

Anyway, if someone has managed to use the re-added configmap management feature, I'd be glad to know how to work around this and to help debug the issue.

PS: I'm using Terragrunt; I'm not sure whether the issue is related, but it might be.
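
For context, KUBE_CONFIG_PATH is the environment-variable form of the Kubernetes provider's config_path argument, so the same workaround can be written in the provider block itself. The following is a minimal sketch, not a confirmed fix. It assumes a kubeconfig for the cluster already exists at a hypothetical ./kubeconfig path in the working directory (for example, one written by aws eks update-kubeconfig).

```hcl
# Sketch only: point the provider at an existing kubeconfig file instead of
# relying on host/exec values that may not be usable during plan.
# "kubeconfig" is a hypothetical file name; the file must exist before
# terraform plan runs.
provider "kubernetes" {
  config_path = "${path.cwd}/kubeconfig"
}
```

This has the same limitation raised above: the file has to be produced before Terraform runs, so it does not help on the apply that first creates the cluster.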

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
Terraform v1.1.7
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.9.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/tls v3.3.0

Reproduce

Here is my provider block

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.id]
  }
}

data "aws_eks_cluster" "cluster" {
  name = aws_eks_cluster.this[0].id
}
@PLeS207

PLeS207 commented Apr 11, 2022

I have the same issue, but when I work with the state as a different AWS user, I get an error like:

Error: Unauthorized  

with module.eks.module.eks.kubernetes_config_map.aws_auth[0],   
on .terraform/modules/eks.eks/main.tf line 411, in resource "kubernetes_config_map" "aws_auth":
411: resource "kubernetes_config_map" "aws_auth" {

@FeLvi-zzz

FeLvi-zzz commented Apr 11, 2022

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name?

I guess aws_eks_cluster.this[0].id would be known only after apply, because you're going to bump the EKS cluster version. That's why the data resource is indeterminate, and the kubernetes provider falls back to the default 127.0.0.1:80.

@bryantbiggs
Member

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name?

I guess aws_eks_cluster.this[0].id would be known only after apply, because you're going to bump the EKS cluster version. That's why the data resource is indeterminate, and the kubernetes provider falls back to the default 127.0.0.1:80.

Not quite true - if the data source fails to find a result, it's a failure, not indeterminate.

@ArchiFleKs you shouldn't need the data source at all; does this still present the same issue?

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

@sergiofteixeira

sergiofteixeira commented Apr 12, 2022

You can't run these in TF Cloud though, because of the local exec.

@bryantbiggs
Member

This merely points to what the Kubernetes provider documentation specifies. The module doesn't have any influence over this aspect.

@ArchiFleKs
Contributor Author

ArchiFleKs commented Apr 12, 2022

I can confirm that this snippet works as expected without the datasource:

provider "kubernetes" {
  host                   = aws_eks_cluster.this[0].endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.this[0].id]
  }
}

@bryantbiggs
Member

I know Hashi are hiring and have made some hires to start offering more support to the Kubernetes and Helm providers recently so hopefully some of these quirks get resolved soon! for now, we can just keep sharing what others have found to have worked for their setups 🤷🏽‍♂️

@evenme

evenme commented Apr 12, 2022

Unfortunately, it doesn't seem to work with tf-cloud (it fails with Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused). I locked the module at v18.19, so it still works.
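
Locking the module as described can be sketched as a version constraint on the module block. The constraint string is an assumption: the comment only says "v18.19", so a pessimistic constraint covering the 18.19.x patch releases is used here, and the remaining module arguments are omitted.

```hcl
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # Assumed constraint: stay on 18.19.x, i.e. before the 18.20 releases
  # that manage the aws-auth configmap through the kubernetes provider.
  version = "~> 18.19.0"

  # ... existing cluster arguments stay unchanged ...
}
```

After changing the constraint, terraform init -upgrade is needed so that the pinned version is actually downloaded.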

@evenme

evenme commented Apr 12, 2022

Apparently using kubectl provider instead of kubernetes provider (even completely removing it) made it work with terraform-cloud 🤷‍♀️ :

provider "kubectl" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

But unfortunately this deleted the previously working aws-auth, and it was not able to create a new one: Error: The configmap "aws-auth" does not exist... :|

@MadsRC

MadsRC commented Apr 19, 2022

I just ran into this while debugging an issue during redeployment of a cluster. I'm not sure exactly how it happened, but we ended up in a state where the cluster had been destroyed, which left terraform unable to connect to the cluster (duh...) using the provider, and as such it defaulted to 127.0.0.1 when trying to touch the config map...

As mentioned, I'm not sure exactly how it ended up in that state, but it got so bad that I'd get this dial tcp 127.0.0.1:80: connect: connection refused error on terraform plan even with all references to the config map removed. Turns out there was still a reference to the config map in the state file, so removing that using terraform state rm module.eks.this.kubernetes_config_map_v1_data.aws_auth allowed me to redeploy...

Maybe not applicable to most of you, but hopefully it's useful for someone in the future :D

@bryantbiggs
Member

Hey all - let me know if it's still worthwhile to leave this issue open. I don't think there is anything further we can do in this module to alleviate any of the issues shown - there seems to be some variability in what works or does not work for folks. I might be biased, but I think the best place to look for improvements or a resolution is upstream with the other providers (Kubernetes, Helm, Kubectl, etc.)

@kaykhancheckpoint

kaykhancheckpoint commented Apr 25, 2022

I'm also experiencing this. In the meantime, are there any workarounds?

I'm experiencing the same problem with the latest version. Initial creation of the cluster worked fine, but when I try to update any resources after creation I get the same error.

│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  431: resource "kubernetes_config_map_v1_data" "aws_auth" {
│

My configuration is the same as the example below, except that I have multiple profiles on my machine and had to specify the profile.
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L5-L15

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "terraformtest"]
  }
}

@DimamoN

DimamoN commented Apr 25, 2022

I faced the same issue, then checked the state using terraform state list and found k8s-related entries there.
Then I removed them using

terraform state rm 'module.eks.kubernetes_config_map.aws_auth[0]'

And that helped to resolve the issue.

@kaykhancheckpoint

kaykhancheckpoint commented Apr 25, 2022

The previous suggestions didn't work for me (maybe I misunderstood something).

  1. export KUBE_CONFIG_PATH=$PWD/kubeconfig

This kubeconfig does not appear to exist in my current path...

  2. Deleting the datasource

The latest version of this example and module does not use a datasource; it just uses module.eks.cluster_id, but I still get this error.


I ended up deleting the aws_auth from the state, which allowed me to continue and resolved the connection refused problem.

terraform state rm 'module.eks.kubernetes_config_map_v1_data.aws_auth[0]'

I don't know what the implications of rm'ing this state are. Is it safe to keep removing this state whenever we encounter this error?

@FernandoMiguel
Contributor

FernandoMiguel commented Apr 26, 2022

a brand new cluster and tf state, eks 1.22

terraform {
  required_version = ">= 1.1.8"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.9"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.13.1"
    }
  }
}

provider "aws" {
  alias  = "without_default_tags"
  region = var.aws_region
  assume_role {
    role_arn = var.assume_role_arn
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
locals {
  ## strips 'aws-reserved/sso.amazonaws.com/' from the AWSReservedSSO Role ARN
  aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim = replace(one(data.aws_iam_roles.AWSReservedSSO_AdministratorAccess_role.arns), "/[a-z]+-[a-z]+/([a-z]+(\\.[a-z]+)+)\\//", "")

  aws_auth_roles = concat([
    {
      rolearn  = data.aws_iam_role.terraform_role.arn
      username = "terraform"
      groups   = ["system:masters"]
    },
    {
      rolearn  = local.aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim
      username = "sre"
      groups   = ["system:masters"]
    }
  ],
    var.aws_auth_roles,
  )
}
  # aws-auth configmap
  create_aws_auth_configmap = var.self_managed_node_groups != [] ? true : null
  manage_aws_auth_configmap = true
  aws_auth_roles            = local.aws_auth_roles
  aws_auth_users            = var.aws_auth_users
  aws_auth_accounts         = var.aws_auth_accounts

leads to:

│ Error: Unauthorized
│
│   with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks.eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
│  414: resource "kubernetes_config_map" "aws_auth" {

any ideas @bryantbiggs ?
thanks in advance.

@mebays

mebays commented Apr 29, 2022

@FernandoMiguel I'm seeing something similar in a configuration I'm working with. After thinking about it for a while, I believe you'll need to add the assumed role to your kubernetes provider configuration:

provider "aws" {
  alias  = "without_default_tags"
  region = var.aws_region
  assume_role {
    role_arn = var.assume_role_arn
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role-arn", var.assume_role_arn]
  }
}

Sadly this isn't a solution for me. The configuration I'm working with uses dynamic credentials fed in.

Something along the lines...

provider "aws" {
  access_key = <access_key>
  secret_key = <secret_key>
  token = <token>
  region = <region>
}

This is useful when a temporary VM, a container, or TFE runs the terraform execution.

With this approach, the connection information is fed to the provider and used entirely within the provider context (no aws config process is ever used).

The problem is that none of that data is stored or carried over, so when the kubernetes provider runs the exec, it falls back to the methods the aws cli uses by default (meaning a locally stored config in ~/.aws/config or ~/.aws/credentials). In my case that doesn't exist.

@FernandoMiguel it looks like you are presumably using a ~/.aws/config, so passing the assumed role and possibly the profile (if not using the default) should help move that forward. I cannot guarantee it will fix it, but that would be the theory.

@FernandoMiguel
Contributor

No config and no aws creds hardcoded.
Everything is assume role from a global var.
This works on hundreds of our projects.

@FernandoMiguel
Contributor

If you mean the cli exec, that's running from aws-vault exec --server

@mebays

mebays commented Apr 29, 2022

@FernandoMiguel Hmm well that's interesting. I was able to get a solution to work for me.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    # This requires aws-iam-authenticator to be installed locally where Terraform is executed
    args = ["token", "-i", module.eks.cluster_id]
  }
}

This seemed to work for me, but I also had to expose my endpoint to be public for the first run. Our network configuration was locked down too tightly for our remote execution server to hit the endpoint. That could be something else you make sure you are hitting.

If you mean the cli exec, that's running from aws-vault exec --server

What I meant was that if credentials are being passed to the aws provider, then I wouldn't necessarily expect them to be passed to the kubernetes provider. Some troubleshooting you could try is TF_LOG=debug terraform plan ... to get more information, if you haven't tried that. If you really want to test whether the kubernetes exec works, spin up a VM or container, pass the credentials, and see if that carries over.

If my guess is correct, then a way around it would be creating a ~/.aws/credentials file using a null resource and templating out a configuration that aws eks get-token can then reference.

My thought process is that the data being passed into the kubernetes provider contains no information about the aws configuration. So I would expect it to fail if the instance running terraform didn't have the aws cli configured.


A further thought: if the remote execution tool doesn't have a ~/.aws/config but runs inside an instance with an IAM role attached, it would default to that IAM role, so it could still work as long as that IAM role is able to assume the role.

@mebays

mebays commented Apr 29, 2022

@bryantbiggs I think my thought process above just reinforces your comment. I don't think there is anything in this module that can be done to fix this. I do have a suggestion: don't completely remove the aws_auth_configmap_yaml output unless you have other solutions coming up. The reasoning is that I could see a use case where terraform is run to provision a private cluster from an instance that may or may not be able to reach that endpoint. If it can't, aws_auth_configmap_yaml can be used in a completely separate process to hit the private cluster endpoint. It all depends on how separation of duties comes into play (one person to provision, and maybe another to configure). It's just a thought.

@FernandoMiguel
Contributor

I would love to know what isn't working here.
I spent a large chunk of this week trying every combo I could think to get this to work, without success.
Different creds for the kube provider, different parallelism settings, recreating the code outside of the module so it would run after the eks cluster module had finished, etc..
I would always get either authentication error, that the config map didn't exist or that it couldn't create it.
Very frustrating.

If we were to keep the now deprecated output, I can at least revert my internal PR and keep using that old and terrible null exec code to patch the config map.

@tanvp112

tanvp112 commented Apr 30, 2022

The problem might be the terraform-provider-kubernetes and not terraform-aws-eks, eg. hashicorp/terraform-provider-kubernetes#1479, ... more about localhost connection refused. This one can really be difficult to catch.

@FernandoMiguel
Contributor

@tanvp112 you are onto something there

we have this provider
[image]
notice the highlight bit
that is not available until the cluster is up
so it is possible that this provider is getting initialised with the wrong endpoint
maybe even "localhost"
and ofc that explains why auth fails
explains why the 2nd apply works fine, cause now the endpoint is correct

@mebays

mebays commented May 3, 2022

So my issue was with authentication, and I believe this example clearly states the issue.

The example states that you must set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. After a little more digging, those having issues with authentication could try something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This sets up the aws cli configuration when there is no config or credentials file on the host that runs the aws cli command
    env = {
      AWS_ACCESS_KEY_ID     = var.access_key_id
      AWS_SECRET_ACCESS_KEY = var.secret_access_key
      AWS_SESSION_TOKEN     = var.token
    }
    # This requires the awscli to be installed locally where Terraform is executed
    command = "aws"
    args    = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

I haven't gotten to try this myself, but it should work. The AWS_SESSION_TOKEN would only be needed for an assumed role process, but it could possibly work.
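
The snippet above references var.access_key_id, var.secret_access_key and var.token without declaring them. A minimal sketch of matching declarations follows. The descriptions and the sensitive flag are assumptions added for illustration; how the values get populated (tfvars, TF_VAR_ environment variables, a secrets backend) is left open.

```hcl
# Hypothetical declarations for the variables used in the exec env block.
# Marking them sensitive keeps the values out of plan and apply output.
variable "access_key_id" {
  description = "Access key ID handed to the aws cli through the exec env block"
  type        = string
  sensitive   = true
}

variable "secret_access_key" {
  description = "Secret access key handed to the aws cli through the exec env block"
  type        = string
  sensitive   = true
}

variable "token" {
  description = "Session token, needed when the keys come from an assumed role"
  type        = string
  sensitive   = true
}
```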

@FernandoMiguel
Contributor

I honestly don't know what you are trying to do...
aws iam auth can be done in many ways.
not everyone has a dedicated IAM account... we use assume roles, for ex.

@mebays

mebays commented May 4, 2022

I honestly don't know what you are trying to do...
aws iam auth can be done in many ways.
not everyone has a dedicated IAM account... we use assume roles, for ex.

When you assume a role, you retrieve a temporary access key, secret key, and token. My code snippet is an example for when a user runs things as a spawned job inside a container that has no AWS context (no config or credentials file). That is my use case: my runs happen in an isolated instance that does not persist (Terraform Cloud follows the same structure, but does not have the aws cli installed by default), in a CI/CD pipeline rather than on a local machine.

When the aws provider is used, the configuration information is passed into the provider, as in this example.
(I'm making it simple. My context actually uses dynamic credential by using hashicorp vault, but don't want to introduce that complexity in this explanation.)

provider "aws" {
  region = "us-east-1"
  access_key = "<access key | passed via variable or some data query>"
  secret_key = "<secret access key | passed via variable or some data query>"
  token = "<session token | passed via variable or some data query>"
}

In this instance the AWS Provider has all information passed in and using the Provider Configuration method. On this run no local aws config file or environment variables exist, so it needs this to make any aws connection.

All aws resources create successfully in this process, besides that aws-auth configmap, when using the suggested example.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This requires the awscli to be installed locally where Terraform is executed
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

The reason this fails is that the Kubernetes provider has no context for the aws command, because no config file or environment variables are being used. Therefore this will fail.

  • NOTE: This will also fail if you have a local AWS config, loaded from a config file or environment variables, that does not run as the role that created the EKS cluster. By default, the only authorized identity is the user or role that created the cluster, so if the local user cannot assume the role used with the above aws provider, the kubernetes commands will fail as well.

That is how the suggested route came to be.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This sets up the aws cli configuration when there is no config or credentials file on the host that runs the aws cli command
    env = {
      AWS_ACCESS_KEY_ID     = "<same access key passed to aws provider | passed via variable or some data query>"
      AWS_SECRET_ACCESS_KEY = "<same secret access key passed to aws provider | passed via variable or some data query>"
      AWS_SESSION_TOKEN     = "<same session token passed to aws provider | passed via variable or some data query>"
    }
    # This requires the awscli to be installed locally where Terraform is executed
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

This provider block purposely passes in the credentials and configuration the aws cli needs to successfully call aws eks get-token --cluster-name <cluster name>, because the kubernetes provider does not care what was passed in to the aws provider. There is no shared context, since no local configuration file or environment variables are being leveraged.

@FernandoMiguel does what I was trying to achieve make sense now? This may not be your use case, but it is useful information for anyone trying to run this module using some external remote execution tool.

I'll add that the issue is not in this module, but adding the above snippet to the documentation may help those who purposely provide configuration to the aws provider instead of using environment variables or local config files.

@FernandoMiguel
Contributor

It does. I've been fighting issues with the kube provider for weeks, with what seems to be a race condition or a failure to initialise the endpoint/creds.
Sadly, in our case, your snippet does not help since creds are already available via metadata endpoint.
but it's a good idea to always double check if CLI tools are using the expected creds.

@alfredo-gil

alfredo-gil commented May 5, 2022

I was having the same issue but the solution that worked for me is to configure the kubernetes provider to use the role, something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role", "arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}" ]
  }
}
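
The AWS CLI documents this option for aws eks get-token as --role-arn. Below is a minimal sketch of the same provider block with that spelling. It assumes the role ARN is supplied through a hypothetical kubernetes_role_arn variable instead of the shell-style ${AWS_ACCOUNT_ID} and ${ROLE_NAME} placeholders, which Terraform would try to interpolate as references.

```hcl
# Hypothetical variable holding the ARN of the role that may access the cluster.
variable "kubernetes_role_arn" {
  description = "ARN of the IAM role the aws cli assumes when requesting the EKS token"
  type        = string
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = [
      "eks", "get-token",
      "--cluster-name", module.eks.cluster_id,
      "--role-arn", var.kubernetes_role_arn,
    ]
  }
}
```

The credentials the aws cli picks up on the host still need permission to assume that role, and the role needs to be mapped in aws-auth (or be the cluster creator) for the token to be accepted.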

@FernandoMiguel
Contributor

Ohh that's an interesting option... Need to try that

@bcarranza

to remove the block from remote state
Hi, I'm at the same point as you, @adiii717! Have you been able to get through this?

@adiii717

adiii717 commented Sep 9, 2022

@bcarranza actually, the error stays the same until I destroy and recreate. The stranger part is that destroy recognizes the same cluster but apply does not.
So I would say the latest module is pretty unstable, which definitely creates problems in a live environment. I have been using 17.x in live and have not faced any issue so far.

@bcarranza

Hi @adiii717, in my case I can't destroy the cluster. Even though this is happening to me in an early environment, I don't want to imagine it happening in production, so I have to find a solution that does not involve destroying the cluster, as a preventive measure in case this happens in production.

@dcarrion87

dcarrion87 commented Sep 18, 2022

This is such a frustrating issue, having to do crazy workarounds to get the auth mechanism to work.

It is only an issue when certain changes to the EKS cluster cause data sources to be empty.

@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Oct 19, 2022
@sotiriougeorge

This is still a major issue.

@bryantbiggs
Member

This is still a major issue.

This isn't a module issue. This is at the provider level; there isn't anything we can do here

@github-actions github-actions bot removed the stale label Oct 20, 2022
@evercast-mahesh2021

evercast-mahesh2021 commented Oct 21, 2022

I am getting the error below if I touch/change/comment/update anything on the cluster_security_group_description and cluster_security_group_name variables. I just wanted to get the default name and description of the SG that is created for EKS by default. I am using version = "~> 18.23.0".

cluster_security_group_description = "Short Description"
cluster_security_group_name = local.name_suffix

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

  with module.eks_cluster.kubernetes_config_map_v1_data.aws_auth[0],
  on .terraform/modules/eks_cluster/main.tf line 443, in resource "kubernetes_config_map_v1_data" "aws_auth":
 443: resource "kubernetes_config_map_v1_data" "aws_auth" {

Any solution for this?

Thanks!

@mesobreira

Hello,

I also had this problem and found a workaround.
Since this issue happens when the EKS data sources are only "known after apply" during terraform plan due to control plane endpoint changes, I created an external data source that fetches the EKS cluster endpoint and certificate from a shell script (it uses the aws command line). The script is attached.
I set it as my default data source. If this data source fails (usually when I create a new cluster), it switches to the default EKS data source.
But with this external data source, I no longer depend on the state of terraform, so any "known after apply" value has no impact.

This is the content of the .tf file used to instantiate the kubernetes providers:
data "aws_region" "current" {}

data "external" "aws_eks_cluster" {
  program = ["sh", "${path.module}/script/get_endpoint.sh"]
  query = {
    cluster_name = var.kubernetes_properties.cluster_name
    region_name  = data.aws_region.current.name
  }
}

provider "kubernetes" {
  host                   = data.external.aws_eks_cluster.result.cluster_endpoint == "" ? data.aws_eks_cluster.this[0].endpoint : data.external.aws_eks_cluster.result.cluster_endpoint
  cluster_ca_certificate = data.external.aws_eks_cluster.result.cluster_endpoint == "" ? base64decode(data.aws_eks_cluster.this[0].certificate_authority[0].data) : base64decode(data.external.aws_eks_cluster.result.certificate_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.kubernetes_properties.cluster_name, "--role-arn", try(data.aws_iam_session_context.this[0].issuer_arn, "")]
  }
}

The same configuration can be applied to kubectl and helm providers.
I have created clusters and changed EKS control plane configurations using this workaround and have no issues so far.

I know that External Data Source is not recommended as it's a bypass to the terraform state, but in this case it's very useful.
get_endpoint.sh.gz
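
For the helm provider, the same wiring might look roughly like the sketch below. It assumes helm provider 2.x (which nests the connection settings in a kubernetes block) and reuses the data sources from the snippet above. The local names use_external, cluster_endpoint and cluster_ca are made up for illustration and do not come from the module.

```hcl
# Sketch only: assumes helm provider 2.x and the data sources shown above
# (data.external.aws_eks_cluster, data.aws_eks_cluster.this,
# data.aws_iam_session_context.this). The local names are hypothetical.
locals {
  # True when the external script returned an endpoint, i.e. the cluster already exists.
  use_external = data.external.aws_eks_cluster.result.cluster_endpoint != ""

  # Prefer the values fetched by the script; fall back to the regular EKS data source.
  cluster_endpoint = local.use_external ? data.external.aws_eks_cluster.result.cluster_endpoint : data.aws_eks_cluster.this[0].endpoint
  cluster_ca = base64decode(
    local.use_external ? data.external.aws_eks_cluster.result.certificate_data : data.aws_eks_cluster.this[0].certificate_authority[0].data
  )
}

provider "helm" {
  kubernetes {
    host                   = local.cluster_endpoint
    cluster_ca_certificate = local.cluster_ca

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", var.kubernetes_properties.cluster_name, "--role-arn", try(data.aws_iam_session_context.this[0].issuer_arn, "")]
    }
  }
}
```

Hoisting the fallback condition into locals keeps it in one place, so the kubernetes, kubectl and helm providers cannot drift apart.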

@bryantbiggs
Member

But with this external data source, I no longer depend on the state of terraform, so any "known after apply" value has no impact.

That is entirely inaccurate. The kubernetes/helm/kubectl providers will always need a cluster's certificate and endpoint, in some shape or form, and those are not values that you can know before the cluster comes into existence.

@mesobreira

My bad. What I was trying to say is that after the cluster is created, I will not depend on "known after apply" values in case of changes in the EKS control plane. Of course, if the cluster does not exist, I cannot retrieve the EKS cluster endpoint and certificate.

That's why I said, "If this datasource fails (usually when I create a new cluster), it switches to the default EKS datasource."

That's why I have this condition:
data.external.aws_eks_cluster.result.cluster_endpoint == "" ? data.aws_eks_cluster.this[0].endpoint : data.external.aws_eks_cluster.result.cluster_endpoint

@evercast-mahesh2021

Thank you @mesobreira and @bryantbiggs. I will try this solution.

@csepulveda

Same issue here. I could create the clusters and modify them without any issue, but after a few hours I got the same error.

I have already tried a lot of changes:
Use data.
Use module output.
Use exec command.

Always the same issue.

│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

│ with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│ on .terraform/modules/eks/main.tf line 475, in resource "kubernetes_config_map_v1_data" "aws_auth":
│ 475: resource "kubernetes_config_map_v1_data" "aws_auth" {

@mesobreira

@csepulveda, have you tried using the external data source, as I mentioned above?

@VladoPortos

I really do not understand what the issue is with terraform.

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}

Using the data sources does not provide the information to the provider, even though the information is clearly in the state file and is correct.
I had to switch to module.eks.cluster_endpoint and module.eks.cluster_certificate_authority_data.

Why are the values not provided to the provider?

terraform -version
Terraform v1.3.3
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v4.37.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.7.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.15.0
+ provider registry.terraform.io/hashicorp/local v2.2.3
+ provider registry.terraform.io/hashicorp/null v3.2.0
+ provider registry.terraform.io/hashicorp/random v3.4.3
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/time v0.9.0
+ provider registry.terraform.io/hashicorp/tls v4.0.4
+ provider registry.terraform.io/oboukili/argocd v4.1.0
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
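
The switch described above, from data sources to module outputs, might look like the sketch below. It assumes the module is instantiated as module.eks and exposes the cluster_endpoint and cluster_certificate_authority_data outputs named in the comment. Everything else is carried over unchanged from the snippet above, so treat it as an illustration rather than a guaranteed fix.

```hcl
# Sketch only: same provider block as above, with the two data source
# references swapped for module outputs (assumes the module is named "eks").
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}
```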

@github-actions

github-actions bot commented Dec 2, 2022

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Dec 2, 2022
@joseph-igb

Was getting this error:
Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused
Using static values in the data section fixed the error for me. This was my configuration:

data "aws_eks_cluster_auth" "default" {
  name       = var.cluster_name
  depends_on = [aws_eks_cluster.cluster]
}

data "aws_eks_cluster" "default" {
  name       = var.cluster_name
  depends_on = [aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

@sotiriougeorge

How do you mean static values?

@joseph-igb

How do you mean static values?

Previously had something along the lines of:

data "aws_eks_cluster_auth" "default" {
  name = aws_eks_cluster.my_cluster.name
}

Based on some of the comments above, I decided to use pre-set values, so I used variables, and that got rid of the error.

@github-actions github-actions bot removed the stale label Dec 10, 2022
@stdmje

stdmje commented Dec 14, 2022

Same error here using Terragrunt. Every time I have to upgrade the k8s version, I have to delete kubernetes_config_map_v1_data.aws_auth[0] from the state, otherwise I get the following error.

kubernetes_config_map_v1_data.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused
│
│   with kubernetes_config_map_v1_data.aws_auth[0],
│   on main.tf line 518, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  518: resource "kubernetes_config_map_v1_data" "aws_auth" {
│
╵

@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Jan 15, 2023
@github-actions

This issue was automatically closed because of stale in 10 days

@github-actions github-actions bot closed this as not planned Jan 26, 2023
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2023