Timeout errors in MutatingWebhookConfiguration #46

Closed
krep-dr opened this issue Jan 13, 2020 · 9 comments

@krep-dr

krep-dr commented Jan 13, 2020

In my setup I have a Vault server running in one cluster and the vault-injector in another cluster. I used the manifest files in /deploy to install the vault-injector.

When a pod is scheduled, the MutatingWebhookConfiguration throws timeout errors in a few different flavors, and there is nothing to see in the vault-injector logs.

0s    Warning   FailedCreate   ReplicaSet   Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
0s    Warning   FailedCreate   ReplicaSet   Error creating: Internal error occurred: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Deployment variables

            - name: AGENT_INJECT_LISTEN
              value: ":8080"
            - name: AGENT_INJECT_LOG_LEVEL
              value: "debug"
            - name: AGENT_INJECT_VAULT_ADDR
              value: "https://vault.domain<removed>"
            - name: AGENT_INJECT_VAULT_IMAGE
              value: "vault:1.3.1"
            - name: AGENT_INJECT_TLS_AUTO
              value: vault-agent-injector-cfg
            - name: AGENT_INJECT_TLS_AUTO_HOSTS
              value: "vault-agent-injector-svc,vault-agent-injector-svc.$(NAMESPACE),vault-agent-injector-svc.$(NAMESPACE).svc"

Pod logs

2020-01-13T14:38:08.414Z [INFO]  handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...

I can call the vault-injector k8s service directly, so it seems to be running.

I would appreciate it if you could point me in the right direction :)
Thanks

@jasonodonnell
Contributor

Hi @krep-dr, are you using Istio?

@krep-dr
Author

krep-dr commented Jan 13, 2020

@jasonodonnell Thanks for looking into this. No, I don't use Istio. Will the injector log when it receives a request from the webhook?

@jasonodonnell
Contributor

jasonodonnell commented Jan 13, 2020

It will, yes. Is the injector running in the vault namespace? The deploy scripts set it to that namespace by default.

@krep-dr
Author

krep-dr commented Jan 13, 2020

Yes, it is.

@krep-dr
Author

krep-dr commented Jan 14, 2020

A firewall in GCP was blocking the request. It worked once I allowed traffic to port 8080 from the Kubernetes master nodes (172.16.0.0/28).
Thanks again for your time :)
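
For reference, a minimal Terraform sketch of such a rule might look like the following (the rule name, project, network, and node tag are placeholders for your own values):

resource "google_compute_firewall" "master-to-injector" {
  name    = "allow-master-to-injector"
  project = "{YOUR_PROJECT_ID}"
  network = "{YOUR_NETWORK}"

  allow {
    protocol = "tcp"
    ports    = ["8080"]             # the port the injector pod listens on (AGENT_INJECT_LISTEN)
  }

  source_ranges = ["172.16.0.0/28"] # the cluster's master address range
  target_tags   = ["gke-node"]      # tag applied to the worker nodes
}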

@dcatalano-figure

dcatalano-figure commented Dec 3, 2020

This one comment saved me tons of time. Using GKE private clusters, I had to create a firewall rule so the control plane can communicate with the pods directly, essentially opening up port 8080 from the source "master address range". Otherwise the MutatingWebhookConfiguration prevents all new pods from starting (ConfigError) because the control plane times out trying to communicate with the agent injector pod.

Thanks and cheers

@sourcec0de

sourcec0de commented Jun 11, 2022

As the other comments suggested, the Helm chart doesn't know whether you have a private GKE cluster.
You need to enable communication from the control plane to the nodes.

Here's an example in Terraform.

resource "google_compute_firewall" "gke-master-to-node" {
  name    = "gke-master-to-node"
  project = "{YOUR_PROJECT_ID}"
  network = "{YOUR_COMPUTE_NETWORK_ID}"

  allow {
    protocol = "all" # exposes all ports; alternatively use protocol = "tcp" with ports = ["443"]
  }

  source_ranges = ["{MASTER_IPV4_CIDR_BLOCK}"] # e.g. "10.1.0.0/28"
  target_tags   = ["gke-node"]                 # nodes must be tagged
}

@ricardoamedeiros

I had the same problem. I configured a security group in AWS and now it's working.
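
For example, a minimal Terraform sketch of that kind of rule, assuming an EKS cluster and using placeholder security group IDs, might be:

resource "aws_security_group_rule" "control_plane_to_injector" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = "{NODE_SECURITY_GROUP_ID}"    # attached to the worker nodes
  source_security_group_id = "{CLUSTER_SECURITY_GROUP_ID}" # used by the EKS control plane
  description              = "EKS control plane to vault-agent-injector webhook"
}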

@younsl

younsl commented Sep 21, 2024

This is a networking configuration issue. I experienced the same symptom on EKS but solved it.

For the detailed solution, please refer to my comment in #163.
