Machine ID: Teleport Plugins on Kubernetes (Epic) #29048

Closed
6 of 7 tasks
strideynet opened this issue Jul 13, 2023 · 8 comments
Assignees
Labels
c-crp Internal Customer Reference c-cwv Internal Customer Reference c-dc Internal Customer Reference c-spb Internal Customer Reference epic Epic (a collection of other, related issues) feature-request Used for new features in Teleport, improvements to current should be #enhancements machine-id teleport-plugin Tickets related to Teleport Plugins https://github.com/gravitational/teleport-plugins

Comments

@strideynet
Contributor

strideynet commented Jul 13, 2023

Customers often deploy Teleport "plugins" that integrate Teleport with another service. One such example is the Slack integration, which delivers messages to a Slack channel when Access Requests are created.

These plugins must authenticate against the Teleport API. Historically, this has been done by creating an identity file using tctl auth sign, but this creates a long-lived, and potentially powerful, credential. Machine ID is the new "golden path" for short-lived credentials for Machine Access, but several factors prevent Machine ID from being used for this:

  1. The access plugins do not support short-lived and rotating credentials. The access plugin processes therefore have to be restarted each time tbot produces renewed credentials, which hurts reliability.
  2. There is no documentation on this setup.
  3. There is no support in our helm charts (and similar) for this.
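The first blocker comes down to a process reading its identity file once at startup. A minimal sketch of the alternative, assuming tbot rewrites a file on disk at each renewal: check the file's modification time and size on every use, and re-read it when either changes. `ReloadingCredential` and `load` are hypothetical names used for illustration; they are not part of Teleport or of the access plugins.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ReloadingCredential(Generic[T]):
    """Caches a parsed credential and re-reads it when the file changes.

    Hypothetical helper: `load` stands in for whatever parses the identity
    file into a client credential.
    """

    def __init__(self, path: str, load: Callable[[bytes], T]) -> None:
        self._path = path
        self._load = load
        self._stamp: Optional[tuple] = None  # (mtime_ns, size) of the cached file
        self._value: Optional[T] = None

    def get(self) -> T:
        """Return the current credential, reloading if the file was rewritten."""
        st = os.stat(self._path)  # follows symlinks
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            with open(self._path, "rb") as f:
                self._value = self._load(f.read())
            self._stamp = stamp
        return self._value
```

A client built this way would call `get()` before each connection attempt instead of holding on to the credential it loaded at startup, so a renewal needs no process restart.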

Tasks

  1. feature-request machine-id needs-rfd
    strideynet
  2. c-dc feature-request machine-id
    strideynet
  3. c-cs c-fi feature-request machine-id
    strideynet
  4. strideynet
  5. bug machine-id
    strideynet
  6. c-dc documentation machine-id
  7. machine-id
@strideynet strideynet added feature-request Used for new features in Teleport, improvements to current should be #enhancements teleport-plugin Tickets related to Teleport Plugins https://github.com/gravitational/teleport-plugins machine-id epic Epic (a collection of other, related issues) labels Jul 13, 2023
@strideynet strideynet self-assigned this Jul 13, 2023
@strideynet strideynet changed the title Machine ID: Teleport Plugins Support Machine ID: Teleport Plugins Support Epic Jul 13, 2023
@strideynet
Contributor Author

A few general questions spring to mind:

@strideynet
Contributor Author

Initial discussion with one customer suggests that a sidecar is convenient. It also has the advantage of tying the identity directly to the specific access plugin pod, and it avoids integrating with the Kubernetes secret API directly at this time. It will not be able to handle the token join method though, so we will need to complete the support for proper Kubernetes joining.

@webvictim
Contributor

webvictim commented Jul 13, 2023

Sidecars are definitely convenient. I do think that a "better" way to do this from a broader perspective though would be to have a dedicated Machine ID Helm chart which runs in its own container and provisions its ServiceAccount with appropriate permissions to write k8s secrets in a given namespace so that other services can read them.

I appreciate that this muddies the water, but maintaining sidecars for 50 different containers that all need to read identities becomes a little cumbersome and would require people to modify the code for their existing third-party setups, whereas running Machine ID completely separately and having it use the k8s secret API for communication is a more scalable, distributed pattern.

With this said, people would likely need to change their deployments to mount the newly-minted secrets into their existing containers anyway so what do I know 😅

@strideynet strideynet changed the title Machine ID: Teleport Plugins Support Epic Machine ID: Teleport Plugins on Kubernetes Epic Jul 13, 2023
@hugoShaka
Contributor

hugoShaka commented Jul 13, 2023

My two cents regarding the sidecar vs standalone deployment:

  • the sidecar approach: sidecars are handled very poorly by Kubernetes. They break Jobs and autoscaling, and can harm the availability of the whole pod when they go down. This also raises scalability questions when mounting MachineID secrets on large deployments (do we really want to run 1 tbot per replica?).
  • the distinct deployment/statefulset: in this scenario, I guess it would write the certs to a secret. Kubernetes has no "rollout pod when secret changes" (although, IIRC, mounted files are synced with a delay of 1 or 2 minutes). With the secret approach, multiple plugin pod replicas can mount the same secret, but it won't be synced to all of them at exactly the same time, so two certs from different generations may be in use simultaneously, which might or might not be an issue. We also need to renew certs ahead of time because of all the delays between certs being renewed and the client loading them.

I would prefer the second approach. I've been hurt numerous times by the sidecar approach, and I think it is a Kubernetes antipattern. It is also a hard blocker for Jobs: I think CronJobs and CI-like Jobs are a good fit for MachineID, and we want to support them.
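The "renew ahead of time" point in the second bullet can be made concrete with a little arithmetic. The sketch below assumes the old certificate must stay valid until the slowest replica has both received the updated Secret and reloaded it; `latest_renewal_offset` and its parameters are illustrative names, and the durations are placeholders rather than measured values.

```python
from datetime import timedelta


def latest_renewal_offset(
    cert_ttl: timedelta,
    max_sync_delay: timedelta,
    client_reload_interval: timedelta,
) -> timedelta:
    """How long after issuance the renewed cert must be written to the Secret.

    The lead time is the worst-case delay between the Secret being updated
    and a client actually using the new certificate.
    """
    lead_time = max_sync_delay + client_reload_interval
    if lead_time >= cert_ttl:
        raise ValueError("certificate TTL is shorter than the propagation lead time")
    return cert_ttl - lead_time
```

With a one-hour certificate, a two-minute worst-case sync delay and a 30-second client reload interval, the renewal has to land in the Secret no later than 57 minutes 30 seconds after issuance.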

@Jasstkn

Jasstkn commented Jul 15, 2023

Hi. If I were to do it the "Kubernetes way", I would implement the controller pattern. Via a CRD, we can provide bot configuration for different services (e.g. plugins, custom applications), which can be translated into a CronJob or a regular Job. The Secret is written to a different resource for each application to consume, because it is bound to the specific set of permissions granted by the attached role(s).
The operator can also track executions and expose metrics so that users can follow the results and build observability on top of them (e.g. dashboards, alerts, etc.).
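As a rough illustration of the "one Secret per application" idea, the sketch below models a bot configuration and checks that no two bots write to the same Secret. It covers only that validation step, not a controller. `BotSpec`, its fields, and `plan_secrets` are hypothetical; no such CRD is defined in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BotSpec:
    """Hypothetical custom-resource spec: one bot per consuming application."""

    name: str
    namespace: str
    roles: Tuple[str, ...]
    secret_name: str


def plan_secrets(bots: List[BotSpec]) -> Dict[Tuple[str, str], BotSpec]:
    """Map each (namespace, secret name) to the single bot allowed to write it."""
    plan: Dict[Tuple[str, str], BotSpec] = {}
    for bot in bots:
        key = (bot.namespace, bot.secret_name)
        if key in plan:
            # Two bots with different roles must not share one credential.
            raise ValueError(
                f"{bot.name} and {plan[key].name} both target Secret {key[0]}/{key[1]}"
            )
        plan[key] = bot
    return plan
```

Rejecting shared Secrets keeps each credential tied to exactly one set of roles, which is the property the comment above describes.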

However, this solution might bring new challenges. The automatic Secret update mentioned above is described in the Kubernetes documentation:

the total delay from the moment when the Secret is updated to the moment when new keys are projected to the Pod can be as long as the kubelet sync period + cache propagation delay, where the cache propagation delay depends on the chosen cache type (following the same order listed in the previous paragraph, these are: watch propagation delay, the configured cache TTL, or zero for direct polling).
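The quoted rule is a sum of two terms, where the second depends on how the kubelet detects changes. A small sketch of that calculation follows; the `SecretCache` enum and the function name are illustrative labels for the three cases in the quote, not Kubernetes identifiers, and the durations passed in are whatever applies to a given cluster.

```python
from datetime import timedelta
from enum import Enum


class SecretCache(Enum):
    """The three change-detection cases named in the quote (labels are ours)."""

    WATCH = "watch"
    TTL = "ttl"
    DIRECT = "direct"


def max_projection_delay(
    sync_period: timedelta,
    cache: SecretCache,
    watch_delay: timedelta = timedelta(0),
    cache_ttl: timedelta = timedelta(0),
) -> timedelta:
    """Upper bound on the time from Secret update to the new keys in the Pod."""
    if cache is SecretCache.WATCH:
        propagation = watch_delay
    elif cache is SecretCache.TTL:
        propagation = cache_ttl
    else:
        propagation = timedelta(0)  # direct polling adds no cache delay
    return sync_period + propagation
```

For example, a one-minute sync period with a one-minute cache TTL gives a worst case of two minutes, while direct polling leaves only the sync period.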

In addition, this flow might have issues if Teleport or the Cluster API is temporarily or permanently unavailable. This can be mitigated with a combination of renewal period and expiration period. However, it's not a silver bullet, and the consumer application can easily be disrupted by a stale Secret that wasn't updated by the operator or by the kubelet because the Cluster API wasn't reachable.
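How much an outage can be absorbed follows from the ratio of the expiration period to the renewal period. The sketch below assumes renewals are attempted at a fixed interval after issuance and that a successful renewal still needs some time to reach consumers; the function name and parameters are illustrative.

```python
from datetime import timedelta


def tolerated_missed_renewals(
    cert_ttl: timedelta,
    renewal_interval: timedelta,
    max_sync_delay: timedelta,
) -> int:
    """Consecutive failed renewals that can occur before consumers hold an expired cert.

    After k failures the next attempt happens at (k + 1) * renewal_interval
    and must reach consumers, max_sync_delay later, before cert_ttl elapses.
    """
    usable = cert_ttl - max_sync_delay
    if usable < renewal_interval:
        raise ValueError("even a successful first renewal would arrive too late")
    return usable // renewal_interval - 1
```

With a one-hour certificate renewed every 20 minutes and a two-minute sync delay, exactly one renewal can fail; a second consecutive failure leaves consumers with an expired certificate. This is why the mitigation is not a silver bullet: it only buys a bounded number of retries.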

I agree with challenges that were mentioned for sidecars containers. But there is ongoing work to improve their lifecycle within the pod. You can check this KEP to get more details.

@strideynet
Contributor Author

I would implement the controller pattern.

A controller/operator is definitely something that interests me down the line - although - I do worry it potentially results in a scenario where the single "bot" controller deployed to the cluster has a wide RBAC grant in order to account for all the potential uses in the cluster.

I agree with challenges that were mentioned for sidecars containers.

Indeed - I think realistically, I'm going to complete the work that will allow sidecar & non-sidecar deployments. We'll then be able to switch to recommending sidecars when the k8s KEP enters GA and the support for sidecars in k8s is less questionable.

@strideynet strideynet changed the title Machine ID: Teleport Plugins on Kubernetes Epic Machine ID: Teleport Plugins on Kubernetes (Epic) Aug 8, 2023
@oshati oshati added the c-spb Internal Customer Reference label Sep 7, 2023
@pschisa pschisa added the c-crp Internal Customer Reference label Oct 4, 2023
@strideynet
Contributor Author

#33028 has arisen as necessary

@pschisa pschisa added the c-cwv Internal Customer Reference label Oct 18, 2023
@strideynet
Contributor Author

I'm closing this epic as complete as the actual implementation work is now done.

Documentation and extending the helm charts to take advantage of this remains - but I'll leave those as separate items of work.
