Feat/cluster gitops #47
base: main
Changes from all commits
e487611
c5a66b0
d73feb5
67d0a4c
831a648
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Proposal for GitOps Style IaC for Cluster Management | ||
|
||
Kurt Garloff, v0.1, 2022-02-18 | ||
|
||
## Motivation and Goals | ||
|
||
Using the Kubernetes Cluster-API (capi), we can use a k8s style declarative | ||
way to describe the workload clusters that should be running and can manage | ||
their lifecycle: creation, changes, rolling upgrades and clean up can all be | ||
performed with it. The OpenStack provider (capo) has the basic integration | ||
to manage the networks, virtual machines and load balancers. For full automation, | ||
a few more pieces have been developed in the SCS | ||
[k8s-cluster-api-provider](https://github.com/SovereignCloudStack/k8s-cluster-api-provider/) | ||
repository: Registering the images, setting up extra security groups for | ||
non-Calico CNI, creating anti-affinity server groups, deploying the OCCM | ||
and cinder CSI integration and optionally some more services (such as Flux, | ||
cert-manager, nginx ingress, ...). | ||
|
||
This follows ideas similar to those described at | ||
<https://www.weave.works/blog/gitops-and-cluster-api-master-of-masters>. | ||
|
||
Most of the simpler cluster setups can be done without ever touching the cluster-template.yaml | ||
file -- a dozen or so adjustments in clusterctl.yaml provide a reasonable amount | ||
of flexibility. In SCS' k8s-cluster-api-provider setup, the standard settings from | ||
the capo templates have been extended by settings that let you choose cilium | ||
as an alternative CNI provider, anti-affinity, the OCCM and CSI deployment and the extra services. | ||
|
||
To implement a simple gitops style management for a set of clusters, we would | ||
basically create a reconciliation loop on the capi management node, which | ||
gets the enhanced configuration files from git and creates, changes and | ||
destroys clusters according to the settings there. The mechanism should | ||
distinguish between base settings and overlays (kustomization style) to | ||
modify and extend the settings and should offer an opt-in mechanism for | ||
more detailed adjustments by also overriding the cluster-templates | ||
(kustomization style again). | ||
|
||
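The base/overlay idea described above can be illustrated with a minimal Python sketch. It assumes the settings are plain nested mappings and uses a simple recursive merge; this is a simplification for illustration, not how kustomize itself patches resources. The function name `apply_overlay` and all setting keys are hypothetical and not taken from the k8s-cluster-api-provider repository.

```python
from copy import deepcopy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new dict with overlay values layered on top of base.

    Nested dicts are merged recursively; any other value in the
    overlay replaces the corresponding base value.
    """
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical setting names, for illustration only.
base_settings = {
    "cni": "calico",
    "anti_affinity": True,
    "services": {"metrics": True, "flux": False},
}
cluster_overlay = {
    "cni": "cilium",
    "services": {"flux": True},
}

effective = apply_overlay(base_settings, cluster_overlay)
print(effective)
```

Keeping the merge free of side effects means the same base settings can be reused for every cluster, with only the small per-cluster overlay living next to each cluster definition in git.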
Cluster-admin credentials need to be provided to the owners of the cluster. | ||
The current thinking is to have a public key included in the git repo; | ||
the cluster-admin credentials for the created cluster would be encrypted | ||
with this public key and can then be published. (We would still use HTTPS | ||
to ensure that these are the true credentials.) Only the owner(s) of | ||
the private key can decrypt the credentials. | ||
|
||
## Implementation thoughts | ||
|
||
The reconciliation loop would roughly look like this: | ||
1. Get the latest clusters from git (via a regular check or an event) | ||
Review comment: This should be done by something like fluxcd, shouldn't it? |
||
1. Per cluster | ||
Review comment: I'm not sure what (and why) exactly is needed here beyond what already happens when one applies a cluster resource. In my mind, the workflow would look like: "install fluxcd, register git repository, put cluster manifests in there, done", so this looks very complex. Am I missing anything? |
||
1. Ensure we have the image available, register if needed | ||
1. Other pre-flight sanity checks (quota, syntax, flavors, inconsistencies, ...) | ||
1. For a new cluster | ||
1. Optionally create a new project; if so, share the image with it | ||
1. Create two application credentials (one for capo, one for OCCM/CSI) | ||
1. Create cilium security group | ||
1. Create anti-affinity server groups (if not disabled) | ||
1. Adjust settings in the cluster-template (cluster-name, sec groups, affinity, ...) | ||
1. Process with clusterctl | ||
1. Submit to capo | ||
1. Wait for control plane readiness | ||
1. For new cluster: Extract cluster-admin creds and encrypt with pubkey | ||
1. Deploy CNI (calico or cilium) -- avoid switching unless forced | ||
1. Deploy OCCM | ||
1. Deploy cinder CSI | ||
1. Deploy metrics service (if not disabled), otherwise remove | ||
1. Sanity checks | ||
1. For all other optional services (nginx, flux, cert-manager, harbor, ...): | ||
1. Deploy service if enabled, otherwise remove (if deployed before) | ||
1. Sanity checks | ||
1. Optional CI tests | ||
|
||
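The steps above can be sketched as a small planning function plus a loop that carries on with the next cluster when one fails. This is only an illustration under stated assumptions: the step names, `ClusterSpec`, `plan_cluster`, `reconcile` and the `run_step` callback are all hypothetical, no real OpenStack, clusterctl or capo calls are made, and cluster deletion and the optional new-project step are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    name: str
    # Optional services and whether they are enabled, e.g. {"flux": True}.
    services: dict = field(default_factory=dict)


def plan_cluster(spec, existing_names, deployed_services):
    """Return the ordered (hypothetical) step names for one cluster."""
    is_new = spec.name not in existing_names
    steps = ["ensure-image", "preflight-checks"]
    if is_new:
        steps += ["create-app-credentials",
                  "create-cilium-security-group",
                  "create-server-groups"]
    steps += ["adjust-template", "run-clusterctl",
              "submit-to-capo", "wait-control-plane"]
    if is_new:
        steps.append("export-encrypted-admin-creds")
    steps += ["deploy-cni", "deploy-occm", "deploy-cinder-csi"]
    for service, enabled in sorted(spec.services.items()):
        if enabled:
            steps.append(f"deploy-{service}")
        elif service in deployed_services:
            # Only remove services that were deployed before.
            steps.append(f"remove-{service}")
    return steps


def reconcile(desired, existing_names, deployed, run_step):
    """Run all steps per cluster; on error, record it and move on."""
    errors = {}
    for spec in desired:
        try:
            for step in plan_cluster(spec, existing_names,
                                     deployed.get(spec.name, set())):
                run_step(spec, step)
        except Exception as exc:
            errors[spec.name] = str(exc)
    return errors


# Usage with a fake step runner that fails one pre-flight check.
log = []


def fake_step(spec, step):
    if spec.name == "broken" and step == "preflight-checks":
        raise RuntimeError("quota exceeded")
    log.append((spec.name, step))


desired = [ClusterSpec("broken"),
           ClusterSpec("dev", {"flux": True, "harbor": False})]
errors = reconcile(desired, existing_names={"dev"},
                   deployed={"dev": {"harbor"}}, run_step=fake_step)
print(errors)
```

Separating planning from execution keeps the ordering logic testable without a cloud. The returned `errors` mapping is one possible place to hook in whatever error reporting is eventually chosen.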
## Open questions | ||
|
||
* Can this be integrated into capo or do we really need to create a loop around it? | ||
|
||
* On errors in this loop, we would move to the next cluster. However, how do we handle error reporting? | ||
Review comment: With fluxcd, this happens independently on every cluster at the same time (if you have not set a branch, tag (semver parsing included), or gitrepo for each cluster). fluxcd has metrics; we use prometheus to collect them and fire alerts if something does not work. |
||
How do we avoid operators needing to log in to get capo logs? | ||
|
||
* Can we integrate this with the helm charts work? | ||
|
||
* Can we implement this using flux? | ||
|
||
* How does this relate exactly to <https://github.com/weaveworks/weave-gitops>? |
Review comment: https://github.com/bitnami-labs/sealed-secrets may be worth looking at for this.
Review comment: Same for SOPS. Flux also has builtin support for it: https://fluxcd.io/docs/guides/mozilla-sops/