-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/add cluster stacks plugin #800
base: main
Are you sure you want to change the base?
Conversation
2d37443
to
eefc02f
Compare
Tests/kaas/plugin/README.md
Outdated
|
||
### Configuring clusterspec.yaml file | ||
|
||
The `clusterspec.yaml` file is used to set parameters for creating a Kubernetes cluster with the `cluster-stacks` plugin. This file allows you to specify details related to the cluster-stack, Git integration, and cluster configurations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clusterspec.yaml
only contains information that is not specific to the plugin. Information that is specific to the test subject (i.e., the cloud under test) should be defined somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. Then we need to think about some new config file where will be defined parameters/variables specific to the plugins (kind, cluster-stacks, etc.) or we can say that the user/tester needs to define some env variables before executing the plugin. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plugin's constructor will take the path of a config file, and that config file should contain everything that is specific to the plugin and the test subject (it could in turn refer to other files, if necessary). The path of the config file can be specified per subject in config.toml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought clusterspec.yaml
was the config file, as it was the only argument (called CLUSTERSPEC_PATH
) you could pass to the plugin. However, it seems that it was changed in yesterday's commits. I will take a closer look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now take an additional configuration file specific to the plugin. I have also added an example file, see plugin-cluster-stacks-config.yaml
.
And I have one question regarding this. How should the plugin configuration file look when you want to provision multiple clusters? Do you need a separate configuration file for each cluster, or is a single file sufficient, with the clusterspec_cluster
variable determining the cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, the plugin will merge the plugin configuration with the clusterspec information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it should be merged based on the clusterspec_cluster
. In my mind, the config file for the plugin would then look something like this:
current-k8s-release:
clouds_yaml_path: "~/.config/openstack/clouds.yaml" # Path to the OpenStack clouds.yaml file
cs_name: "scs" # Cluster Stack Name
cs_version: "v1" # Cluster Stack Version
# Additional configurations
...
...
...
current-k8s-release-1:
...
...
...
Tests/kaas/plugin/README.md
Outdated
|
||
```yaml | ||
cs_name: <cs_name> # Default: "scs" | ||
cs_k8s_version: <cs_k8s_version> # Default: "1.29" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be taken from the branch (and the branch will be supplied to the plugin; the plugin need not read the clusterspec.yaml)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that the cs_k8s_version
should be taken from the branch parameter in the clusterspec.yaml
file.
Tests/kaas/plugin/plugin_kind.py
Outdated
@@ -21,7 +21,7 @@ def _create_cluster(self): | |||
if self.config is None: | |||
self.cluster.create() | |||
else: | |||
self.cluster.create(self.config) | |||
self.cluster.create() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? It has nothing to do with cluster stacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is currently outdated, as the code in #753 has also been modified.
Tests/kaas/plugin/run_plugin.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be rebased on the latest version of the target branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased onto the latest version
f4bdba7
to
e5952c0
Compare
If this has been rebased to the latest version, why then does it have merge conflicts? |
As you can see, I added that comment on 11.11.2024 at 10:37 a.m., and then, after rebasing, a new commit was added to the target branch later that afternoon. |
e5952c0
to
b5bb995
Compare
Understood. Thanks for doing all the great work. Please work with Toni to get this merged asap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few observations
raise TimeoutError(f"Timed out after {timeout} seconds waiting for CAPI and CAPO system pods to become ready.") | ||
|
||
|
||
def wait_for_cso_pods_ready(timeout=240, interval=15): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a variation of the function above (if not literally copy and paste), only with different namespaces
. The function above should take namespaces
as parameter, renamed wait_for_pods
, and then wait_for_cso_pods_ready
and wait_for_capi_pods_ready
be implemented as a simple call to wait_for_pods
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to merge these two functions into the one wait_for_pods
raise TimeoutError(f"Timed out after {timeout} seconds waiting for CSO pods in 'cso-system' namespace to become ready.") | ||
|
||
|
||
def wait_for_workload_pods_ready(namespace="kube-system", timeout=600, kubeconfig_path=None, max_retries=3, delay=30): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that this function works quite differently from the ones above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here I want to check whether all pods in the kube-system namespace on the workload cluster are ready. This is the final step to ensure that the cluster and all its necessary components are deployed correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this purpose, the wait_for_pods
function is now used.
# Cluster Stack Parameters | ||
self.clouds_yaml_path = self.config.get('clouds_yaml_path', '~/.config/openstack/clouds.yaml') | ||
self.cs_k8s_version = self.cluster_version | ||
self.cs_name = self.config.get('cs_name', 'scs') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really need to hard-code all these defaults (which I doubt because we deliver an example file, and this thing won't work without a config file), then the defaults should be put into a dict, for instance
DEFAULTS = {
'cs_name': 'scs',
'cs_version': 'v1',
...
}
Then I would do
for key, value in DEFAULTS.items():
self.config.setdefault(key, value) # will only set if not already present
Also, for the environment variables, I think the detour with the instance variables is unnecessary.
You can do just
required_env = {key.upper(): value for key, value in self.config}
Of course, this is not quite correct yet, because some values have to be computed differently, and not all keys from the config would be used. For the first point, we can always amend required_env
, and for the second, we can add a guard to the dict comprehension like so:
required_env = {key.upper(): value for key, value in self.config if key in ENV_KEYS}
where ENV_KEYS
would be a global variable of type set
, such as
# don't use uppercase here because this refers to config.yaml keys
ENV_KEYS = {'cs_name', 'cs_version', ...}
setup_environment_variables(self) | ||
setup_git_env(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should use os.environ.update
to update the environment of this process. We should construct the dict with the updates that we need and then pass this dict to subprocess.run
(well possible with self._run_subprocess
).
except subprocess.CalledProcessError as error: | ||
raise RuntimeError(f"{error_msg}: {error}") | ||
|
||
def _get_kubeadm_control_plane_name(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a static method (no reference of self), but that might change once we stop polluting os.environ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function, I am trying to get the name of the kubeadmcontrolplane
resource because it is generated dynamically. In the next function, _wait_kcp_ready
, the name of the kcp
(kubeadmcontrolplane) resource is used to check whether it is ready. Once confirmed, you can proceed to retrieve the kubeconfig of the workload cluster.
else: | ||
raise RuntimeError("Failed to get kubeadmcontrolplane name") | ||
|
||
def _wait_kcp_ready(self, kcp_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
if kubeconfig_filepath: | ||
shutil.move(self.kubeconfig, kubeconfig_filepath) | ||
else: | ||
kubeconfig_filepath = str(self.kubeconfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that this is not the kubeconfig that we want to provide here, because this seems to be the management (Kind) cluster, not the workload cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also discussed this with @tonifinger today, and I will make the change. We agreed that the name/path of the kubeconfig file will be sourced from the clusterspec.yaml
file from field kubeconfig
. And the kubeconfig file for the management (kind) cluster will be named something like kubeconfig-mngmt
.
kubeconfig_cs_cluster_filename = f"kubeconfig-{cluster_name}.yaml" | ||
try: | ||
# Check if the cluster exists | ||
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_filepath}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_filepath}" | |
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_cs_cluster_filename}" |
here because kubeconfig_filepath
refers to the Kind cluster that you delete further down
(but keep in mind my comment in create_cluster
that kubeconfig_filepath
actually SHOULD point to the workload cluster!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this command, I want to check if the cluster resource exists, which is located in the management cluster(in our case it is kind now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, you're right. This is a custom resource, right? Got it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a custom resource.
After looking at this PR in more detail, I think it does need some more work. Nothing seems to have happened today after 9 am CET, which baffles me a bit, since you had put a thumbs up under my remark that went
|
I see now that you requested a review from Toni at 12:25 or so. If I say "work with XY asap", I think requesting a review via a button in a UI does not quite fit the bill... |
@michal-gubricky @tonifinger Please communicate proactively (using Matrix or whatever suits you) to get this done ASAP, ideally until tomorrow (November 14) at noon. |
Sorry, one more thing, but now back to the subject matter at hand. If I understand correctly, the plugin deploys two clusters
Well, at least roughly. To me, this seems like a lot of work, and it doesn't feel like Kubernetes as a service at all. Not in the least. Where is the service if you have to download and invoke Helm charts? Shouldn't a service come with something like an API? Are we doing something wrong here??? |
My impression (after talking to Toni) is that we shouldn't set up the management cluster ourselves. The management cluster should be ready to use and the corresponding Kubeconfig file provided to us by our partner. I have written to the Team Container Matrix channel to get confirmation on that. |
First of all, this is not true. I just did not assign Toni. We have already had a few conversations about it. I asked him if he could take a look to see if anything was missing from his perspective. He also mentioned that you were planning to store the config file under playbooks/k8s_configs/." That's why I this commit. |
Thanks for 2nd round of review, I will take a look at it. |
You're right, and I apologize. First, you did also ping him, and second, you committed something after 9 am. But my overall point still stands, because I don't see any change after that commit. BTW, I don't know went wrong, but you shouldn't have moved the file. Yes, I do plan to have the config files in the playbooks directory, but only real config files from our partners, but not templates or illustrative examples. |
In short, cluster stacks work in such a way that you need a management cluster. For this purpose, you can use a simple kind cluster or some already existing k8s cluster. In that mgmt cluster, you need to deploy the necessary components. After that, you just apply those two templates |
I don't mean to blame you personally. Maybe Toni was unresponsive. He says he will try to set up a call so that you can indeed work together in a focused manner. |
It is totally okay.
Ahh sorry, I do not know this part that it is place only for the config files from our partners. |
Removed the return of the kubeconfig filepath from the `create_cluster` method as we do not use this handling. Signed-off-by: Toni Finger <[email protected]>
Signed-off-by: Toni Finger <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
… update README Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
…oad cluster Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: Matthias Büchse <[email protected]> Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: Matthias Büchse <[email protected]> Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
…r pods Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: tonifinger <[email protected]> Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: tonifinger <[email protected]> Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
a72080e
to
70f580c
Compare
Signed-off-by: michal.gubricky <[email protected]>
This PR implement plugin for cluster-stacks
Fixes #774