Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add cluster stacks plugin #800

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

michal-gubricky
Copy link
Contributor

@michal-gubricky michal-gubricky commented Nov 2, 2024

This PR implement plugin for cluster-stacks

Fixes #774

Tests/kaas/plugin/.pytest-kind/my-cluster/kind-v0.17.0 Outdated Show resolved Hide resolved

### Configuring clusterspec.yaml file

The `clusterspec.yaml` file is used to set parameters for creating a Kubernetes cluster with the `cluster-stacks` plugin. This file allows you to specify details related to the cluster-stack, Git integration, and cluster configurations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clusterspec.yaml only contains information that is not specific to the plugin. Information that is specific to the test subject (i.e., the cloud under test) should be defined somewhere else.

Copy link
Contributor Author

@michal-gubricky michal-gubricky Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. Then we need to think about some new config file where will be defined parameters/variables specific to the plugins (kind, cluster-stacks, etc.) or we can say that the user/tester needs to define some env variables before executing the plugin. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plugin's constructor will take the path of a config file, and that config file should contain everything that is specific to the plugin and the test subject (it could in turn refer to other files, if necessary). The path of the config file can be specified per subject in config.toml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought clusterspec.yaml was the config file, as it was the only argument (called CLUSTERSPEC_PATH) you could pass to the plugin. However, it seems that it was changed in yesterday's commits. I will take a closer look into that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now take an additional configuration file specific to the plugin. I have also added an example file, see plugin-cluster-stacks-config.yaml.

And I have one question regarding this. How should the plugin configuration file look when you want to provision multiple clusters? Do you need a separate configuration file for each cluster, or is a single file sufficient, with the clusterspec_cluster variable determining the cluster?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the plugin will merge the plugin configuration with the clusterspec information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it should be merged based on the clusterspec_cluster. In my mind, the config file for the plugin would then look something like this:

current-k8s-release:
  clouds_yaml_path: "~/.config/openstack/clouds.yaml"   # Path to the OpenStack clouds.yaml file
  cs_name: "scs"                                        # Cluster Stack Name
  cs_version: "v1"                                      # Cluster Stack Version
  # Additional configurations
  ...
  ...
  ...

current-k8s-release-1:
  ...
  ...
  ...

Tests/kaas/plugin/README.md Outdated Show resolved Hide resolved

```yaml
cs_name: <cs_name> # Default: "scs"
cs_k8s_version: <cs_k8s_version> # Default: "1.29"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be taken from the branch (and the branch will be supplied to the plugin; the plugin need not read the clusterspec.yaml)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the cs_k8s_version should be taken from the branch parameter in the clusterspec.yaml file.

Tests/kaas/plugin/kubeconfig-cs-cluster Outdated Show resolved Hide resolved
Tests/kaas/plugin/cluster.yaml Outdated Show resolved Hide resolved
@@ -21,7 +21,7 @@ def _create_cluster(self):
if self.config is None:
self.cluster.create()
else:
self.cluster.create(self.config)
self.cluster.create()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? It has nothing to do with cluster stacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is currently outdated, as the code in #753 has also been modified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be rebased on the latest version of the target branch

Copy link
Contributor Author

@michal-gubricky michal-gubricky Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased onto the latest version

@michal-gubricky michal-gubricky force-pushed the feat/add-cluster-stacks-plugin branch 3 times, most recently from f4bdba7 to e5952c0 Compare November 11, 2024 09:34
@michal-gubricky michal-gubricky marked this pull request as ready for review November 12, 2024 10:03
@mbuechse
Copy link
Contributor

If this has been rebased to the latest version, why then does it have merge conflicts?

@michal-gubricky
Copy link
Contributor Author

If this has been rebased to the latest version, why then does it have merge conflicts?

As you can see, I added that comment on 11.11.2024 at 10:37 a.m., and then, after rebasing, a new commit was added to the target branch later that afternoon.

@michal-gubricky michal-gubricky force-pushed the feat/add-cluster-stacks-plugin branch from e5952c0 to b5bb995 Compare November 13, 2024 07:55
@mbuechse
Copy link
Contributor

As you can see, I added that comment on 11.11.2024 at 10:37 a.m., and then, after rebasing, a new commit was added to the target branch later that afternoon.

Understood. Thanks for doing all the great work. Please work with Toni to get this merged asap.

Copy link
Contributor

@mbuechse mbuechse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few observations

raise TimeoutError(f"Timed out after {timeout} seconds waiting for CAPI and CAPO system pods to become ready.")


def wait_for_cso_pods_ready(timeout=240, interval=15):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a variation of the function above (if not literally copy and paste), only with different namespaces. The function above should take namespaces as parameter, renamed wait_for_pods, and then wait_for_cso_pods_ready and wait_for_capi_pods_ready be implemented as a simple call to wait_for_pods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to merge these two functions into the one wait_for_pods

raise TimeoutError(f"Timed out after {timeout} seconds waiting for CSO pods in 'cso-system' namespace to become ready.")


def wait_for_workload_pods_ready(namespace="kube-system", timeout=600, kubeconfig_path=None, max_retries=3, delay=30):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that this function works quite differently from the ones above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here I want to check whether all pods in the kube-system namespace on the workload cluster are ready. This is the final step to ensure that the cluster and all its necessary components are deployed correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this purpose, the wait_for_pods function is now used.

Tests/kaas/plugin/plugin_cluster_stacks.py Outdated Show resolved Hide resolved
# Cluster Stack Parameters
self.clouds_yaml_path = self.config.get('clouds_yaml_path', '~/.config/openstack/clouds.yaml')
self.cs_k8s_version = self.cluster_version
self.cs_name = self.config.get('cs_name', 'scs')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really need to hard-code all these defaults (which I doubt because we deliver an example file, and this thing won't work without a config file), then the defaults should be put into a dict, for instance

DEFAULTS = {
    'cs_name': 'scs',
    'cs_version': 'v1',
    ...
}

Then I would do

for key, value in DEFAULTS.items():
    self.config.setdefault(key, value)   # will only set if not already present

Also, for the environment variables, I think the detour with the instance variables is unnecessary.

You can do just

required_env = {key.upper(): value for key, value in self.config}

Of course, this is not quite correct yet, because some values have to be computed differently, and not all keys from the config would be used. For the first point, we can always amend required_env, and for the second, we can add a guard to the dict comprehension like so:

required_env = {key.upper(): value for key, value in self.config if key in ENV_KEYS}

where ENV_KEYS would be a global variable of type set, such as

# don't use uppercase here because this refers to config.yaml keys
ENV_KEYS = {'cs_name', 'cs_version', ...}

Tests/kaas/plugin/plugin_cluster_stacks.py Outdated Show resolved Hide resolved
Comment on lines 232 to 233
setup_environment_variables(self)
setup_git_env(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use os.environ.update to update the environment of this process. We should construct the dict with the updates that we need and then pass this dict to subprocess.run (well possible with self._run_subprocess).

except subprocess.CalledProcessError as error:
raise RuntimeError(f"{error_msg}: {error}")

def _get_kubeadm_control_plane_name(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a static method (no reference of self), but that might change once we stop polluting os.environ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function, I am trying to get the name of the kubeadmcontrolplane resource because it is generated dynamically. In the next function, _wait_kcp_ready, the name of the kcp (kubeadmcontrolplane) resource is used to check whether it is ready. Once confirmed, you can proceed to retrieve the kubeconfig of the workload cluster.

else:
raise RuntimeError("Failed to get kubeadmcontrolplane name")

def _wait_kcp_ready(self, kcp_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Comment on lines 239 to 242
if kubeconfig_filepath:
shutil.move(self.kubeconfig, kubeconfig_filepath)
else:
kubeconfig_filepath = str(self.kubeconfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that this is not the kubeconfig that we want to provide here, because this seems to be the management (Kind) cluster, not the workload cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also discussed this with @tonifinger today, and I will make the change. We agreed that the name/path of the kubeconfig file will be sourced from the clusterspec.yaml file from field kubeconfig. And the kubeconfig file for the management (kind) cluster will be named something like kubeconfig-mngmt.

kubeconfig_cs_cluster_filename = f"kubeconfig-{cluster_name}.yaml"
try:
# Check if the cluster exists
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_filepath}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant

Suggested change
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_filepath}"
check_cluster_command = f"kubectl get cluster {cluster_name} --kubeconfig {kubeconfig_cs_cluster_filename}"

here because kubeconfig_filepath refers to the Kind cluster that you delete further down

(but keep in mind my comment in create_cluster that kubeconfig_filepath actually SHOULD point to the workload cluster!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this command, I want to check if the cluster resource exists, which is located in the management cluster(in our case it is kind now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, you're right. This is a custom resource, right? Got it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a custom resource.

@mbuechse
Copy link
Contributor

After looking at this PR in more detail, I think it does need some more work. Nothing seems to have happened today after 9 am CET, which baffles me a bit, since you had put a thumbs up under my remark that went

Please work with Toni to get this merged asap.

@mbuechse
Copy link
Contributor

I see now that you requested a review from Toni at 12:25 or so. If I say "work with XY asap", I think requesting a review via a button in a UI does not quite fit the bill...

@mbuechse
Copy link
Contributor

@michal-gubricky @tonifinger Please communicate proactively (using Matrix or whatever suits you) to get this done ASAP, ideally until tomorrow (November 14) at noon.

@mbuechse
Copy link
Contributor

mbuechse commented Nov 13, 2024

Sorry, one more thing, but now back to the subject matter at hand. If I understand correctly, the plugin deploys two clusters

  • a management cluster on Kind
  • the workload cluster on OpenStack

Well, at least roughly. To me, this seems like a lot of work, and it doesn't feel like Kubernetes as a service at all. Not in the least. Where is the service if you have to download and invoke Helm charts? Shouldn't a service come with something like an API? Are we doing something wrong here???

@mbuechse
Copy link
Contributor

My impression (after talking to Toni) is that we shouldn't set up the management cluster ourselves. The management cluster should be ready to use and the corresponding Kubeconfig file provided to us by our partner. I have written to the Team Container Matrix channel to get confirmation on that.

@michal-gubricky
Copy link
Contributor Author

After looking at this PR in more detail, I think it does need some more work. Nothing seems to have happened today after 9 am CET, which baffles me a bit, since you had put a thumbs up under my remark that went

I see now that you requested a review from Toni at 12:25 or so. If I say "work with XY asap", I think requesting a review via a button in a UI does not quite fit the bill...

First of all, this is not true. I just did not assign Toni. We have already had a few conversations about it. I asked him if he could take a look to see if anything was missing from his perspective. He also mentioned that you were planning to store the config file under playbooks/k8s_configs/." That's why I this commit.

@michal-gubricky
Copy link
Contributor Author

Thanks for 2nd round of review, I will take a look at it.

@mbuechse
Copy link
Contributor

First of all, this is not true. I just did not assign Toni. We have already had a few conversations about it. I asked him if he could take a look to see if anything was missing from his perspective. He also mentioned that you were planning to store the config file under playbooks/k8s_configs/." That's why I this commit.

You're right, and I apologize. First, you did also ping him, and second, you committed something after 9 am.

But my overall point still stands, because I don't see any change after that commit.

BTW, I don't know went wrong, but you shouldn't have moved the file. Yes, I do plan to have the config files in the playbooks directory, but only real config files from our partners, but not templates or illustrative examples.

@michal-gubricky
Copy link
Contributor Author

Sorry, one more thing, but now back to the subject matter at hand. If I understand correctly, the plugin deploys two clusters

  • a management cluster on Kind
  • the workload cluster on OpenStack

Well, at least roughly. To me, this seems like a lot of work, and it doesn't feel like Kubernetes as a service at all. Not in the least. Where is the service if you have to download and invoke Helm charts? Shouldn't a service come with something like an API? Are we doing something wrong here???

My impression (after talking to Toni) is that we shouldn't set up the management cluster ourselves. The management cluster should be ready to use and the corresponding Kubeconfig file provided to us by our partner. I have written to the Team Container Matrix channel to get confirmation on that.

In short, cluster stacks work in such a way that you need a management cluster. For this purpose, you can use a simple kind cluster or some already existing k8s cluster. In that mgmt cluster, you need to deploy the necessary components. After that, you just apply those two templates clusterstack.yaml and cluster.yaml.

@mbuechse
Copy link
Contributor

I don't mean to blame you personally. Maybe Toni was unresponsive. He says he will try to set up a call so that you can indeed work together in a focused manner.

@michal-gubricky
Copy link
Contributor Author

First of all, this is not true. I just did not assign Toni. We have already had a few conversations about it. I asked him if he could take a look to see if anything was missing from his perspective. He also mentioned that you were planning to store the config file under playbooks/k8s_configs/." That's why I this commit.

You're right, and I apologize. First, you did also ping him, and second, you committed something after 9 am.

It is totally okay.

But my overall point still stands, because I don't see any change after that commit.

BTW, I don't know went wrong, but you shouldn't have moved the file. Yes, I do plan to have the config files in the playbooks directory, but only real config files from our partners, but not templates or illustrative examples.

Ahh sorry, I do not know this part that it is place only for the config files from our partners.

tonifinger and others added 26 commits November 21, 2024 11:09
Removed the return of the kubeconfig filepath from the `create_cluster`
method as we do not use this handling.

Signed-off-by: Toni Finger <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: Matthias Büchse <[email protected]>
Signed-off-by: Michal Gubricky <[email protected]>
Co-authored-by: Matthias Büchse <[email protected]>
Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: tonifinger <[email protected]>
Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
Co-authored-by: tonifinger <[email protected]>
Signed-off-by: Michal Gubricky <[email protected]>
Signed-off-by: michal.gubricky <[email protected]>
@michal-gubricky michal-gubricky force-pushed the feat/add-cluster-stacks-plugin branch from a72080e to 70f580c Compare November 21, 2024 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Implement plugin for cluster-stacks
3 participants