api: add PoolName and NodeGroupStatus #1018

Merged: 3 commits into openshift-kni:main, Oct 16, 2024

Conversation

@shajmakh (Member) commented Sep 20, 2024

This PR introduces a new API option for selecting the nodes on which the RTE will run, exposing the nodes' topologies, as a preamble to supporting TAS on HCP (Hosted Control Planes), also known as HyperShift.

This PR has two main parts:
Part 1: a new PoolName field under NodeGroup.
Part 2: a new NodeGroupStatus type, appended under the operator status.
For more details on each part, please look at the corresponding commits.

Signed-off-by: Shereen Haj [email protected]
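
A minimal Go sketch of the shape the two additions could take; the field names, JSON tags, markers, and placeholder companion types below are assumptions for illustration, not the exact definitions merged in this PR:

// Sketch only; the real types live in api/numaresourcesoperator/v1.
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NamespacedName and NodeGroupConfig stand in for types that already exist in
// the operator API package; their real definitions are richer.
type NamespacedName struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
}

type NodeGroupConfig struct{} // existing tunables elided

// NodeGroup selects one pool of nodes on which the RTE DaemonSet should run.
type NodeGroup struct {
	// MachineConfigPoolSelector is the pre-existing, OCP-only pool selector.
	// +optional
	MachineConfigPoolSelector *metav1.LabelSelector `json:"machineConfigPoolSelector,omitempty"`

	// PoolName names the pool directly: an MCP name on OCP, a NodePool name on HCP.
	// Only one pool specifier may be set per node group.
	// +optional
	PoolName *string `json:"poolName,omitempty"`
}

// NodeGroupStatus reports the observed state of a single node group.
type NodeGroupStatus struct {
	// PoolName identifies the pool this status entry refers to.
	PoolName string `json:"name"`

	// DaemonSet is the RTE DaemonSet created for this pool.
	// +optional
	DaemonSet NamespacedName `json:"daemonSet,omitempty"`

	// Config is the node group configuration actually applied to this pool.
	// +optional
	Config *NodeGroupConfig `json:"config,omitempty"`
}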

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 20, 2024
@shajmakh shajmakh changed the title API: enable NodeSelector under NodeGroup WIP: API: enable NodeSelector under NodeGroup Sep 20, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 20, 2024
@ffromani (Member) left a comment

I think we can use some trajectory adjustments, but this looks like a nice preparation step and I agree with the general direction.

@ffromani (Member) left a comment

partial review

Review threads (resolved): Dockerfile.bundle, api/numaresourcesoperator/v1/helper/nodegroup/nodegroup.go
@ffromani (Member)

I like the new direction.
Something that bothers me is that on HCP we're actually breaking backward compatibility, technically speaking, because we are not populating previously-mandatory fields.

OTOH HCP is different enough that a smooth 1:1 transition from OCP is not possible (lacking MachineConfigPools, for example), so this is probably OK, and not something we can fix anyway.

@shajmakh shajmakh force-pushed the enh-p2 branch 5 times, most recently from 5afa4d3 to 43b13ff Compare September 25, 2024 11:29
@ffromani (Member)

the security lane failure is real; the others are due to a known issue @Tal-or is already working on

@shajmakh shajmakh force-pushed the enh-p2 branch 2 times, most recently from 4e7230d to 6479c28 Compare September 26, 2024 07:15
@shajmakh shajmakh changed the title WIP: API: enable NodeSelector under NodeGroup API: enable NodeSelector under NodeGroup Sep 26, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 26, 2024
@ffromani (Member) left a comment

partial review, but looks nice.

Review threads (resolved): api/numaresourcesoperator/v1/helper/nodegroup/nodegroup.go, pkg/validation/validation.go (×5), controllers/numaresourcesoperator_controller.go, pkg/objectnames/objectnames.go
@ffromani ffromani changed the title API: enable NodeSelector under NodeGroup api: add PoolName and NodeGroupStatus Sep 27, 2024
Review threads (resolved): pkg/validation/validation.go (×2)
@ffromani (Member) left a comment

I'm inclined to leave all the fields but PoolName optional in the NodeGroupStatus, even if we will always populate them. There are pending comments to address, so let's think one last time about it before we make the final decision and then merge.

Changing between pointer and value fields should be a quick change, so hopefully it won't be too much churn if we change direction.

Everything else LGTM. Once the few pending comments are addressed and once we settle the status field types conversation, we can merge.

@shajmakh (Member, Author) commented Oct 3, 2024

I'm inclined to leave all the fields but PoolName optional in the NodeGroupStatus, even if we will always populate them. There are pending comments to address, so let's think one last time about it before we make the final decision and then merge.

I agree with this direction (i.e. making all NodeGroupStatus fields optional except PoolName), mainly for the two reasons below (a small sketch follows the list):

  1. if we set a NodeGroupStatus, it has to have a PoolName, which we in fact copy from the MCP's status
  2. having a PoolName set doesn't imply that the config and the daemonset are set, because according to the reconcile loop that would require the MC to be updated as the resource of the corresponding MCP.
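
A minimal sketch of that reasoning in Go, reusing the type names from the earlier API sketch; the helper name and signature here are illustrative assumptions, not code from this PR:

// nodeGroupStatusForPool is a hypothetical helper: the returned status always
// carries the pool name (copied from the MCP status), while the optional
// fields stay unset until the corresponding MachineConfig has been applied.
func nodeGroupStatusForPool(poolName string, ds *NamespacedName, conf *NodeGroupConfig, mcApplied bool) NodeGroupStatus {
	status := NodeGroupStatus{
		PoolName: poolName, // always present
	}
	if !mcApplied {
		// The MCP has not finished rolling out the MachineConfig yet:
		// report the pool only, leaving Config and DaemonSet empty.
		return status
	}
	if ds != nil {
		status.DaemonSet = *ds
	}
	if conf != nil {
		status.Config = conf
	}
	return status
}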

@shajmakh shajmakh force-pushed the enh-p2 branch 2 times, most recently from 1b93e5d to 3f25cc6 Compare October 3, 2024 12:18
@ffromani (Member) left a comment

Can we remove WIP from the PR title?
Possibly the last comment inside, too.
Still making up my mind about the status; reviewing the recommendations.

Review thread (resolved): api/numaresourcesoperator/v1/helper/nodegroup/nodegroup.go
@shajmakh shajmakh changed the title WIP: api: add PoolName and NodeGroupStatus api: add PoolName and NodeGroupStatus Oct 4, 2024

func syncNodeGroupsStatusPerMCPWithOperatorStatus(instance *nropv1.NUMAResourcesOperator, syncConfig bool) []nropv1.NodeGroupStatus {
ngStatuses := []nropv1.NodeGroupStatus{}
for _, mcp := range instance.Status.MachineConfigPools {
A collaborator left a comment:

so this logic is only valid for OpenShift and we'll add the HyperShift logic later on?

@shajmakh (Member, Author) replied:

Correct, this will be handled here, based on our conversations, to separate the OCP vs HCP content:
#1027

@@ -222,29 +222,34 @@ func (r *NUMAResourcesOperatorReconciler) reconcileResourceMachineConfig(ctx con
// It can take a while.
mcpStatuses, allMCPsUpdated := syncMachineConfigPoolsStatuses(instance.Name, trees, r.ForwardMCPConds)
instance.Status.MachineConfigPools = mcpStatuses
instance.Status.NodeGroups = syncNodeGroupsStatusPerMCPWithOperatorStatus(instance, false)
A collaborator left a comment:

Sorry for my lack of understanding, but why do we need to call this function twice (here and in line 232), once with false and then with true?
This deserves at least a comment IMO.

@ffromani (Member) replied:

good catch. This deserves a comment and another honest attempt at simplifying the code.

@shajmakh (Member, Author) replied:

Good point. I was following the rules by which the MCP status gets updated: only if the MC is updated do we update the corresponding NodeGroupConfig in the status, hence the forwarded boolean value.
I agree the code in this area can be improved, and we should also add a way to test the config status update in the controller tests. I'll open an issue to track this work.

@shajmakh (Member, Author) added:

Glancing at the open issues, I see this: #1031, which sounds to me like it includes the subject raised here?

@ffromani (Member) replied:

#1031 partially overlaps, yes. I'll need to write some notes about how the status is meant to be updated. The proposed fix for #1031 would only help partially here though. I think the best way forward is to reorganize the code in this PR.

@shajmakh (Member, Author) replied:

Adjusted in the latest updates. I also updated the controller logic to publish NodeGroupStatus only after all DSs and the MCP are in sync; that is the only way to guarantee that a NodeGroupStatus has all of its fields. Because on OpenShift we depend on the MCP, this will need to be updated to adapt to HCP. Please let me know if this meets both your expectations. Thanks!
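
In controller terms, the ordering described here could look roughly like the sketch below; only syncNodeGroupsStatusPerMCPWithOperatorStatus and instance.Status.NodeGroups appear in this PR's diff, while the gating flags, the wrapper, and the true argument value are assumptions:

// publishNodeGroupStatuses is an illustrative wrapper: the per-node-group
// statuses are published only once every RTE DaemonSet and every MCP reports
// it is up to date, so each entry can carry PoolName, Config and DaemonSet.
// (nropv1 is the usual alias for the operator's v1 API package.)
func publishNodeGroupStatuses(instance *nropv1.NUMAResourcesOperator, allDaemonSetsUpdated, allMCPsUpdated bool) {
	if !allDaemonSetsUpdated || !allMCPsUpdated {
		// Too early: something is still rolling out.
		return
	}
	instance.Status.NodeGroups = syncNodeGroupsStatusPerMCPWithOperatorStatus(instance, true)
}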

@ffromani (Member) left a comment

I reviewed again https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md and came to a conclusion regarding the new API types.

I realized we (mostly me) made some mistakes in the previous API iteration, but that is no reason to keep making the same mistakes. Let's do better from now on.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2024
This PR introduces a new API option for selecting the nodes on which the RTE will run, exposing the nodes' topologies, as a preamble to supporting TAS on HCP (Hosted Control Planes), also known as HyperShift.

On OCP:
 The way that cluster nodes are grouped is by using Machine Config Pools (MCPs). On the operator side, the numaresourcesoperator CR defines an MCP selector that, behind the scenes, uses the node selector of the corresponding MCP and sets it under the RTE daemonset's node selector.

On HCP:
 The MCP term does not exist on HyperShift. Instead, the HyperShift platform has a node pool, which essentially groups the nodes based on a specific node-pool label added to the nodes' labels.

This PR proposes to enable additional options for selecting the node groups on which to run the RTE pods, working on both OCP and HCP platforms. The new API option, called `PoolName`, does not change or affect how the MCP selector is processed; it provides a new way for OCP to address the nodes by simply setting the name of the MCP as PoolName, which is equivalent to setting the MachineConfigPoolSelector to a label matching the desired MCP. The new field is optional; thus, this solution is backward compatible.

On HCP, PoolName represents the node pool to which the nodes belong and on which the user wants the operator (RTE daemonset) to run. The HCP platform is not yet fully supported and requires additional modifications.

Notes & restrictions:
- PoolName represents a single string defining the name of the pool, be
  it MCP name on OCP or NodePool name on HCP.
- Only one pool specifier should be set per node group; more than one will not be tolerated and will cause a degraded state (see the validation sketch below).
- On OCP, where both options can be set, the operator state will be degraded if the PoolName of one node group and the MachineConfigPoolSelector of another node group point to the same MCP.
- Apart from the aforementioned, no extra validations are applied to determine whether nodes of one node group correlate with nodes from another node group. The user takes responsibility for providing non-conflicting nodes per selector group.
- As with MCP selectors, the selected nodes are not validated for whether or not they have the correct machine configuration needed for TAS to be operational.

Signed-off-by: Shereen Haj <[email protected]>
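
A hedged Go sketch of the per-node-group rule from the notes above, reusing the NodeGroup field names from the earlier sketch; the actual checks live in pkg/validation/validation.go and may be structured differently (the cross-group MCP conflict check is omitted here):

import "fmt"

// validatePoolSpecifiers is illustrative only: each node group must set exactly
// one pool specifier, either MachineConfigPoolSelector or PoolName, never both.
func validatePoolSpecifiers(nodeGroups []NodeGroup) error {
	for idx, ng := range nodeGroups {
		hasSelector := ng.MachineConfigPoolSelector != nil
		hasPoolName := ng.PoolName != nil && *ng.PoolName != ""
		if hasSelector && hasPoolName {
			return fmt.Errorf("node group %d: only one of MachineConfigPoolSelector and PoolName may be set", idx)
		}
		if !hasSelector && !hasPoolName {
			return fmt.Errorf("node group %d: a pool specifier (MachineConfigPoolSelector or PoolName) is required", idx)
		}
	}
	return nil
}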
So far, tracking the node groups' statuses has been done via the collective operator status, which contains a list of all affected MCPs and their matching RTE daemonsets.

This part aims to populate the status per node group in a single node-group status wrapper instead of only the accumulative operator status. For that we need to link each MCP to a node group, and for that we use the MCP name. The new NodeGroupStatus consists of DaemonSet, NodeGroupConfig & PoolName. It is known that there is a daemonset per MCP, not per NodeGroup (see https://github.com/openshift-kni/numaresources-operator/pull/1020/files), so we allow tracking each matching MCP's config status in a single NodeGroupStatus, without breaking backward compatibility and while laying a base for future plans.
We keep populating the statuses in the NUMAResourcesOperatorStatus fields to retain API backward compatibility, and additionally we start reflecting the status per node pool (MCP now, or NodePool later on HCP). The relation between the current node group MCP and daemon sets and the new representation is 1:1, and there is no change in functionality.

Signed-off-by: Shereen Haj <[email protected]>
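
A Go sketch of the 1:1 mapping this commit message describes, again using the earlier sketch types and assuming the per-MCP statuses and the RTE DaemonSet references are index-aligned lists; the actual controller helper (syncNodeGroupsStatusPerMCPWithOperatorStatus) may be organized differently:

// mcpStatus is a hypothetical stand-in for the per-MCP entry already tracked
// in the operator status (name plus the config applied for that MCP, if any).
type mcpStatus struct {
	Name   string
	Config *NodeGroupConfig
}

// nodeGroupStatusesFromOperatorStatus derives one NodeGroupStatus per tracked
// MCP, pairing it with its RTE DaemonSet by index (one DaemonSet per MCP).
func nodeGroupStatusesFromOperatorStatus(mcps []mcpStatus, daemonSets []NamespacedName) []NodeGroupStatus {
	statuses := make([]NodeGroupStatus, 0, len(mcps))
	for i, mcp := range mcps {
		st := NodeGroupStatus{PoolName: mcp.Name}
		if mcp.Config != nil {
			st.Config = mcp.Config
		}
		if i < len(daemonSets) {
			st.DaemonSet = daemonSets[i]
		}
		statuses = append(statuses, st)
	}
	return statuses
}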
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 16, 2024
Extend controller tests to cover the PoolName scenarios.

Signed-off-by: Shereen Haj <[email protected]>
@shajmakh (Member, Author)

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2024
@shajmakh (Member, Author)

failed to pull upi-installer image
/retest

@ffromani (Member) left a comment

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2024
openshift-ci bot (Contributor) commented Oct 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shajmakh (Member, Author)

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2024
@ffromani (Member)

/retest

the candidate fix should have been merged a few minutes ago

@openshift-merge-bot openshift-merge-bot bot merged commit 69033d1 into openshift-kni:main Oct 16, 2024
14 checks passed
@shajmakh (Member, Author)

@ffromani could you please share what fixed the CI issue?
