api: introduce NodeGroupStatus
So far, node group status has been tracked through the collective operator status, which lists all affected MCPs and their matching RTE daemonsets.

We keep populating these fields for API backward compatibility, and
additionally we start reflecting the status per node group. The relation
between the current node group MCP and daemonsets and the new representation is 1:1, and there is no change in functionality: we are merely providing a new way to gather node group updates under a single entity.

However, this is a prerequisite for supporting NodeSelector under NodeGroup, and NodeGroupStatus will be the only place to record and track node group state on HCP (HyperShift).

Signed-off-by: Shereen Haj <[email protected]>
shajmakh committed Sep 25, 2024
1 parent 9803afd commit 748fa02
Showing 17 changed files with 638 additions and 6 deletions.
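
The commit message says the legacy collective fields stay populated while the per-node-group status is added alongside them. A minimal sketch of how a consumer might read the new field first and fall back to the legacy list is shown here. The types are trimmed local stand-ins, and `daemonSetsFor` and the sample names are hypothetical, not part of the operator's API.

```go
package main

import "fmt"

// NamespacedName is a local stand-in for the operator's type of the same name.
type NamespacedName struct {
	Namespace string
	Name      string
}

// NodeGroupStatus is a trimmed local copy of the type added by this commit.
type NodeGroupStatus struct {
	Name       string
	DaemonSets []NamespacedName
}

// OperatorStatus is a trimmed stand-in for NUMAResourcesOperatorStatus.
type OperatorStatus struct {
	DaemonSets []NamespacedName  // legacy, collective list
	NodeGroups []NodeGroupStatus // new, per node group
}

// daemonSetsFor is a hypothetical helper. It returns the daemonsets recorded
// for the named node group and true when the per-group status was found.
// Otherwise it falls back to the collective list and returns false.
func daemonSetsFor(st OperatorStatus, group string) ([]NamespacedName, bool) {
	for _, ng := range st.NodeGroups {
		if ng.Name == group {
			return ng.DaemonSets, true
		}
	}
	return st.DaemonSets, false
}

func main() {
	ds := []NamespacedName{{Namespace: "numaresources", Name: "rte-worker"}}
	st := OperatorStatus{
		DaemonSets: ds,
		NodeGroups: []NodeGroupStatus{{Name: "worker", DaemonSets: ds}},
	}
	found, perGroup := daemonSetsFor(st, "worker")
	fmt.Println(len(found), perGroup)
}
```

The fallback branch only makes sense while both representations are populated, which is what the commit message describes for now.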
@@ -310,5 +310,4 @@ func TestNodeGroupNormalizeName(t *testing.T) {
if got != expected {
t.Errorf("unexpected normalized name:\ngot=%+v\nexpected=%+v", got, expected)
}

}
27 changes: 27 additions & 0 deletions api/numaresourcesoperator/v1/numaresourcesoperator_types.go
@@ -126,6 +126,29 @@ type NodeGroup struct {
Config *NodeGroupConfig `json:"config,omitempty"`
}

// NodeGroupStatus reports the status of a NodeGroup once it matches an actual set of nodes and is correctly processed
// by the system. In other words, it is not possible to have a NodeGroupStatus which does not represent a valid NodeGroup
// which in turn unambiguously references a set of nodes in the cluster.
// Hence, if a NodeGroupStatus is published, its `Name` must be present, because it refers back to a NodeGroup whose
// config was correctly processed in the Spec. Its DaemonSets will be nonempty, because the NodeGroup correctly matches
// a set of nodes in the cluster. The Config is always represented on a best-effort basis, possibly reflecting the system defaults.
// If the system cannot process a NodeGroup correctly from the Spec, it will report Degraded state in the top-level
// conditions, and will provide details there.
type NodeGroupStatus struct {
// Name matches the name of a configured NodeGroup
Name string `json:"name"`
// DaemonSets of the configured RTEs for this node group
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="RTE DaemonSets"
DaemonSets []NamespacedName `json:"daemonsets,omitempty"`
// NodeGroupConfig represents the latest available configuration applied to this NodeGroup
// +optional
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Optional configuration enforced on this NodeGroup"
Config *NodeGroupConfig `json:"config,omitempty"`
// Selector represents the label selector for this node group, as set by either MachineConfigPoolSelector or NodeSelector
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Label selector of node group status"
Selector *metav1.LabelSelector `json:"selector,omitempty"`
}

// NUMAResourcesOperatorStatus defines the observed state of NUMAResourcesOperator
type NUMAResourcesOperatorStatus struct {
// DaemonSets of the configured RTEs, one per node group
@@ -134,6 +157,10 @@ type NUMAResourcesOperatorStatus struct {
// MachineConfigPools resolved from configured node groups
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="RTE MCPs from node groups"
MachineConfigPools []MachineConfigPool `json:"machineconfigpools,omitempty"`
// NodeGroups report the observed status of the configured NodeGroups, matching by their name
// +optional
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Node groups observed status"
NodeGroups []NodeGroupStatus `json:"nodeGroups,omitempty"`
// Conditions show the current state of the NUMAResourcesOperator Operator
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Condition reported"
Conditions []metav1.Condition `json:"conditions,omitempty"`
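
The doc comment on `NodeGroupStatus` states two invariants for any published entry: `Name` is present and the daemonset list is nonempty. A minimal sketch of a check for those two invariants follows. The types are trimmed local stand-ins, and `checkNodeGroupStatus` is a hypothetical helper, not something this commit adds.

```go
package main

import (
	"errors"
	"fmt"
)

// NamespacedName is a local stand-in for the operator's type of the same name.
type NamespacedName struct {
	Namespace string
	Name      string
}

// NodeGroupStatus is a trimmed local copy of the type added by this commit.
type NodeGroupStatus struct {
	Name       string
	DaemonSets []NamespacedName
}

// checkNodeGroupStatus is a hypothetical helper that verifies the invariants
// described in the NodeGroupStatus doc comment: a published status must carry
// a name and at least one daemonset.
func checkNodeGroupStatus(ngs NodeGroupStatus) error {
	if ngs.Name == "" {
		return errors.New("node group status has no name")
	}
	if len(ngs.DaemonSets) == 0 {
		return fmt.Errorf("node group %q has no daemonsets", ngs.Name)
	}
	return nil
}

func main() {
	valid := NodeGroupStatus{
		Name:       "worker",
		DaemonSets: []NamespacedName{{Namespace: "numaresources", Name: "rte-worker"}},
	}
	fmt.Println(checkNodeGroupStatus(valid))
	fmt.Println(checkNodeGroupStatus(NodeGroupStatus{Name: "infra"}))
}
```

A check like this could be useful in tests that assert on the reported status. Whether the operator itself enforces the invariants this way is not shown in the diff.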
42 changes: 42 additions & 0 deletions api/numaresourcesoperator/v1/zz_generated.deepcopy.go


109 changes: 109 additions & 0 deletions bundle/manifests/nodetopology.openshift.io_numaresourcesoperators.yaml
@@ -410,6 +410,115 @@ spec:
- name
type: object
type: array
nodeGroups:
description: NodeGroups report the observed status of the configured
NodeGroups, matching by their name
items:
description: |-
NodeGroupStatus reports the status of a NodeGroup once it matches an actual set of nodes and is correctly processed
by the system. In other words, it is not possible to have a NodeGroupStatus which does not represent a valid NodeGroup
which in turn unambiguously references a set of nodes in the cluster.
Hence, if a NodeGroupStatus is published, its `Name` must be present, because it refers back to a NodeGroup whose
config was correctly processed in the Spec. Its DaemonSets will be nonempty, because the NodeGroup correctly matches
a set of nodes in the cluster. The Config is always represented on a best-effort basis, possibly reflecting the system defaults.
If the system cannot process a NodeGroup correctly from the Spec, it will report Degraded state in the top-level
conditions, and will provide details there.
properties:
config:
description: NodeGroupConfig represents the latest available
configuration applied to this NodeGroup
properties:
infoRefreshMode:
description: InfoRefreshMode sets the mechanism which will
be used to refresh the topology info.
enum:
- Periodic
- Events
- PeriodicAndEvents
type: string
infoRefreshPause:
description: InfoRefreshPause defines if updates to NRTs
are paused for the machines belonging to this group
enum:
- Disabled
- Enabled
type: string
infoRefreshPeriod:
description: InfoRefreshPeriod sets the topology info refresh
period. Use explicit 0 to disable.
type: string
podsFingerprinting:
description: PodsFingerprinting defines if pod fingerprint
should be reported for the machines belonging to this
group
enum:
- Disabled
- Enabled
- EnabledExclusiveResources
type: string
tolerations:
description: |-
Tolerations overrides tolerations to be set into RTE daemonsets for this NodeGroup. If not empty, the tolerations will be the one set here.
Leave empty to make the system use the default tolerations.
items:
description: |-
The pod this Toleration is attached to tolerates any taint that matches
the triple <key,value,effect> using the matching operator <operator>.
properties:
effect:
description: |-
Effect indicates the taint effect to match. Empty means match all taint effects.
When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
type: string
key:
description: |-
Key is the taint key that the toleration applies to. Empty means match all taint keys.
If the key is empty, operator must be Exists; this combination means to match all values and all keys.
type: string
operator:
description: |-
Operator represents a key's relationship to the value.
Valid operators are Exists and Equal. Defaults to Equal.
Exists is equivalent to wildcard for value, so that a pod can
tolerate all taints of a particular category.
type: string
tolerationSeconds:
description: |-
TolerationSeconds represents the period of time the toleration (which must be
of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
it is not set, which means tolerate the taint forever (do not evict). Zero and
negative values will be treated as 0 (evict immediately) by the system.
format: int64
type: integer
value:
description: |-
Value is the taint value the toleration matches to.
If the operator is Exists, the value should be empty, otherwise just a regular string.
type: string
type: object
type: array
type: object
daemonsets:
description: DaemonSets of the configured RTEs for this node
group
items:
description: |-
NamespacedName comprises a resource name, with a mandatory namespace,
rendered as "<namespace>/<name>".
properties:
name:
type: string
namespace:
type: string
type: object
type: array
name:
description: Name matches the name of a configured NodeGroup
type: string
required:
- name
type: object
type: array
relatedObjects:
description: RelatedObjects list of objects of interest for this operator
items:
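
The schema above makes `name` the only required property of each `nodeGroups` item, with `daemonsets` and `config` optional. As a rough illustration of the resulting serialized shape, the sketch below marshals a trimmed local copy of the type with the same JSON keys. The struct tags on `NamespacedName`, the helper `render`, and the sample values are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NamespacedName is a local stand-in; the JSON keys follow the CRD schema
// (name, namespace), but the exact tags are an assumption.
type NamespacedName struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
}

// NodeGroupStatus is a trimmed local copy using the JSON keys from the diff.
// Config and Selector are left out to keep the example self-contained.
type NodeGroupStatus struct {
	Name       string           `json:"name"`
	DaemonSets []NamespacedName `json:"daemonsets,omitempty"`
}

// render is a hypothetical helper that serializes one status entry to JSON.
func render(ngs NodeGroupStatus) (string, error) {
	b, err := json.Marshal(ngs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := render(NodeGroupStatus{
		Name:       "worker",
		DaemonSets: []NamespacedName{{Namespace: "numaresources", Name: "rte-worker"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"name":"worker","daemonsets":[{"namespace":"numaresources","name":"rte-worker"}]}
}
```

Because `daemonsets` carries `omitempty`, an entry with no daemonsets would serialize with only the `name` key, which matches the schema's single required property.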
@@ -62,7 +62,7 @@ metadata:
}
]
capabilities: Basic Install
createdAt: "2024-09-24T07:18:49Z"
createdAt: "2024-09-24T09:03:48Z"
olm.skipRange: '>=4.17.0 <4.18.0'
operators.operatorframework.io/builder: operator-sdk-v1.36.1
operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
@@ -149,6 +149,17 @@ spec:
applied to this MachineConfigPool
displayName: Optional configuration enforced on this NodeGroup
path: machineconfigpools[0].config
- description: NodeGroups report the observed status of the configured NodeGroups,
matching by their name
displayName: Node groups observed status
path: nodeGroups
- description: NodeGroupConfig represents the latest available configuration
applied to this NodeGroup
displayName: Optional configuration enforced on this NodeGroup
path: nodeGroups[0].config
- description: DaemonSets of the configured RTEs for this node group
displayName: RTE DaemonSets
path: nodeGroups[0].daemonsets
- description: RelatedObjects list of objects of interest for this operator
displayName: Related Objects
path: relatedObjects