sched: controller: set scheduler priority
So far the scheduler priority is left at the default, which is 0. This is risky,
especially when preemption of pods is needed to fit more important pods.

The NRS is important enough to deserve the most critical priority class,
system-node-critical, which is the same priority used by the
kube-scheduler. We need this priority set always, regardless of how many
replicas are set for the scheduler, and especially if we look to
optimize the HA of the scheduler.

We choose system-node-critical over system-cluster-critical because we don't want to allow the SS to be preempted by higher-priority pods. If it were set to system-cluster-critical and an event were triggered that requires pod eviction in order to schedule system-node-critical workloads, the SS would be at risk of being evicted. Although this would be very rare and the evicted pod would be rescheduled, there is no convincing reason not to make it node-critical.
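For context, both priority classes mentioned above are Kubernetes built-ins with fixed values: system-node-critical resolves to priority 2000001000 and system-cluster-critical to 2000000000, so node-critical pods are never preempted in favor of cluster-critical ones. The rendered scheduler Deployment would carry the class roughly as in this sketch (only the relevant fields are shown; the object name matches the one used in the test below):

```yaml
# Sketch of the relevant portion of the rendered scheduler Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secondary-scheduler
spec:
  template:
    spec:
      # Built-in class; resolves to priority value 2000001000,
      # above system-cluster-critical (2000000000).
      priorityClassName: system-node-critical
```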

addresses openshift-kni#974

Signed-off-by: Shereen Haj <[email protected]>
shajmakh committed Aug 12, 2024
1 parent 6c6c29e commit 7844680
Showing 2 changed files with 20 additions and 0 deletions.
4 changes: 4 additions & 0 deletions controllers/numaresourcesscheduler_controller.go
@@ -56,6 +56,7 @@ import (

 const (
 	leaderElectionResourceName = "numa-scheduler-leader"
+	schedulerPriorityClassName = "system-node-critical"
 )

 const (
@@ -217,6 +218,9 @@ func (r *NUMAResourcesSchedulerReconciler) syncNUMASchedulerResources(ctx contex
 	// TODO: if replicas doesn't make sense (autodetect disabled and user set impossible value) then we
 	// should set a degraded state

+	// node-critical so the pod won't be preempted by pods having the most critical priority class
+	r.SchedulerManifests.Deployment.Spec.Template.Spec.PriorityClassName = schedulerPriorityClassName
+
 	schedupdate.DeploymentImageSettings(r.SchedulerManifests.Deployment, schedSpec.SchedulerImage)
 	cmHash := hash.ConfigMapData(r.SchedulerManifests.ConfigMap)
 	schedupdate.DeploymentConfigMapSettings(r.SchedulerManifests.Deployment, r.SchedulerManifests.ConfigMap.Name, cmHash)
16 changes: 16 additions & 0 deletions controllers/numaresourcesscheduler_controller_test.go
@@ -208,6 +208,22 @@ var _ = ginkgo.Describe("Test NUMAResourcesScheduler Reconcile", func() {
 		gomega.Expect(nrs.Status.CacheResyncPeriod.Seconds()).To(gomega.Equal(resyncPeriod.Seconds()))
 	})

+	ginkgo.It("should have the correct priority class", func() {
+		key := client.ObjectKeyFromObject(nrs)
+		_, err := reconciler.Reconcile(context.TODO(), reconcile.Request{NamespacedName: key})
+		gomega.Expect(err).ToNot(gomega.HaveOccurred())
+
+		key = client.ObjectKey{
+			Name:      "secondary-scheduler",
+			Namespace: testNamespace,
+		}
+
+		dp := &appsv1.Deployment{}
+		gomega.Expect(reconciler.Client.Get(context.TODO(), key, dp)).ToNot(gomega.HaveOccurred())
+
+		gomega.Expect(dp.Spec.Template.Spec.PriorityClassName).To(gomega.BeEquivalentTo(schedulerPriorityClassName))
+	})
+
 	ginkgo.It("should have a config hash annotation under deployment", func() {
 		key := client.ObjectKeyFromObject(nrs)
 		_, err := reconciler.Reconcile(context.TODO(), reconcile.Request{NamespacedName: key})
