Perform a scheduling simulation in dry-run mode. #1585
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing...
```diff
@@ -173,6 +177,14 @@ func podFitsNodes(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, no
 	return filteredLen > 0
 }
 
+func addFitNodeNameToPod(pod *v1.Pod, node *v1.Node) {
```
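The helper's body is truncated in the hunk above. A minimal sketch of what an annotation-based helper could look like, assuming (as the review below discusses) that it records the fit node name in a pod annotation; the annotation key and package name are hypothetical, not taken from the PR:

```go
package node

import (
	v1 "k8s.io/api/core/v1"
)

// fitNodeAnnotationKey is an illustrative annotation key, not one defined by the PR.
const fitNodeAnnotationKey = "descheduler.alpha.kubernetes.io/fit-node"

// addFitNodeNameToPod records the name of a node the pod was found to fit on
// in the pod's annotations. Sketch only: the real body is not shown in the hunk.
func addFitNodeNameToPod(pod *v1.Pod, node *v1.Node) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations[fitNodeAnnotationKey] = node.Name
}
```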
Adding the fit node to the pod's annotations is not really a best practice. It seems more reasonable to add a return value to `type FilterFunc func(*v1.Pod) bool`, changing it to `type FilterFunc func(*v1.Pod) (string, bool)`. However, many filters do not need this string return value. Is there a better implementation approach?
```diff
@@ -561,6 +563,24 @@ func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, opts EvictOptio
 	return nil
 }
 
+func (pe *PodEvictor) simulateSchedulingInDryRun(pod *v1.Pod) {
```
I don't quite understand why we need to create a pod in dry run mode. The purpose of dry run mode is to simulate the behavior of Descheduler to preview which pods may be evicted. 🤔
See comments: #1584 (comment)

> I don't quite understand why we need to create a pod in dry run mode.

The purpose is to reserve the resources. If any part is not described clearly, please feel free to comment.
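What "reserve the resources" could look like in practice: a minimal, self-contained bookkeeping sketch that adds a dry-run-evicted pod's requests to a per-node tally, so later fit checks in the same run see the reduced capacity. The type and function names are illustrative assumptions, not the PR's actual implementation.

```go
package dryrun

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// nodeReservations tracks CPU and memory already "reserved" on each node by
// pods that were evicted in dry-run and assumed to land there.
type nodeReservations struct {
	cpu    map[string]*resource.Quantity
	memory map[string]*resource.Quantity
}

func newNodeReservations() *nodeReservations {
	return &nodeReservations{
		cpu:    map[string]*resource.Quantity{},
		memory: map[string]*resource.Quantity{},
	}
}

// reserve adds the pod's container requests to the running totals for
// nodeName, so that subsequent fit checks treat that capacity as used.
func (r *nodeReservations) reserve(nodeName string, pod *v1.Pod) {
	if r.cpu[nodeName] == nil {
		r.cpu[nodeName] = resource.NewQuantity(0, resource.DecimalSI)
		r.memory[nodeName] = resource.NewQuantity(0, resource.BinarySI)
	}
	for _, c := range pod.Spec.Containers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			r.cpu[nodeName].Add(cpu)
		}
		if mem, ok := c.Resources.Requests[v1.ResourceMemory]; ok {
			r.memory[nodeName].Add(mem)
		}
	}
}
```

A fit check would then compare a node's allocatable resources minus these reservations against the next candidate's requests.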
Got it, thanks.
```diff
@@ -161,6 +164,7 @@ func podFitsNodes(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, no
 		if err == nil {
 			klog.V(4).InfoS("Pod fits on node", "pod", klog.KObj(pod), "node", klog.KObj(node))
 			atomic.AddInt32(&filteredLen, 1)
+			addFitNodeNameToPod(pod, node)
```
Are we going to label all pods, whether they are dry-run or non-dry-run, with this label?
It would be ideal if this behavior only occurred during dry-run and if the node name could be passed along without setting it as an annotation. However, I haven't figured out how to implement this yet. If this approach is accepted, we can give it a try.
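One way to pass the node name without mutating the pod, as hoped for above, is to have the fit check return it; a simplified standalone sketch (the names and signature are illustrative, not the real podFitsNodes):

```go
package nodesketch

import v1 "k8s.io/api/core/v1"

// fitResult carries the fit decision together with the name of a node the pod
// fits on, so no annotation on the pod object is needed.
type fitResult struct {
	fits     bool
	nodeName string
}

// podFitsAnyNode is a simplified stand-in for the real fit check: it returns
// the first node the predicate accepts, and the caller can choose to use the
// node name only when running in dry-run mode.
func podFitsAnyNode(pod *v1.Pod, nodes []*v1.Node, fits func(*v1.Pod, *v1.Node) bool) fitResult {
	for _, node := range nodes {
		if fits(pod, node) {
			return fitResult{fits: true, nodeName: node.Name}
		}
	}
	return fitResult{}
}
```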
@zhifei92 the dry descheduling mode was never meant to replace the full de/scheduling functionality, but rather to provide an estimation of what might get evicted. Instead, I suggest combining the descheduling and scheduling frameworks and running them both in dry mode. https://github.com/kubernetes-sigs/cluster-capacity provides means to run the scheduling framework in dry mode. Yet, the way the dry mode is used in both frameworks needs to be extended to get closer to real descheduling/scheduling. Putting both frameworks together has been my goal for many years, but due to lack of time I have not pursued this much.
A bug in dry-run mode.
Scenario: the descheduler determines that 10 pods in the cluster can be evicted, each requiring 8 cores and 16 GB of memory (8c/16g). The cluster's total available resources are sufficient, but only one node can actually fit an 8c/16g pod. Under the current logic, all 10 pods are still considered evictable.
In dry-run mode no resources are deducted, because the pods are not actually evicted and rescheduled, and scheduling is not simulated either.
Logged:
Using dry-run to preview which pods can be rescheduled may mislead users and cause unexpected issues.
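A toy, self-contained illustration of the scenario above (plain Go, not descheduler code): one node with 8 CPU / 16 GiB free and ten 8c/16g eviction candidates. Without deducting the resources of pods already "placed" during the simulation, every candidate appears to fit; with deduction, only one does.

```go
package main

import "fmt"

func main() {
	type capacity struct{ cpu, memGiB int }
	free := capacity{cpu: 8, memGiB: 16}    // one node with 8 CPU / 16 GiB free
	request := capacity{cpu: 8, memGiB: 16} // each candidate pod requests 8c/16g

	withoutDeduction, withDeduction := 0, 0
	remaining := free
	for i := 0; i < 10; i++ {
		// Current dry-run logic: the free capacity never shrinks, so every pod "fits".
		if request.cpu <= free.cpu && request.memGiB <= free.memGiB {
			withoutDeduction++
		}
		// Simulated scheduling: reserve what was already "placed" in this run.
		if request.cpu <= remaining.cpu && request.memGiB <= remaining.memGiB {
			withDeduction++
			remaining.cpu -= request.cpu
			remaining.memGiB -= request.memGiB
		}
	}
	fmt.Printf("reported evictable without deduction: %d, with deduction: %d\n",
		withoutDeduction, withDeduction)
}
```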
Fix: #1584