
Perform a scheduling simulation in dry-run mode. #1585

Open · wants to merge 3 commits into base: master
Conversation

zhifei92 (Contributor):

There is a bug in dry-run mode.

Scenario: the descheduler determines that 10 pods in the cluster can be evicted, and each of these pods requests 8 cores and 16 GB of memory (8c/16g). The cluster's total available resources are sufficient, but only one node can satisfy an 8c/16g request, so in reality only one of those pods could actually be rescheduled. Under the current logic, however, all 10 pods are considered evictable.

In dry-run mode, no resources are deducted, because the pods are not actually evicted and rescheduled, nor is any scheduling simulated.

All that happens is that the eviction is logged:

klog.V(1).InfoS("Evicted pod in dry run mode", "pod", klog.KObj(pod), "reason", opts.Reason, "strategy", opts.StrategyName, "node", pod.Spec.NodeName, "profile", opts.ProfileName)

Using dry-run to preview which pods can be rescheduled may therefore mislead users and cause unexpected issues.

Fix: #1584

k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has not been approved by anyone yet.
Once this PR has been reviewed and has the lgtm label, please assign knelasevero for approval. For more information, see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the labels cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/M (denotes a PR that changes 30-99 lines, ignoring generated files) on Dec 25, 2024
@@ -173,6 +177,14 @@ func podFitsNodes(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, no
return filteredLen > 0
}

func addFitNodeNameToPod(pod *v1.Pod, node *v1.Node) {
zhifei92 (Contributor, author):

Adding the fit node to the pod's annotations is not the best practice. It seems more reasonable to add a return value to type FilterFunc func(*v1.Pod) bool, changing it to type FilterFunc func(*v1.Pod) (string, bool). However, many filters do not need this string return value. Is there a better implementation approach?
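
For readers weighing the two approaches, here is a minimal sketch of both. The annotation key and the FilterFuncWithNode name are assumptions made for illustration, not the PR's actual code:

```go
package podutil

import v1 "k8s.io/api/core/v1"

// Hypothetical annotation key for this discussion; not necessarily what the PR uses.
const fitNodeAnnotation = "descheduler.alpha.kubernetes.io/fit-node"

// Option 1 (what this PR does): record the fit node on the in-memory pod.
func addFitNodeNameToPod(pod *v1.Pod, node *v1.Node) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations[fitNodeAnnotation] = node.Name
}

// Option 2 (the alternative raised above): widen the filter signature so the
// fit node's name travels through a return value instead of an annotation.
type FilterFunc func(*v1.Pod) bool // current signature

// Most filters would have nothing meaningful to put in the string,
// which is the drawback noted above.
type FilterFuncWithNode func(*v1.Pod) (string, bool)
```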

@@ -561,6 +563,24 @@ func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, opts EvictOptio
return nil
}

func (pe *PodEvictor) simulateSchedulingInDryRun(pod *v1.Pod) {
Member:

I don't quite understand why we need to create a pod in dry run mode. The purpose of dry run mode is to simulate the behavior of Descheduler to preview which pods may be evicted. 🤔

zhifei92 (Contributor, author):

See the comments in #1584 (comment).

> I don't quite understand why we need to create a pod in dry run mode.

The purpose is to reserve the resources.

If any part is not described clearly, please feel free to comment.
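
As an editorial aside, a minimal sketch of this "reserve the resources" idea might look like the following. The podEvictor shape, its client field, the annotation key, and the renaming scheme are all assumptions for illustration, not the PR's actual code:

```go
package evictions

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation key written earlier by the fit check.
const fitNodeAnnotation = "descheduler.alpha.kubernetes.io/fit-node"

// podEvictor stands in for the real PodEvictor; only a client is needed here.
// In dry-run mode this could be a fake/in-memory client so nothing real is created.
type podEvictor struct {
	client kubernetes.Interface
}

// simulateSchedulingInDryRun "reserves" the evicted pod's resources by creating
// a copy of it bound to the node it was found to fit on, so that subsequent
// fit checks see that node's capacity reduced accordingly.
func (pe *podEvictor) simulateSchedulingInDryRun(ctx context.Context, pod *v1.Pod) error {
	fitNode, ok := pod.Annotations[fitNodeAnnotation]
	if !ok {
		return nil // no fit node recorded; nothing to reserve
	}
	clone := pod.DeepCopy()
	clone.UID = ""                     // clear server-assigned metadata
	clone.ResourceVersion = ""         // must be empty when creating a new object
	clone.Name = pod.Name + "-dry-run" // avoid a clash: the original is never deleted in dry-run
	clone.Spec.NodeName = fitNode      // bind the copy so its requests count against the node
	_, err := pe.client.CoreV1().Pods(clone.Namespace).Create(ctx, clone, metav1.CreateOptions{})
	return err
}
```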

Member:

Got it, thanks.

@@ -161,6 +164,7 @@ func podFitsNodes(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, no
if err == nil {
klog.V(4).InfoS("Pod fits on node", "pod", klog.KObj(pod), "node", klog.KObj(node))
atomic.AddInt32(&filteredLen, 1)
addFitNodeNameToPod(pod, node)
Member:

Are we going to add this annotation to all pods, in both dry-run and non-dry-run mode?

zhifei92 (Contributor, author):

Ideally, this would happen only during dry-run, and the node name could be passed along without setting it as an annotation. However, I haven't figured out how to implement that yet. If this approach is accepted, we can give it a try.
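
One way to scope this to dry-run only, sketched here against the podFitsNodes hunk above (the dryRun flag and how it is plumbed through are assumptions, not the PR's code):

```go
// Inside podFitsNodes: only record the fit node when running in dry-run mode,
// and only on the in-memory copy of the pod; nothing is written to the API server.
if err == nil {
	klog.V(4).InfoS("Pod fits on node", "pod", klog.KObj(pod), "node", klog.KObj(node))
	atomic.AddInt32(&filteredLen, 1)
	if dryRun { // hypothetical flag threaded through from the evictor's config
		addFitNodeNameToPod(pod, node)
	}
}
```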

ingvagabund (Contributor) commented on Jan 1, 2025:

@zhifei92 the dry-run descheduling mode was never meant to replace the full de/scheduling functionality, but rather to provide an estimate of what might get evicted. Instead, I suggest combining the descheduling and scheduling frameworks and running them both in dry-run mode. https://github.com/kubernetes-sigs/cluster-capacity provides the means to run the scheduling framework in dry-run mode. Still, the way the dry-run mode is used in both frameworks needs to be extended to get closer to real descheduling/scheduling. Putting both frameworks together has been my goal for many years, but due to lack of time I have not pursued it much.
