
Use actual node resource utilization by consuming kubernetes metrics #1555

Conversation

@ingvagabund (Contributor) commented Nov 15, 2024:

Extend LowNodeUtilization to obtain node and pod usage from Kubernetes metrics.

Policy configuration:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
metricsCollector:
  enabled: true
profiles:
- name: ProfileName
  pluginConfig:
  - name: "LowNodeUtilization"
    args:
      thresholds:
        "cpu" : 20
        "memory": 20
        "pods": 20
      targetThresholds:
        "cpu" : 50
        "memory": 50
        "pods": 50
      metricsUtilization:
        metricsServer: true
  plugins:
    balance:
      enabled:
        - "LowNodeUtilization"

Fixes: #225

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 15, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 3 times, most recently from 0eac57f to b90abad Compare November 15, 2024 21:42
@ingvagabund ingvagabund changed the title WIP: Actual utilization kubernetes metrics Actual utilization kubernetes metrics Nov 15, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 7 times, most recently from ad0e809 to 5495682 Compare November 15, 2024 22:10
@ingvagabund ingvagabund changed the title Actual utilization kubernetes metrics Use actual node resource utilization by consuming kubernetes metrics Nov 15, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 5495682 to 78ae117 Compare November 16, 2024 10:09
func (mc *MetricsCollector) Run(ctx context.Context) {
	wait.NonSlidingUntil(func() {
		mc.Collect(ctx)
	}, 5*time.Second, ctx.Done())
Contributor:

should the timing be configurable?

@ingvagabund (Contributor, Author) Nov 17, 2024:

I did not want to burden the initial implementation with additional configuration. The API can be extended at any point if a 5 second interval is not sufficient. We could wait for feedback? Additionally, it might help to wait for e.g. 50 seconds to collect more samples before the descheduling cycle starts, to soften the captured cpu/memory utilization in case some nodes are wildly changing their utilization. The next iteration could extend the API with:

metricsCollector:
  collectionInterval: 10s # collect a new sample every 10s
  initialDelay: 50s # wait for 50s before evicting to soften the captured cpu/memory utilization
  initialSamples: 5 # collect at least 5 samples before evicting to soften the captured cpu/memory utilization
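For illustration, a rough sketch of how a configurable interval could plug into the collector loop. The RunWithOptions name and the CollectorOptions fields are hypothetical, mirroring the YAML sketch above rather than any merged API:

// Sketch only: assumes the existing MetricsCollector type and its Collect method.
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// CollectorOptions mirrors the proposed metricsCollector policy fields.
type CollectorOptions struct {
	// CollectionInterval is how often a new metrics sample is collected.
	CollectionInterval time.Duration
}

func (mc *MetricsCollector) RunWithOptions(ctx context.Context, opts CollectorOptions) {
	interval := opts.CollectionInterval
	if interval == 0 {
		interval = 5 * time.Second // today's hard-coded default
	}
	wait.NonSlidingUntil(func() {
		mc.Collect(ctx)
	}, interval, ctx.Done())
}

The initialDelay and initialSamples knobs from the YAML above would then be honored by the caller before the first descheduling cycle uses the collected data.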

}
}

func (mc *MetricsCollector) Run(ctx context.Context) {
Contributor:

Wonder if we should kick the goroutine off in here so calling it is simpler.
We could block here until we get metrics for the initial load (hasSynced) if we wanted.
Could be nice to fail on boot if their metrics server isn't available?

@ingvagabund (Contributor, Author) Nov 17, 2024:

I like the idea of making the developer-facing API simpler. In general, I tend to invoke a goroutine explicitly so I am aware there's a goroutine, rather than hiding it, which I would like to avoid in this case.

Could be nice to fail on boot if their metrics server isn't available?

We could replace PollUntilWithContext with wait.PollWithContext and set the timeout to e.g. 1 minute, to give the metrics server a fighting chance to come up. This could be another configurable option in the next iteration.

Contributor (Author):

Replacing PollUntilWithContext with PollWithContext.
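
For reference, a minimal caller-side sketch of that idea; HasSynced() is an assumed (hypothetical) accessor over the collector's hasSynced flag, and the 1 minute budget is the value floated above:

// Sketch only: start the collector and give the metrics server a bounded
// window to produce the first sample, failing on boot otherwise.
// Assumes imports: fmt, time, k8s.io/apimachinery/pkg/util/wait.
go metricsCollector.Run(ctx)

if err := wait.PollWithContext(ctx, time.Second, time.Minute, func(ctx context.Context) (bool, error) {
	return metricsCollector.HasSynced(), nil
}); err != nil {
	return fmt.Errorf("metrics collector did not sync in time, is the metrics server available? %v", err)
}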

@jklaw90 (Contributor) commented Nov 17, 2024:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 78ae117 to 8c585fc Compare November 17, 2024 10:58
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 4 times, most recently from a609360 to 39bc3f0 Compare November 17, 2024 11:28
@jklaw90 (Contributor) commented Nov 18, 2024:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2024
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
	hasSynced bool
}

func NewMetricsCollector(clientset kubernetes.Interface, metricsClientset metricsclient.Interface, nodeSelector string) *MetricsCollector {


We could, for example, pass the resourceNames here and return an error if we do not support any of them.

Contributor (Author):

The metrics collector is expected to always return cpu and memory. Do you foresee this might not be the case?


No, I just meant to have the correct intersection between the resourceNames passed to the actualUsageClient and the real capability.

Contributor (Author):

Both the metrics collector and the usage client operate independently. For that reason, the check needs to live outside of both.

Comment on lines 67 to 78
func weightedAverage(prevValue, value int64) int64 {
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}
@atiratree Nov 18, 2024:

How is the convergence to the current value supposed to work? What is the time window and the speed (function) of the convergence?

This depends on the 5s mentioned above. Can we add some documentation above this function?

Here is an example of how the convergence will work:

[attached chart: node-usage-scaling]

This is not ideal, and we would also never converge to the current usage (because of the floor).

Contributor (Author):

The usage is always changing, so converging to the actual usage is not necessary as long as the difference is minimal: 9 in this case. That is negligible for memory usage; for CPU it amounts to a 9 millicore difference.

how is the convergence to the current value supposed to work? What is the time window and the speed (function) of the convergence?

This is an experimental implementation, based on https://medium.com/@tobias-chc/exponentially-weighted-average-5eed00181a09. Users are not expected to know the implementation details. If the current "softening" does not work well, we can always re-implement it.
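
For context, a small self-contained simulation of the convergence, assuming beta = 0.9 (an assumption for illustration; the constant in the PR may differ). Climbing from 0 toward a current value of 500, the floor-based average stalls at 491, i.e. the difference of 9 mentioned above:

package main

import (
	"fmt"
	"math"
)

const beta = 0.9 // assumed smoothing factor for this illustration

func weightedAverage(prevValue, value int64) int64 {
	return int64(math.Floor(beta*float64(prevValue) + (1-beta)*float64(value)))
}

func main() {
	prev, current := int64(0), int64(500)
	for i := 0; i < 200; i++ {
		next := weightedAverage(prev, current)
		if next == prev {
			// Fixed point: floor(0.9*491 + 0.1*500) = floor(491.9) = 491.
			fmt.Printf("stalled at %d after %d iterations (current value %d)\n", prev, i, current)
			return
		}
		prev = next
	}
}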

@ingvagabund (Contributor, Author) Nov 18, 2024:

With math.Round the value stops at 496.

@ingvagabund (Contributor, Author) Nov 18, 2024:

Even setting the scale from Milli to Micro/Nano and going from 1400 to 900 the value stops at 901.

Contributor (Author):

Keeping the scale at Micro to reduce the difference.


I am fine with the Exponentially Weighted Average convergence. But maybe we could detect that we have stopped converging and just use the new value?

Contributor (Author):

The check would need to compute the next step, which defeats the intention. Unless I misunderstood.

@atiratree Nov 20, 2024:

We could check if the value computed by weightedAverage equals prevValue. If yes, then we know the difference is small and can use the new value instead.

This way we would get the same deterministic behaviour whether the descheduler has just been restarted or has been running for some time.
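
A minimal sketch of that check (the smoothedValue wrapper name is made up for illustration):

// smoothedValue applies the weighted average, but once the average stops
// moving (weightedAverage keeps returning prevValue) it snaps to the freshly
// collected value, as suggested above.
func smoothedValue(prevValue, value int64) int64 {
	next := weightedAverage(prevValue, value)
	if next == prevValue {
		return value
	}
	return next
}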

Contributor (Author):

Added a comment to the weightedAverage function.

@k8s-ci-robot (Contributor):

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2024
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 1600bd7 to 00a6962 Compare November 18, 2024 21:26
@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 00a6962 to 3194bfc Compare November 19, 2024 15:12
pkg/descheduler/descheduler.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)

pkg/framework/plugins/nodeutilization/usageclients.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)
pkg/descheduler/metricscollector/metricscollector.go (outdated review thread, resolved)

@ingvagabund (Contributor, Author):

E2E tests are timing out. Extending the timeout from 30m to 40m through kubernetes/test-infra#33818.

@ingvagabund (Contributor, Author):

/retest-required


for _, resourceName := range client.resourceNames {
	if _, exists := nodeUsage[resourceName]; !exists {
		return fmt.Errorf("unable to find %q resource for collected %q node metric", resourceName, node.Name)


+1, so now we do not need to check the resourceNames in the client.metricsCollector. It can return anything it wants and we just return an error here if it does not comply.

	}, 5*time.Second, ctx.Done())
}

func weightedAverage(prevValue, value int64) int64 {


Would be good to have some documentation above this function once we resolve the #1555 (comment) thread.

@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch 2 times, most recently from 6305d68 to 86d76cd Compare November 20, 2024 13:22
@atiratree:

Thanks!
/lgtm

@k8s-ci-robot (Contributor):

@atiratree: changing LGTM is restricted to collaborators

In response to this:

Thanks!
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ingvagabund (Contributor, Author):

Squashing commits.

@ingvagabund ingvagabund force-pushed the actual-utilization-kubernetes-metrics branch from 86d76cd to 6567f01 Compare November 20, 2024 13:31
@ingvagabund ingvagabund added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 20, 2024
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a962cca into kubernetes-sigs:master Nov 20, 2024
9 checks passed
@ingvagabund ingvagabund deleted the actual-utilization-kubernetes-metrics branch November 20, 2024 14:15
@langchain-infra:

Any idea on when this might be released, @ingvagabund? I see it's not in any of the release notes yet.

@ingvagabund (Contributor, Author):

Currently targeted for v0.32.0

@a7i (Contributor) commented Dec 14, 2024

@ingvagabund (Contributor, Author) commented Dec 17, 2024:

Process did not finish before 40m0s timeout

Looks like this is not a cause but a consequence.

Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA)
lgtm ("Looks good to me", indicates that a PR is ready to be merged)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

Use actual node resource utilization in the strategy "LowNodeUtilization"
6 participants