Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lownodeutilization]: Actual utilization: integration with Prometheus #1533

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ingvagabund
Copy link
Contributor

@ingvagabund ingvagabund commented Oct 11, 2024

Extend the actual utilization awareness with Prometheus integration.

For testing purposes:

    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    prometheus:
      url: http://prometheus-kube-prometheus-prometheus.prom.svc.cluster.local
        authToken:
          secretReference:
            namespace: "kube-system"
            name: "authtoken"
    profiles:
      - name: ProfileName
        pluginConfig:
        - name: "LowNodeUtilization"
          args:
            thresholds:
              "MetricResource": 20
            targetThresholds:
              "MetricResource": 70
            metricsUtilization:
              prometheus:
                query: instance:node_cpu:rate:sum
        plugins:
          balance:
            enabled:
              - "LowNodeUtilization"

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ingvagabund. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 11, 2024
@fanhaouu
Copy link
Contributor

Hello, master. Due to the company's busy schedule previously, I only managed to complete half of the related KEP. I'm glad to see that you're working on this. It looks like you're aiming to reuse the current Node utilization logic. I have a few suggestions:

It should support different data sources, similar to PayPal's load-watcher.
It should support various real-time data processing algorithms. For instance, real-time calculations, using rate averages, or predictions based on EWMA + P95, similar to the approach used by autoscaler.
If the goal is to address real-time CPU hotspots, perhaps there’s no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

Hope these suggestions help!

@ingvagabund
Copy link
Contributor Author

Hello sir :)

thank you for taking part in composing the out-of-tree descheduling plugin KEP.

It should support different data sources, similar to PayPal's load-watcher.

You are on the right track here. I'd like to get in touch with load-watcher maintainers and extend the codebase to provide a generic interface for accessing metrics related to pod utilization as well. Currently, only actual node utilization gets collected. Meantime, I am forming the code here to be able to better integrate with other utilization sources like metrics.

It should support various real-time data processing algorithms. For instance, real-time calculations, using rate averages, or predictions based on EWMA + P95, similar to the approach used by autoscaler.

This is where we can debate more. Thank you for sharing the specifics. There's an open issue for the pod autoscaler suggesting to introduce EMA: kubernetes/kubernetes#62235. Are you aware if there's a similar issue or a discussion for the cluster autoscaler? I'd love to learn more about how it's implemented there. Ultimately, the current plugin just needs to know which pod, when evicted, will improve the overall node/workload utilization when properly re-scheduled. I could see various ways to produce the utilization snapshot using various methods.

If the goal is to address real-time CPU hotspots, perhaps there’s no need to calculate the number of nodes below or above a certain threshold. Of course, you could also provide a switch to control this behavior.

I can see how evicting hotspot pods is related to consuming the metrics/real-time node utilization. In the current plugin context this is more suitable for a new/different plugin. I can also see how RemoveDuplicates can be extended to evict based on overall node utilization instead of the current counting approach. Not every plugin will need to consume metrics. Though, there can be common pieces shared across them through the descheduling framework.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from c889a53 to 1f55c4d Compare October 15, 2024 10:18
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 3 times, most recently from d744a96 to 800c92c Compare November 6, 2024 18:34
@ingvagabund
Copy link
Contributor Author

kubernetes/kubernetes#128663 to address the discrepancy in the fake metrics client node/pod metricses resource name.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from f30f8a1 to 2e63411 Compare November 7, 2024 15:40
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 4 times, most recently from 0330902 to baa6650 Compare November 8, 2024 15:52
@ingvagabund
Copy link
Contributor Author

/test pull-descheduler-verify-master

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2024
@ingvagabund
Copy link
Contributor Author

Integration with kubernetes metrics in #1555.

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from baa6650 to 2442967 Compare November 16, 2024 09:09
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 4 times, most recently from 668f0fb to 477104c Compare November 16, 2024 17:11
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 4 times, most recently from 425f449 to d16e2b0 Compare November 17, 2024 13:41
@ingvagabund
Copy link
Contributor Author

/retest-required

@ingvagabund
Copy link
Contributor Author

/retest-required

@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch 6 times, most recently from fbf6fb2 to 3ec47f7 Compare November 18, 2024 14:47
@ingvagabund
Copy link
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2024
@ingvagabund
Copy link
Contributor Author

Waiting for #1555 to merge

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from 4a21180 to a4186f1 Compare November 19, 2024 16:00
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2024
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from d30044e to c478ce8 Compare November 19, 2024 17:18
@ingvagabund ingvagabund force-pushed the node-utilization-util-snapshot branch from c478ce8 to d143aad Compare November 20, 2024 14:19
@ingvagabund ingvagabund changed the title Node utilization util snapshot [lownodeutilization]: Actual utilization: integration with Prometheus Nov 20, 2024
@ingvagabund
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2024
Comment on lines 46 to +49
MetricsCollector MetricsCollector

// Prometheus enables metrics collection through Prometheus
Prometheus Prometheus

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't new fields be optional and allow nil value?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to also consider changing fields introduced in #1555

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky. The versioned type adds the omitempty tag. Yet given Prometheus is a struct one needs to turn the field into a pointer to get it omitted in the serialized json. For the actual setting you can ignore the field. The field is not required.

@@ -57,4 +57,14 @@ type MetricsUtilization struct {
// metricsServer enables metrics from a kubernetes metrics server.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question about MetricsUtilization field: shouldn't it be nullable?

@@ -57,4 +57,14 @@ type MetricsUtilization struct {
// metricsServer enables metrics from a kubernetes metrics server.
// Please see https://kubernetes-sigs.github.io/metrics-server/ for more.
MetricsServer bool `json:"metricsServer,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be better expressed by an enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Enums can be easily extended with other values besides true/false.


type AuthToken struct {
// raw for a raw authentication token
Raw string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered disadvantages of using Raw tokens? Wouldn't it be better to start with the secret ref? And revisit the raw token topic in the future?

type Prometheus struct {
URL string
AuthToken AuthToken
InsecureSkipVerify bool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about supporting the InsecureSkipVerify.

client._nodeUtilization = make(map[string]map[v1.ResourceName]*resource.Quantity)
client._pods = make(map[string][]*v1.Pod)

results, warnings, err := promv1.NewAPI(client.promClient).Query(context.TODO(), client.promQuery, time.Now())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nicer to have the context from above

Comment on lines +264 to +266
func (client *prometheusUsageClient) resourceNames() []v1.ResourceName {
return []v1.ResourceName{ResourceMetrics}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this method?

@@ -34,6 +34,8 @@ import (
"sigs.k8s.io/descheduler/pkg/utils"
)

const ResourceMetrics = v1.ResourceName("MetricResource")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the value to be the same as the variable name. Any reason why is it different?

Comment on lines +97 to +99
if len(resourceNames) == 1 && resourceNames[0] == ResourceMetrics {
// Make ResourceMetrics 100% => 100 points
nodeCapacity[ResourceMetrics] = *resource.NewQuantity(int64(100), resource.DecimalSI)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be safer to double fail here. It expects that the plugin above does the validation correctly.

@@ -95,6 +106,11 @@ func NewLowNodeUtilization(args runtime.Object, handle frameworktypes.Handle) (f
return nil, fmt.Errorf("metrics client not initialized")
}
usageClient = newActualUsageClient(resourceNames, handle.GetPodsAssignedToNodeFunc(), handle.MetricsCollector())
} else if lowNodeUtilizationArgsArgs.MetricsUtilization.Prometheus.Query != "" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned, enum would look better here

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants