Update KubevirtVmHighMemoryUsage.md #236
Conversation
Signed-off-by: Shirly Radco <[email protected]>
@machadovilaca please review
/hold Didn't we say we have to remove this alert?
No, we never did. We planned to remove the component-related alerts, not the workload-related ones.
Ah! Thanks, yes. So, for this workload-related alert, I think we have to remove it as well. IOW, it fires even if we are not really exceeding the limits. Do I recall this correctly?
@fabiand I suggested updating the alert expression to use the pod metrics instead of the kubevirt VMI metrics.
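Purely to illustrate the idea (not the actual alert definition), a minimal sketch of a pod-metrics-based expression, assuming the standard cAdvisor/kube-state-metrics metric names `container_memory_working_set_bytes` and `kube_pod_container_resource_limits`, and the `compute` container of the virt-launcher pod:

```promql
# Sketch only: free memory of the virt-launcher "compute" container,
# computed as its memory limit minus its current working set,
# firing when less than 50 MiB remain.
kube_pod_container_resource_limits{resource="memory", container="compute"}
  - on(namespace, pod, container)
    container_memory_working_set_bytes{container="compute"}
< 50 * 1024 * 1024
```

The point of such an expression would be that it looks at the pod's cgroup accounting, which is what the kubelet considers for memory pressure, rather than at memory as reported for the VMI.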
I'm not seeing how a switch to pod metrics helps to address the problem.
@fabiand since Kubernetes uses
TBH I was confused by this as well :) I wonder how these alerts could benefit the user. Now, as a user, let's say I create a VM with dedicated CPUs and 2Gi of guest memory, and that after a while I see a KubevirtVmHighMemoryUsage alert firing. The docs say the mitigation involves the memory limits, but as a regular user I never set any limits in the first place(*). So I tend to vote +1 for removing it, unless I'm missing something. (*) An advanced user CAN set the limits, or use advanced features like overcommitmentRatio, but these are special cases that are far from the "average VM".
@fabiand @iholder101 I'm a user: I asked for a VM with 4GB, and I'm getting close to the memory limit, which puts my VM at risk of eviction. I need to know about it and resize my VM memory, no? Why should it matter who set the limit?
As a user you've requested a VM with 4GB. Now you log into your guest and can use up to these 4GB. Let's say that you even use all or most of the 4GB of memory. Now you get an alert telling you that you are about to reach the memory limit, but you're not aware of any limits. You've requested 4GB and you're using 4GB. You, as a user, did not cross any boundary; you're just using what you've asked for. How is such a user able to mitigate this, or even understand what's going on?
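To make the disconnect concrete, here is a hedged illustration of the two viewpoints for a single VMI. The metric and label names (`kubevirt_vmi_memory_available_bytes`, the `name` label, the `virt-launcher-<vm>-*` pod naming) are assumptions, and `my-vm` is a hypothetical VM name:

```promql
# Guest point of view: memory the guest OS still reports as available.
kubevirt_vmi_memory_available_bytes{name="my-vm"}

# Container point of view: gap between the compute container's working set and
# the memory limit that KubeVirt (not the user) put on the virt-launcher pod.
kube_pod_container_resource_limits{resource="memory", container="compute", pod=~"virt-launcher-my-vm-.+"}
  - on(namespace, pod, container)
    container_memory_working_set_bytes{container="compute", pod=~"virt-launcher-my-vm-.+"}
```

The first query can look perfectly healthy while the second is close to zero, which is exactly the situation such a user cannot interpret or act on.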
@iholder101 When we check the memory usage based on the pod metrics, not the kubevirt metrics
If both answers to the above questions are "no", then I think we should deprecate the alert.
These are important questions, let's discuss them.
Basically, the admin has the following options:
note: keep in mind that a configuration such as
Yes, it definitely is.
Please note that in the "what the admin can do" section above, both options include a step of reporting it as a bug. This is perhaps the main point I want to emphasize here: the fact that a virt-launcher pod's memory reaches its limits is a bug that Kubevirt is responsible for, not the user/admin. That said, it's important to understand that Kubevirt currently suffers from an inaccurate calculation of virt-launcher's memory requests/limits. So I believe the discussion we're having here can be generalized: how should Kubevirt cope with scenarios in which we have challenges in certain aspects, or IOW, in cases where Kubevirt doesn't perform a specific task well enough? The current flow looks something like the following:
The problem with this flow is that Kubevirt does not take responsibility for its weaker parts. We practically tell our users: "we know there are problems. If you face a problem, we've added an alert to act as debug information for us; you can contact us and we'll consider fixing your problem in the next release or help you with a workaround". A better approach would be for us developers to receive this debug information and fix the problem for the user proactively. I think that a better flow would look something like:
That is, Kubevirt should be the one proactively chasing bugs and fixing them while taking responsibility for its weaker parts. As for the alert itself: I'm not entirely sure we should keep this alert at all if we go for the suggested approach above, but the least I would expect from such an alert is:
I agree with you that we should not have an alert for the virt-launcher* overhead issue; it should be replaced with a metric we can perhaps collect in Telemetry, like we did for virt-handler, virt-controller, virt-api and virt-operator. What I meant is that I think we should alert on the VM guest when its available memory, from the guest's POV, is low, so that the VM owner can consider requesting more memory for the VMI.
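As a rough sketch of the guest-side alert being proposed here, assuming the KubeVirt metrics `kubevirt_vmi_memory_available_bytes` and `kubevirt_vmi_memory_usable_bytes` and an arbitrary 5% threshold (both the names and the threshold are assumptions, not a final design):

```promql
# Fire when the guest reports less than 5% of its usable memory as available;
# this is something the VM owner can act on by asking for more guest memory.
kubevirt_vmi_memory_available_bytes / kubevirt_vmi_memory_usable_bytes < 0.05
```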
Closing this PR, since we will deprecate this alert and add another alert instead.
Sure! This is different and sounds like something reasonable to consider!
I guess the implication of the guest memory being fully used is that the guest OS might start OOM-killing applications. I think this alert can be useful in certain cases.
Hard to say. Can this be made configurable?
What this PR does / why we need it:
This PR fixes the KubevirtVmHighMemoryUsage runbook.
It updates the runbook to state that the alert fires when a container hosting a virtual machine (VM) has less than 50 MB of free memory, instead of 20 MB.
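For reference, a minimal sketch of the threshold change in expression form, assuming the value is compared in bytes with MiB-based units; `free_memory_bytes` is a placeholder, not the real metric name:

```promql
# New threshold described in this PR: 50 MB -> 50 * 1024 * 1024 = 52428800 bytes
# (previously 20 MB -> 20 * 1024 * 1024 = 20971520 bytes).
free_memory_bytes < 50 * 1024 * 1024
```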
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
https://issues.redhat.com/browse/OCPBUGS-24377
Special notes for your reviewer:
Checklist
This checklist is not enforced, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: