-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Notification Policy Routes #1800
base: master
Are you sure you want to change the base?
Dynamic Notification Policy Routes #1800
Conversation
Adapted PROJECT to new go module version path and ran: ``` ./bin/operator-sdk create api --group grafana --version v1beta1 --kind GrafanaNotificationPolicyRoute --controller false ```
- during the reconcile loop in notificationpolicy_controller.go, we have to fetch all matching GrafanaNotificationPolicyRoutes for the currently reconciled GrafanaNotificationPolicy - this can be very easily achieved with a routeSelector, which will be a Kubernetes LabelSelector - if we would go with instanceSelector, we would have to fetch all available GrafanaNotificationPolicyRoutes and then do some filtering afterwards, to see if the instanceSelector matches, which would be both more inefficient and more complex
The GrafanaNotificationPolicy Controller now watches GrafanaNoticationPolicyRoutes instead of using ownerReferences, as ownerReferences do not support cross-namespace references. We now also emit a event on the GrafanaNotificationPolicyRoute to indicate that it has been merged into a specific policy.
This PR hasn't been updated for a while, marking as stale |
I have updated the draft PR with the latest discussions from #1789:
There are no checks for Another idea would be to solve this by implementing a ValidationWebhook. @theSuess do you have any thoughts on this? Results of updates SamplesAssembled ❮ kubectl get grafananotificationpolicy grafananotificationpolicy-sample -o jsonpath="{.status}" |jq {
"conditions": [
{
"lastTransitionTime": "2025-01-21T09:48:19Z",
"message": "Notification Policy was successfully applied to 1 instances",
"observedGeneration": 1,
"reason": "ApplySuccessful",
"status": "True",
"type": "NotificationPolicySynchronized"
}
],
"discoveredRoutes": [
"default/dynamic-e",
"grafana-crds/dynamic-c",
"default/dynamic-d"
]
} ❯ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
20m Normal Merged grafananotificationpolicyroute/dynamic-d Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m Normal Merged grafananotificationpolicyroute/dynamic-d Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m Normal Merged grafananotificationpolicyroute/dynamic-e Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m Normal Merged grafananotificationpolicyroute/dynamic-e Route merged into NotificationPolicy default/grafananotificationpolicy-sample |
3da8a88
to
95a1fd8
Compare
I rebased on the latest master branch. Additionally I looked into the ValidationWebhook for ensuring that While the implementation of the logic is quite straight-forward, setting up webhooks is a bit more complex, especially bringing this to the Helm chart. I can certainly do this if you think this is the way forward, we can find another way to do the validation, or maybe skip this validation for now. Any thoughts? |
Sorry for the deleted comment, missed the part about api level validation rules in your previous note. We had some discussions around Validation Webhooks before and came to the conclusion that these are too finicky to get right. If the validation rules don't work, I'd just go for a status condition in the respective object when both values are set which tells the user that the dynamic matching takes priority and the |
Unfortunately, as far as I can see, the validation expressions will not work due to the Alternatively, I will go with the hint in the status condition as you suggested 👍 |
The reason why I added the schemaless validation is to support recursive objects. If there is a cleaner way, I'm all for it! |
- adds validation for ensuring routes and routeSelector are mutual exclusive - updates both GrafanaNotificationPolicy and GrafanaNotificationPolicyRoute status conditions accordingly
I updated the PR and implemented the validation by updating the status condition on both
|
} | ||
assembledNotificationPolicy, mergedRoutes, err = assembleNotificationPolicyRoutes(ctx, r.Client, namespace, assembledNotificationPolicy) | ||
r.Log.Info("assembled notification policy routes", "mergedRoutes", mergedRoutes) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we handle errors here e.g. due to a loop in matched routes? Is it fine as is, by simply returning with an error? Alternatively we could also update the status condition similar to applyErrors := make(map[string]string)
below and stop reconciling.
Looking at the code below, it looks like the status condition is never updated when errors occurred, not sure if I am misreading the code or this is on purpose:
if len(applyErrors) > 0 {
return ctrl.Result{}, fmt.Errorf("failed to apply to all instances: %v", applyErrors)
}
// ...
condition := buildSynchronizedCondition("Notification Policy", conditionNotificationPolicySynchronized, notificationPolicy.Generation, applyErrors, len(instances))
meta.SetStatusCondition(¬ificationPolicy.Status.Conditions, condition)
Looks like this has been adapted here: #1815
Any thoughts @theSuess ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an error doesn't make sense here, as it would cause the operator to retry the same resource. Assuming nothing changed, it'll just fail again. My preferred outcome is to note the error in the conditions.
Status conditions are applied through the defer
function here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but the errors are never set on the condition, because the controller always returns when there are errors and never reaches the code to set the error condition:
see:
if len(applyErrors) > 0 { |
if len(applyErrors) > 0 {
return ctrl.Result{}, fmt.Errorf("failed to apply to all instances: %v", applyErrors)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some modifications here, hopefully that resolves the issue: f924763
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good! I've tested it locally and it works as expected. I left some minor comments, but nothing that should require large scale refactoring.
Thanks for also taking care of the documentation right away 💯
if notificationPolicy.Spec.Route.IsRouteSelectorMutuallyExclusive() { | ||
meta.RemoveStatusCondition(¬ificationPolicy.Status.Conditions, conditionRoutesIgnoredDueToRouteSelector) | ||
} else { | ||
setRoutesIgnoredDueToRouteSelectorCondition(¬ificationPolicy.Status.Conditions, notificationPolicy.Generation) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything preventing this check from occurring before applying the notification policy? I'd like to avoid half-applied notification policies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We said that in case both are specified, we simply overwrite what is specified in Routes
, so I think it's not an issue if we update the condition at the end. There should be no half-applied notification policies as far as I can see.
Co-authored-by: Dominik Süß <[email protected]>
Co-authored-by: Dominik Süß <[email protected]>
related GrafanaNotificationPolicies can now no longer easily retrieved by comparing labels, as routes can be referenced transitively. Therefore we simply sync all related policies now.
733f358
to
942a0b5
Compare
Co-authored-by: Dominik Süß <[email protected]>
See #1789 for the related feature proposal.
This implementation is still work in progress for now and should support the feature proposal.
Updates on
GrafanaNotificationPolicyRoutes
are tracked via watches. I first tried to accomplish this via ownerReferences (see commits), but this is not supported cross-namespace.Example of how the sample policies in
hack/kind/resources/default/grafana-notification-policy.yaml
are rendered:Status updates on the
GrafanaNotificationPolicy
Events emitted for the merged routes: