Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add proposal for NotificationPolicyRoute CRD #1789

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

msvechla
Copy link

@msvechla msvechla commented Dec 9, 2024

As discussed in #1767, this is the proposal for adding a new NotificationPolicyRoute CRD that allows assembling a Notification Policy from individual resources.

Please let me know what you think and thanks for your support!

As discussed in grafana#1767, this is the proposal for adding a new
NotificationPolicyRoute CRD that allows assembling a Notification Policy
from individual resources.

Please let me know what you think and thanks for your support!
@Baarsgaard
Copy link
Contributor

Baarsgaard commented Dec 12, 2024

I have a few questions/comments.

  1. All the reconciliation logic would have to be implemented in the GrafanaNotificationPolicy controller.
    Which is quite different from existing CRDs, but not a problem, there just would not be a reconciliation loop, outside maybe a status updates/finalizers.
    The benefit of a structured config and CR Create/Update validations are probably worth it.

  2. I do however not see a reason for the GrafanaNotificationPolicyRoute to contain the common options: instanceSelector or resyncPeriod.
    allowCrossNamespaceImport might still make sense? But I would probably lean towards all or nothing for implementation simplicity.
    allowCrossNamespaceImport could be inherited from GrafanaNotificationPolicy.
    Meaning, a GrafanaNotificationPolicy with allowCrossNamespaceImport: true also imports matching routes from other namespaces.

  3. Are you planning on re-using dynamic routes across multiple GrafanaNotificationPolicy CRs, and are there cases where this could not be achieved with the existing instanceSelector, hence the routeSelector?
    An example or two of this would be great.
    Alternatively, I would consider the possibility of re-using the instanceSelector criteria for also finding GrafanaNotificationPolicyRoute, this could be a mistake as then instance is no longer just for filtering instances.

  4. It's not shown, but I assume the GrafanaNotificationPolicyRoute is a full route object allowing for further nested routes, but the dynamic assembling is only possible at the top level spec.route.routes?
    routeSelector is only present on GrafanaNotificationPolicy
    It may be necessary to move spec.route.routeSelector to spec.routeSelector to re-use the Route struct for both CRDs

  5. I had a point here about the assembling of routes being deterministic and what the tiebreaker is when two priorities are the same.
    but Kubernetes guarantees ordered output according to:
    List objects in deterministic order kubernetes/kubernetes#3812
    kubectl get has random output order kubernetes/kubernetes#4236
    The resulting spec.route.routes should be idempotent if the inputs are the same.

  6. Should there be an upper/lower bound on the range of priority?
    1-100 and nil defaults to lowest priority of 100?

@msvechla
Copy link
Author

All the reconciliation logic would have to be implemented in the GrafanaNotificationPolicy controller.
Which is quite different from existing CRDs, but not a problem, there just would not be a reconciliation loop, outside maybe a status updates/finalizers.
The benefit of a structured config and CR Create/Update validations are probably worth it.

Thanks, this is good input. I did not look at implementation specifics yet, as I wanted to draft the high-level design first. Definitely makes sense.

allowCrossNamespaceImport could be inherited from GrafanaNotificationPolicy.
Meaning, a GrafanaNotificationPolicy with allowCrossNamespaceImport: true also imports matching routes from other namespaces.

Agreed, I'm also all for simplicity here.

Are you planning on re-using dynamic routes across multiple GrafanaNotificationPolicy CRs, and are there cases where this could not be achieved with the existing instanceSelector, hence the routeSelector?
An example or two of this would be great.
Alternatively, I would consider the possibility of re-using the instanceSelector criteria for also finding GrafanaNotificationPolicyRoute, this could be a mistake as then instance is no longer just for filtering instances.

I don't have such a use-case at the moment and also did not consider it in the design. Imhop we can start with using the instanceSelector and adding further selectors later on if required.

It's not shown, but I assume the GrafanaNotificationPolicyRoute is a full route object allowing for further nested routes, but the dynamic assembling is only possible at the top level spec.route.routes?

Yes correct, that was my idea.

routeSelector is only present on GrafanaNotificationPolicy
It may be necessary to move spec.route.routeSelector to spec.routeSelector to re-use the Route struct for both CRDs

Good point, yes we would have to move it to spec.routeSelector, but if we agree on using the instanceSelector, this can even be discarded entirely for now.

I had a point here about the assembling of routes being deterministic and what the tiebreaker is when two priorities are the same.
but Kubernetes guarantees ordered output according to:
kubernetes/kubernetes#3812
kubernetes/kubernetes#4236
The resulting spec.route.routes should be idempotent if the inputs are the same.

Thanks, noted! Definitely the resulting routes should be idempotent.

Should there be an upper/lower bound on the range of priority?
1-100 and nil defaults to lowest priority of 100?

I'm open for input here, but I think your suggestion makes sense.

So should we agree on using the instanceSelector for selecting possible GrafanaNotificationPolicyRoute resources to merge? Thanks a lot for your input!

@msvechla
Copy link
Author

Actually as I'm working on a draft implementation, I think it makes more sense to go with the routeSelector instead of re-using the instanceSelector on GrafanaNotificationPolicyRoute.

My reasoning behind this is the following:

  • during the reconcile loop in notificationpolicy_controller.go, we have to fetch all matching GrafanaNotificationPolicyRoutes for the currently reconciled GrafanaNotificationPolicy
  • this can be very easily achieved with a routeSelector, which will be a Kubernetes LabelSelector
  • if we would go with instanceSelector, we would have to fetch all available GrafanaNotificationPolicyRoutes and then do some filtering afterwards, to see if the instanceSelector matches, which would be both more inefficient and more complex

So I think adding an optional routeSelector to GrafanaNotificationPolicy makes more sense now.

Please let me know your thoughts of course, thanks a lot!

@msvechla
Copy link
Author

msvechla commented Dec 18, 2024

I updated the feature proposal here with what we discussed.

Also I have a draft PR that has a working implementation of this feature to support the proposal: #1800

Let me know what you think and thanks for your support!

@msvechla
Copy link
Author

msvechla commented Jan 9, 2025

I know it has been holiday time, but did someone already get a chance to take a look at this proposal and my draft implementation?

What would be the next steps?

Thanks a lot for your support!

@theSuess
Copy link
Member

Thanks again for all the work put into this, and sorry for the late reply, as you said, the holidays slowed things down quite a bit 😬

Overall, I think this looks very good. My two takes on this are:

I'd prefer to see the route selector as a property of the route object and make it mutually exclusive with the routes property for child routes.
This way, we don't have to concern ourselves with merge strategies for existing routes. It also enables more ergonomic definitions of policies by allowing users to insert dynamic routes at any desired level without requiring intermediate NotificationPolicyRoute resources.

Second, I'd skip the priority logic for now. As list operations are sorted alphabetically now, users with the need for specific ordering can always fall back to naming the resources accordingly. If it's a frequent request, we can consider the priority logic in the future but I'd like to keep things as simple as possible for the initial implementation

Another thing I'd just like to note which will be relevant to the implementation (no need to address this here, just want to write it down) is to find a way to avoid recursive routes.

We'll discuss this topic today in our weekly sync and will get back to you with the next steps

@theSuess theSuess added the documentation Issues relating to documentation, missing, non-clear etc. label Jan 13, 2025
@msvechla
Copy link
Author

msvechla commented Jan 13, 2025

Sounds good, thanks for reaching out! Looking forward to the outcome of your discussion 😊

I can work on the necessary adjustments, but it could help if you could maybe provide one example to illustrate the behaviour.

I'd prefer to see the route selector as a property of the route object and make it mutually exclusive with the routes property for child routes.
This way, we don't have to concern ourselves with merge strategies for existing routes. It also enables more ergonomic definitions of policies by allowing users to insert dynamic routes at any desired level without requiring intermediate NotificationPolicyRoute resources.

To me with the current design we do not really have to think about merging at all, however it will become a bigger factor if we move the routeSelector to the route object from my understanding.

If we move the routeSelector to the route object, it is not entirely clear to me how things should be merged and how we should deal with conflicts, which is not an issue with the current design as far as I can see.

Probably I do not fully understand what you have in mind, so an example will help clear this up 🙂

The current design only allows merging in on one level, which might not solve all requirements, so I understand your concern.

@theSuess
Copy link
Member

Sure, my idea would look like this:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: grafananotificationpolicy-sample
spec:
  route:
    receiver: grafana-default-email
    group_by:
      - grafana_folder
      - alertname
    routes:
      - object_matchers:
          - - team
            - =
            - first
        routeSelector:
          team: first
      - object_matchers:
          - - team
            - =
            - second
        routeSelector:
          team: second

The teams can then add their NotificationPolicyRoutes with their team as a label, and they will end up in the right place

With regard to merging, in your example, the resulting tree has both inline and dynamic rules set. I'd propose to have them mutually exclusive. If you specify routeSelector, you are not allowed to set routes and vice versa.

Hope this makes sense!

@msvechla
Copy link
Author

With regard to merging, in your example, the resulting tree has both inline and dynamic rules set. I'd propose to have them mutually exclusive. If you specify routeSelector, you are not allowed to set routes and vice versa.

I would also be fine with adapting this on the current draft I posted, this is not a requirement for me, they could be mutually exclusive.

In the example you posted, this will then merge in NotificationPolicyRoutes which currently include the route object, so this would mean that teams could define routeSelectors there as well and create complex transitive references. Is this intended?

We discussed something similar to this above with @Baarsgaard

routeSelector is only present on GrafanaNotificationPolicy
It may be necessary to move spec.route.routeSelector to spec.routeSelector to re-use the Route struct for both CRDs

Also in your example, if team: first specifies object_matchers: in their NotificationPolicyRoute as well, how would it be merged with the existing

      - object_matchers:
          - - team
            - =
            - first
        routeSelector:
          team: first

Thanks for your help, maybe this can clear a few things up 🙂

@msvechla
Copy link
Author

@theSuess sorry for the ping, but could you elaborate on / clarify the open questions from my last post?
Thanks a lot again!

@theSuess
Copy link
Member

I do think that it should be possible to specify routeSelectors in the NotificationPolicyRoute as well.

The downside to this is, as you mentioned, self-referential lookup policies. I think we can avoid this by keeping track of whether the label selector has already appeared previously and disallowing any repetitions

@msvechla
Copy link
Author

I understand, but for me its still unclear how your example would render:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicy
metadata:
  name: grafananotificationpolicy-sample
spec:
  route:
    receiver: grafana-default-email
    group_by:
      - grafana_folder
      - alertname
    routes:
      - object_matchers:
          - - team
            - =
            - first
        routeSelector:
          team: first
      - object_matchers:
          - - team
            - =
            - second
        routeSelector:
          team: second

lets imagine there is the following GrafanaNotificationPolicyRoute:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaNotificationPolicyRoute
metadata:
  name: grafananotificationpolicy-sample
  labels:
    team: first
spec:
  route:
    receiver: grafana-another-email
    object_matchers:
      - - custom
        - =
        - "matches"

What will the resulting NotificationPolicy look like? Should the re-declared object_matchers in the GrafanaNotificationPolicyRoute overwrite the object_matchers defined previously? Should they get merged?

Thanks for your help!

@theSuess
Copy link
Member

Ah I see what you mean. This is what I imagine:
image

So the route defined in the GrafanaNotificationPolicyRoute gets placed in the higher level route

@msvechla
Copy link
Author

msvechla commented Jan 20, 2025

Thanks for clarifying, so the routes from routeSelector should still be merged into the routes property to create child-routes.

I will try to adapt my draft MR with the changes you mentioned. Thanks for the input!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Issues relating to documentation, missing, non-clear etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants