Feat/notification features #341

Merged
merged 3 commits into from
Feb 20, 2025
20 changes: 20 additions & 0 deletions canary-checker/docs/scripting/cel.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,26 @@ o.a.?d.orValue('fallback value') => 'fallback value'

You can read more about [or](#or) and [orValue](#orvalue) below.

## matchQuery

`matchQuery` matches a given resource against a search query.

Syntax:

```javascript
matchQuery(r, s)

// Where
// r = resource
// s = search query
```

Example:
```javascript
matchQuery(.config, "type=Kubernetes::Pod")
matchQuery(.config, "type=Kubernetes::Pod tags.cluster=homelab")
```
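To make the query semantics concrete, here is a minimal sketch of a matcher in plain JavaScript. It is not the real `matchQuery` implementation: it assumes a query is a list of space-separated `field=value` terms that must all match exactly, and that dotted fields such as `tags.cluster` address nested keys. The names `lookup` and `simpleMatchQuery` are hypothetical.

```javascript
// Illustrative only: a simplified matcher, NOT the real matchQuery implementation.
// Assumption: a query is space-separated `field=value` terms that must all match.

// Resolve a dotted path such as "tags.cluster" against a plain object.
function lookup(resource, path) {
  return path.split(".").reduce(
    (node, key) => (node != null && typeof node === "object" ? node[key] : undefined),
    resource
  );
}

// Return true when every `field=value` term equals the resource's value.
function simpleMatchQuery(resource, query) {
  return query
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .every((term) => {
      const idx = term.indexOf("=");
      if (idx < 0) return false; // this sketch only understands field=value terms
      const field = term.slice(0, idx);
      const expected = term.slice(idx + 1);
      const actual = lookup(resource, field);
      return actual !== undefined && String(actual) === expected;
    });
}

const pod = { type: "Kubernetes::Pod", tags: { cluster: "homelab" } };
console.log(simpleMatchQuery(pod, "type=Kubernetes::Pod")); // true
console.log(simpleMatchQuery(pod, "type=Kubernetes::Pod tags.cluster=prod")); // false
```

The actual search query syntax may support more than exact equality; treat this only as a mental model for the two examples above.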

## aws

### aws.arnToMap
Expand Down
8 changes: 7 additions & 1 deletion common/src/components/Fields.jsx
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ const schemes = {
resourceselectors: '[[]ResourceSelector](/reference/resource-selector)',
connection: '[Connection](/reference/connections)',
string: '`string`',
"[]string": '`[]string`',
icon: '[Icon](/reference/types#icon)',
bool: '`boolean`',
int: '`integer`',
Expand All @@ -37,11 +38,16 @@ function useSchemeUrl(value) {
return "string"
}

value = schemes[value.toLowerCase()]
const key = value.toLowerCase();
if (!(key in schemes)) {
return value;
}

value = schemes[key]
if (value == null || !value.includes('](/')) {
return value
}

// Extract link text and URL
const matches = value.match(/\[(.*?)\]\((.*?)\)/);
if (matches) {
Expand Down
37 changes: 32 additions & 5 deletions mission-control/docs/guide/notifications/concepts/wait-for.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,11 @@ If alerts are configured for `config.unhealthy` events, these transient state fl
To address this issue, you can utilize the waitFor parameter.
This feature allows you to define a delay before sending notifications for specific events.
After an event occurs, the system rechecks its status following the specified wait period. Only if the undesired state persists does a notification trigger.

:::info
`waitFor` is only applicable on health related events
`waitFor` is only applicable to notifications of health-related events.
:::

This approach helps reduce unnecessary notifications caused by transient state changes, ensuring you're alerted only to persistent issues.
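
The recheck described above can be sketched as a pure function. This is an illustration under stated assumptions, not Mission Control's implementation: the event shape (`resourceId`, `occurredAtMs`), the function name `dueNotifications`, and the `isHealthy` callback are all hypothetical.

```javascript
// Illustrative sketch only; not Mission Control's implementation.
// `pending` holds events waiting out their delay; `isHealthy` is a hypothetical
// callback that reports the current health of a resource.
function dueNotifications(pending, nowMs, waitForMs, isHealthy) {
  return pending
    .filter((event) => nowMs - event.occurredAtMs >= waitForMs) // wait period elapsed
    .filter((event) => !isHealthy(event.resourceId)); // still unhealthy on recheck
}

const pending = [
  { resourceId: "deploy/a", occurredAtMs: 0 },      // recovered during the wait
  { resourceId: "deploy/b", occurredAtMs: 0 },      // still unhealthy
  { resourceId: "deploy/c", occurredAtMs: 50_000 }, // wait period not over yet
];
const healthy = new Set(["deploy/a"]);
const due = dueNotifications(pending, 60_000, 30_000, (id) => healthy.has(id));
console.log(due.map((e) => e.resourceId)); // [ 'deploy/b' ]
```

Only `deploy/b` produces a notification: `deploy/a` recovered before the recheck, and `deploy/c` has not yet waited the full 30 seconds.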


```yaml title='notify-unhealthy-deployments.yaml' {8}
apiVersion: mission-control.flanksource.com/v1
Expand All @@ -33,9 +32,13 @@ spec:
```

:::warning Handling Scrape Lag
`waitFor` re-evaluates the health based on the current state in config-db, in some circumstances there can be a lag between when a change occurs and the state reflects in config-db which can lead to false positives.

`waitForEvalPeriod` forces an incremental scrape of the resource before sending a notification, it waits for up to this period for a scrape to complete before sending a notification.
`waitFor` re-evaluates the health based on the current state in config-db.
However, in some circumstances, there may be a delay between when a change occurs and when it's reflected in config-db,
potentially resulting in false positives.

`waitForEvalPeriod` forces an incremental scrape of the resource before sending a notification.
It waits for up to this period for a scrape to complete before sending a notification.

```yaml title=waitForEvalPeriod.yaml
apiVersion: mission-control.flanksource.com/v1
Expand All @@ -49,5 +52,29 @@ spec:
//highlight-next-line
waitForEvalPeriod: 30s
```
:::

### Grouping Notifications

Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
you can use notification grouping to consolidate multiple events into a single message.

_Example_: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy.
If you have a notification set up to alert on `config.unhealthy`, you'll receive at least three separate notifications for the same cause.

The `groupBy` parameter allows you to define how notifications should be grouped.
Grouping can be done via
- `type` (type of the config)
- `description`
- `status_reason`
- `labels` in the format `labels:app`
- `tags` in the format `tag:namespace`

:::info
Grouping only works with `waitFor`, so a `waitFor` duration is required.
:::
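
As a rough mental model of grouping, the sketch below builds a key from the `groupBy` attributes and buckets waiting events by that key. It is an assumption-laden illustration, not the actual implementation: the event shape, the `|` key separator, and the names `attributeValue` and `groupNotifications` are hypothetical.

```javascript
// Illustrative sketch only; the event shape and key format are assumptions.

// Resolve one groupBy attribute (e.g. "type", "labels:app", "tag:namespace").
function attributeValue(event, attribute) {
  const [kind, name] = attribute.split(":");
  if (kind === "labels") return event.labels?.[name] ?? "";
  if (kind === "tag") return event.tags?.[name] ?? "";
  return event[attribute] ?? ""; // type, description, status_reason
}

// Bucket events that share the same values for every groupBy attribute.
function groupNotifications(events, groupBy) {
  const groups = new Map();
  for (const event of events) {
    const key = groupBy.map((attr) => attributeValue(event, attr)).join("|");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(event);
  }
  return groups;
}

const events = [
  { type: "Kubernetes::Deployment", labels: { app: "api" }, tags: { namespace: "prod" } },
  { type: "Kubernetes::ReplicaSet", labels: { app: "api" }, tags: { namespace: "prod" } },
  { type: "Kubernetes::Pod", labels: { app: "api" }, tags: { namespace: "prod" } },
  { type: "Kubernetes::Pod", labels: { app: "web" }, tags: { namespace: "prod" } },
];
const groups = groupNotifications(events, ["labels:app", "tag:namespace"]);
console.log(groups.size); // 2
console.log(groups.get("api|prod").length); // 3
```

With this grouping, the deployment, replicaset, and pod of the `api` app collapse into a single group instead of three separate alerts.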


```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
```
14 changes: 13 additions & 1 deletion mission-control/docs/guide/notifications/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,18 @@ In this example:
- This expression evaluates to true only when the `type` attribute of the [check](/reference/canary-checker/check) object is equal to `'http'`.
- So, Mission Control only sends notifications for events that meet this condition, which alerts you when HTTP checks fail.

<details summary="Sophisticated Filters">
For a more sophisticated filter, you can use the [`matchQuery`](/reference/scripting/cel#matchquery) CEL function, which supports search queries.
Example:

```yaml title="config.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health-match-query.yaml {9}
```

In this example, the filter matches all unhealthy or warning deployments and pods in the prod cluster except for the postgres deployment.

</details>


## Triggering Playbooks

Playbooks can be configured as recipients for notifications, allowing you to trigger automated workflows instead of sending notifications to traditional channels like email or Slack.
Expand All @@ -111,4 +123,4 @@ The example shows two notifications: `check-alerts` and `homelab-config-health-a
The group has `playbook:run` permission, which both notifications inherit.

```yaml title="permission.yaml" file=<rootDir>/modules/mission-control/fixtures/permissions/config-notification-group-playbook-permission.yaml
```
```
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,11 @@
description: "an additional delay after WaitFor before evaluating Kubernetes config health",
scheme: "duration"
},
{
field: "groupBy",
description: "Group notifications that are in waiting stage based on labels, tags and attributes. Only applicable when `waitFor` is provided. See [Grouping attributes](../../guide/notifications/concepts/wait-for#grouping-notifications)",
scheme: "[]string"
},
{
field: "title",
description: "Channel dependent e.g. subject for email",
Expand Down
2 changes: 1 addition & 1 deletion modules/canary-checker
Submodule canary-checker updated 71 files
+2 −2 .github/workflows/aws-exec.yml
+7 −7 .github/workflows/e2e-operator.yml
+1 −1 .github/workflows/gotest.yml
+3 −5 .github/workflows/lint.yml
+1 −1 .github/workflows/release.yml
+2 −2 .github/workflows/test.yml
+0 −2 .golangci.yml
+5 −21 Makefile
+2 −1 api/v1/checks.go
+1 −0 api/v1/component_types.go
+9 −9 api/v1/system_types.go
+1 −0 chart/crds/Canary.yml
+1 −0 chart/crds/Component.yml
+1 −0 chart/crds/Topology.yml
+0 −1 chart/crds/crd.yaml
+1 −1 checks/http.go
+5 −3 checks/s3.go
+1 −1 checks/sql.go
+7,900 −0 config/deploy/Canary.yml
+475 −0 config/deploy/Component.yml
+1,244 −0 config/deploy/Topology.yml
+0 −15,272 config/deploy/crd.yaml
+985 −6,638 config/deploy/manifests.yaml
+3 −1 config/kustomization.yaml
+106 −0 config/schemas/canary.schema.json
+109 −0 config/schemas/component.schema.json
+103 −0 config/schemas/health_exec.schema.json
+3 −0 config/schemas/health_s3.schema.json
+109 −0 config/schemas/topology.schema.json
+13 −13 fixtures/aws/aws_config_pass.yaml
+14 −13 fixtures/aws/cloudwatch_pass.yaml
+3 −2 fixtures/azure/devops.yaml
+0 −1 fixtures/datasources/mongo_fail.yaml
+0 −1 fixtures/datasources/mongo_pass.yaml
+0 −1 fixtures/datasources/mssql_fail.yaml
+1 −2 fixtures/datasources/mssql_pass.yaml
+0 −1 fixtures/datasources/mysql_fail.yaml
+0 −1 fixtures/datasources/mysql_pass.yaml
+0 −1 fixtures/datasources/posgres_stateful_pass.yaml
+0 −1 fixtures/datasources/postgres_pass.yaml
+0 −1 fixtures/datasources/redis_fail.yaml
+0 −1 fixtures/datasources/redis_pass.yaml
+0 −1 fixtures/datasources/s3_bucket_pass.yaml
+1 −1 fixtures/elasticsearch/stateful_metrics.yaml
+4 −3 fixtures/external/dynatrace.yaml
+0 −1 fixtures/git/git_pull_push_pass.yaml
+0 −1 fixtures/k8s/cronjob_monitor.yaml
+0 −1 fixtures/k8s/cronjob_monitor_fail.yaml
+0 −1 fixtures/k8s/http_auth_configmap.yaml
+2 −2 fixtures/k8s/http_auth_sa.yaml
+0 −1 fixtures/k8s/http_auth_secret.yaml
+2 −2 fixtures/k8s/kubernetes_pass.yaml
+2 −3 fixtures/k8s/kubernetes_resource_ingress_pass.yaml
+2 −3 fixtures/k8s/kubernetes_resource_namespace_pass.yaml
+0 −1 fixtures/k8s/kubernetes_resource_pod_exit_code_pass.yaml
+0 −1 fixtures/k8s/kubernetes_resource_service_fail.yaml
+0 −1 fixtures/minimal/http_auth_url_pass.yaml
+2 −1 fixtures/minimal/http_template.yaml
+2 −1 fixtures/minimal/icmp_fail.yaml
+3 −3 fixtures/minimal/jmeter.yaml
+0 −1 fixtures/minimal/namespaced_check_pass.yaml
+1 −1 fixtures/minimal/tcp.yaml
+0 −1 fixtures/opensearch/opensearch_fail.yaml
+0 −1 fixtures/opensearch/opensearch_pass.yaml
+52 −18 go.mod
+135 −41 go.sum
+62 −0 hack/compress-crds.sh
+1 −1 pkg/db/canary.go
+6 −1 pkg/metrics/metrics.go
+1 −0 pkg/system_api.go
+14 −1 test/e2e-operator.sh
2 changes: 1 addition & 1 deletion modules/duty
Submodule duty updated 115 files
2 changes: 1 addition & 1 deletion modules/mission-control
Submodule mission-control updated 84 files
+1 −1 .github/workflows/lint.yml
+3 −2 agent/agent.go
+63 −11 api/v1/connection_types.go
+32 −1 api/v1/notification_types.go
+7 −2 api/v1/permission_group_types.go
+50 −46 api/v1/permission_types.go
+17 −15 api/v1/playbook_types.go
+86 −16 api/v1/zz_generated.deepcopy.go
+1 −1 artifacts/controllers.go
+4 −3 auth/clerk_client.go
+1 −1 auth/controllers.go
+9 −6 auth/middleware.go
+2 −2 auth/tokens.go
+1 −1 catalog/controllers.go
+6 −1 cmd/root.go
+3 −3 cmd/server.go
+3 −2 cmd/token.go
+384 −4 config/crds/mission-control.flanksource.com_connections.yaml
+7 −3 config/crds/mission-control.flanksource.com_notifications.yaml
+4 −0 config/crds/mission-control.flanksource.com_permissiongroups.yaml
+66 −18 config/crds/mission-control.flanksource.com_permissions.yaml
+314 −0 config/crds/mission-control.flanksource.com_playbooks.yaml
+90 −10 config/schemas/connection.schema.json
+6 −6 config/schemas/notification.schema.json
+9 −0 config/schemas/permission.schema.json
+103 −0 config/schemas/playbook-spec.schema.json
+103 −0 config/schemas/playbook.schema.json
+1 −1 connection/controllers.go
+34 −3 db/connections.go
+2 −3 db/incidents.go
+8 −3 db/notifications.go
+2 −2 echo/kube_config_download.go
+4 −3 echo/people.go
+4 −3 echo/serve.go
+20 −0 fixtures/connections/awskms.yaml
+14 −0 fixtures/connections/gcpkms.yaml
+11 −0 fixtures/notifications/config-health-match-query.yaml
+14 −0 fixtures/notifications/config-health.yaml
+1 −1 fixtures/permissions/config-notification-group-playbook-permission.yaml
+25 −0 fixtures/playbooks/http-secret-parameter.yaml
+3 −0 fixtures/playbooks/params.yaml
+33 −23 go.mod
+84 −46 go.sum
+4 −0 jobs/jobs.go
+36 −16 notification/cel.go
+82 −0 notification/cel_test.go
+4 −0 notification/context.go
+1 −1 notification/controllers.go
+87 −6 notification/events.go
+185 −23 notification/job.go
+137 −1 notification/notification_test.go
+38 −5 notification/send.go
+2 −2 notification/templates/check.failed
+2 −2 notification/templates/check.passed
+29 −20 pkg/clients/git/connectors/connectors.go
+9 −9 pkg/clients/git/connectors/connectors_test.go
+35 −11 pkg/clients/git/connectors/git_access_token.go
+41 −33 playbook/actions/actions.go
+171 −0 playbook/actions/actions_test.go
+33 −9 playbook/actions/ai.go
+8 −0 playbook/actions/gitops.go
+1 −1 playbook/actions/gitops_test.go
+2 −2 playbook/approval.go
+5 −4 playbook/controllers.go
+38 −8 playbook/params.go
+12 −10 playbook/playbook.go
+1 −1 playbook/playbook_test.go
+12 −12 playbook/runner/agent.go
+6 −6 playbook/runner/runner.go
+29 −10 playbook/runner/template.go
+23 −14 rbac/adapter/permission.go
+10 −4 rbac/controllers.go
+0 −142 rbac/custom_functions.go
+0 −137 rbac/custom_functions_test.go
+0 −199 rbac/init.go
+8 −37 rbac/middleware.go
+0 −14 rbac/model.ini
+0 −201 rbac/objects.go
+0 −52 rbac/policies.yaml
+0 −205 rbac/policy/policy.go
+7 −70 rbac/rbac_test.go
+1 −1 snapshot/controllers.go
+6 −5 tests/middleware_test.go
+1 −1 upstream/controllers.go
2 changes: 1 addition & 1 deletion modules/mission-control-registry
Submodule mission-control-registry updated 28 files
+1 −1 charts/mission-control/Chart.yaml
+41 −6 charts/mission-control/templates/mission-control.yaml
+1 −1 charts/playbooks-ai/Chart.yaml
+21 −12 charts/playbooks-ai/README.md
+7 −4 charts/playbooks-ai/templates/diagnose-resource.yaml
+7 −4 charts/playbooks-ai/templates/diagnose-slack-notification.yaml
+29 −0 charts/playbooks-ai/templates/notification.yaml
+58 −0 charts/playbooks-ai/templates/recommend-playbooks.yaml
+101 −2 charts/playbooks-ai/values.schema.json
+97 −0 charts/playbooks-ai/values.yaml
+1 −1 charts/playbooks-kubernetes/Chart.yaml
+0 −3 charts/playbooks-kubernetes/Makefile
+28 −26 charts/playbooks-kubernetes/README.md
+11 −0 charts/playbooks-kubernetes/templates/_helpers.tpl
+1 −2 charts/playbooks-kubernetes/templates/cleanup-failed-pods.yaml
+1 −2 charts/playbooks-kubernetes/templates/create-deployment.yaml
+1 −2 charts/playbooks-kubernetes/templates/delete.yaml
+1 −2 charts/playbooks-kubernetes/templates/deploy-helm-chart.yaml
+1 −2 charts/playbooks-kubernetes/templates/ignore.yaml
+1 −2 charts/playbooks-kubernetes/templates/kubectl-logs.yaml
+1 −2 charts/playbooks-kubernetes/templates/pod-snapshot.yaml
+2 −4 charts/playbooks-kubernetes/templates/request-namespace-access.yaml
+1 −2 charts/playbooks-kubernetes/templates/restart.yaml
+1 −2 charts/playbooks-kubernetes/templates/scale.yaml
+1 −2 charts/playbooks-kubernetes/templates/update-image.yaml
+1 −2 charts/playbooks-kubernetes/templates/update-resource.yaml
+22 −0 charts/playbooks-kubernetes/values.schema.json
+22 −0 charts/playbooks-kubernetes/values.yaml