base/longhorn-system: improve alerts about backup jobs
paulfantom committed Oct 14, 2023
1 parent add67f1 commit 984c81c
Showing 1 changed file with 10 additions and 2 deletions.
base/longhorn-system/prometheusrule.yaml: 12 changes (10 additions, 2 deletions)
@@ -86,10 +86,18 @@ spec:
       labels:
         issue: Longhorn node {{$labels.node}} experiences high CPU pressure.
         severity: warning
+    - alert: LonghornVolumeBackupStuck
+      expr: count by (volume) (longhorn_backup_state < 2)
+      for: 8h
+      labels:
+        severity: warning
+      annotations:
+        description: There are {{$value}} longhorn backups of a volume {{$labels.volume}} stuck for at least 8h.
+        summary: Longhorn backups stuck.
     - alert: LonghornVolumeBackupError
-      expr: count by (backup, volume) (longhorn_backup_state > 2)
+      expr: count by (volume) (longhorn_backup_state > 3)
       labels:
         severity: warning
       annotations:
-        description: Longhorn backup {{$labels.backup}} of a volume {{$labels.volume}} failed.
+        description: There are {{$value}} longhorn backups of a volume {{$labels.volume}} which failed to complete.
         summary: Longhorn backups failed.
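One way to check the revised thresholds is a promtool rule unit test. The file below is a hypothetical sketch built on two assumptions.

First, it assumes the `spec.groups` section of the PrometheusRule has been copied into a plain Prometheus rule file named `longhorn-rules.yaml`, because promtool reads plain rule files, not the PrometheusRule wrapper. That file name and the series labels are illustrative.

Second, it assumes the state encoding given in Longhorn's metrics reference for `longhorn_backup_state`: 0=New, 1=Pending, 2=InProgress, 3=Completed, 4=Error, 5=Unknown. Under that encoding:
- the old `> 2` threshold would also have matched completed backups;
- `> 3` matches only Error and Unknown;
- `< 2` flags backups that remain New or Pending.

```yaml
# Hypothetical promtool test file; run with: promtool test rules longhorn-rules.test.yaml
# Assumes longhorn-rules.yaml contains the spec.groups section of prometheusrule.yaml.
rule_files:
  - longhorn-rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # State 1 (Pending, per the assumed encoding) held for 10h.
      - series: 'longhorn_backup_state{backup="backup-a", volume="pvc-1"}'
        values: '1+0x600'
      # State 4 (Error).
      - series: 'longhorn_backup_state{backup="backup-b", volume="pvc-2"}'
        values: '4+0x600'
      # State 3 (Completed): should trigger neither alert.
      - series: 'longhorn_backup_state{backup="backup-c", volume="pvc-3"}'
        values: '3+0x600'
    alert_rule_test:
      # Still pending at 7h, so no firing alert is expected yet ("for: 8h").
      - eval_time: 7h
        alertname: LonghornVolumeBackupStuck
        exp_alerts: []
      - eval_time: 9h
        alertname: LonghornVolumeBackupStuck
        exp_alerts:
          - exp_labels:
              severity: warning
              volume: pvc-1
            exp_annotations:
              summary: Longhorn backups stuck.
              description: There are 1 longhorn backups of a volume pvc-1 stuck for at least 8h.
      # No "for" clause, so this fires immediately; pvc-3 (Completed) must not appear.
      - eval_time: 10m
        alertname: LonghornVolumeBackupError
        exp_alerts:
          - exp_labels:
              severity: warning
              volume: pvc-2
            exp_annotations:
              summary: Longhorn backups failed.
              description: There are 1 longhorn backups of a volume pvc-2 which failed to complete.
```

The revised error alert groups only by `volume`, so its description no longer names the failing backup. An ad-hoc query such as `longhorn_backup_state{volume="pvc-2"} > 3` (volume name illustrative) would list the individual backups behind a firing alert.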
