Add leniency to disk thresholds of riemann-health
#282
Merged
smortex changed the title from "Add leniency to disk thresholds" to "Add leniency to disk thresholds of riemann-health" on Jan 21, 2024
smortex force-pushed the disk-threshold-leniency branch from e37f48b to 9176cd7 on January 21, 2024 19:42
smortex force-pushed the disk-threshold-leniency branch from 9176cd7 to 87b87bd on January 22, 2024 00:08
The default warning / critical limits for disk occupation do not scale well for large volumes: the default configuration for a 10 TB disk should not raise a warning when 90% of it is used and 1 TB is still available. Add a unit test that shows the expected behavior.
Disk thresholds expressed as a fraction of usage do not scale well with modern disks: on the one hand, a 90% full partition that stores logs is generally an issue and should be reported; on the other hand, when a huge volume (e.g. 10TB) is available for storing backups, the 90% usage limit does not really make sense, as we do not want to waste 1TB of disk space.

Introduce two new parameters to tune disk usage thresholds:

- `--disk-warning-leniency` (default: 500G)
- `--disk-critical-leniency` (default: 250G)

When the fraction of disk space used reaches a warning / critical threshold, check the available space against these "leniency" values, and only report the warning / critical status if the available space is lower than this limit.

The default values have been chosen to be high enough to have an effect only for disks larger than 5TB.

According to IEEE Std 1003.1-2017, a POSIX-compliant `df(1)` must support the `-k` flag to report sizes in 1024-byte units instead of the default, which used to be 512-byte units (still the default on FreeBSD, but not on Linux). We use this flag on all systems to make sure the output is in 1024-byte units regardless of the operating system. Existing unit tests are updated accordingly.
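
The decision rule described above can be sketched in a few lines of Ruby. This is an illustration only, not the code from this PR: the names `size_to_bytes` and `disk_state` are hypothetical, the 0.9 / 0.95 threshold defaults are assumed, sizes such as `500G` are assumed to use binary multiples, and the inputs are assumed to be the used / available figures reported by `df -k` in 1024-byte blocks.

```ruby
# Illustrative sketch only: helper names, threshold defaults and the
# binary interpretation of "G" are assumptions, not the riemann-health source.

UNITS = { 'K' => 1024, 'M' => 1024**2, 'G' => 1024**3, 'T' => 1024**4 }.freeze

# Convert a human-readable size such as "500G" into a number of bytes.
def size_to_bytes(text)
  match = /\A(\d+(?:\.\d+)?)([KMGT])?\z/i.match(text.strip)
  raise ArgumentError, "unparseable size: #{text.inspect}" unless match

  unit = match[2]
  (match[1].to_f * (unit ? UNITS.fetch(unit.upcase) : 1)).round
end

# Decide the state of one filesystem from figures in 1024-byte blocks.
# A threshold only triggers when the used fraction has reached it AND the
# available space is below the matching leniency value.
def disk_state(used_kb:, available_kb:,
               warning: 0.9, critical: 0.95,
               warning_leniency: '500G', critical_leniency: '250G')
  total_kb = used_kb + available_kb
  return 'ok' if total_kb.zero?

  used_fraction = used_kb.to_f / total_kb
  available_bytes = available_kb * 1024

  if used_fraction >= critical && available_bytes < size_to_bytes(critical_leniency)
    'critical'
  elsif used_fraction >= warning && available_bytes < size_to_bytes(warning_leniency)
    'warning'
  else
    'ok'
  end
end
```

Under these assumptions, a volume with 9300 GiB used and 700 GiB available stays `ok` despite being 93% full, whereas a partition with 93 GiB used and 7 GiB available is reported as `warning`.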
smortex force-pushed the disk-threshold-leniency branch from 9d119e3 to 6fce7e6 on January 22, 2024 23:30
Now that we take free space into account, adding it to the message makes sense.
smortex force-pushed the disk-threshold-leniency branch from effdefe to b002378 on January 22, 2024 23:37
I think this is ready for review. As a non-native English speaker, it was quite hard for me to express this notion of "leniency" (tolerance). If you think of a better name, I will be happy to update the PR accordingly.
jamtur01 approved these changes on Jan 26, 2024
I think this makes sense to me.