To include the name of your application in all reported metrics, make sure to configure this as early as possible, before any meters are bound to your MeterRegistry. E.g. configure this where you create your registry:
MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
MeterFilters.tryIncludeApplicationNameCommonTag().ifPresentOrElse(
    registry.config()::meterFilter,
    () -> LOG.warn("Unable to include application common tag in MeterRegistry"));
This will try to determine the name of your application by resolving the value of the Implementation-Title key in the MANIFEST.MF file of the JAR containing the class which started your application.
You also have the option to provide a class yourself instead of relying on it being discovered automatically. The class should be located in the JAR whose MANIFEST.MF contains the Implementation-Title you would like to use as your application name.
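A sketch of what that could look like, assuming the filter factory has an overload that accepts such a class; the exact method name and signature are assumptions, not the library's documented API:
// Hypothetical overload: MyApplication lives in the JAR whose MANIFEST.MF holds the Implementation-Title
MeterFilters.tryIncludeApplicationNameCommonTag(MyApplication.class).ifPresentOrElse(
    registry.config()::meterFilter,
    () -> LOG.warn("Unable to include application common tag in MeterRegistry"));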
The example above logs a warning should this discovery mechanism fail to resolve your application name. You may handle this however you prefer, e.g. by throwing an exception instead of just logging.
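For instance, a fail-fast variant; this sketch only relies on the Optional returned by the filter factory, so the use of orElseThrow and the exception message are our own choices:
MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
registry.config().meterFilter(
    MeterFilters.tryIncludeApplicationNameCommonTag()
        .orElseThrow(() -> new IllegalStateException("Unable to resolve application name")));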
You can also skip all this automatic discovery, and just supply the name of your application when configuring the filter:
MeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
registry.config().meterFilter(MeterFilters.includeApplicationNameCommonTag("my-application"));
See PR #34 for more examples on how to configure the filter.
Usage with a MeterRegistry:
new ApplicationInfoMetrics().bindTo(meterRegistry);
ApplicationInfoMetrics exposes an application metric with data from MANIFEST.MF or environment variables. This is what is expected to exist in the manifest or as key/value environment variables:
Build-Jdk-Spec: 12
Git-Build-Time: 2019-12-19T22:52:05+0100
Git-Build-Version: 1.2.3
Git-Commit: ffb9099
Running on Java 11, this will create the following metric in Prometheus:
# HELP app_info General build and runtime information about the application. This is a static value
# TYPE app_info gauge
app_info{application="my-application",buildNumber="ffb9099",buildTime="2019-12-19T22:52:05+0100",buildVersion="1.2.3",javaBuildVersion="12",javaVersion="11"} 1.0
(Note that application="my-application" will only be included if you configured the "Common application tag MeterFilter" described previously.)
The following metric will be created if no values are present in the manifest or environment variables:
# HELP app_info General build and runtime information about the application. This is a static value
# TYPE app_info gauge
app_info{javaVersion="11"} 1.0
The SimplePrometheusServer takes a log method in its constructor, which it uses to log when the server is up. To start the server you need your instance of PrometheusMeterRegistry and a port. The Prometheus metrics will then be available at host:port/metrics.
new SimplePrometheusServer(LOG::info)
    .startMetricsServer(prometheusRegistry, 9610);
With TimedThirdPartyCall you can wrap your code to get metrics on the call, with extended functionality on top of what Micrometer's Timed gives you.
An example:
BiFunction<MyResponse, Optional<RuntimeException>, AppStatus> warnOnSituation =
    (response, possibleException) ->
        (possibleException.isPresent() || "ERROR_SITUATION".equals(response.data))
            ? AppStatus.WARN : AppStatus.OK;
TimedThirdPartyCall<MyResponse> getStuff = TimedThirdPartyCallDescriptor
    .create("ExternalService", "getStuff", prometheusRegistry)
    .callResponseStatus(warnOnSituation);
getStuff.call(() -> new MyResponse("ERROR_SITUATION"));
This will produce a number of metrics:
app_third_party_call_total{name="ExternalService_getStuff", status="OK"} 0.0
app_third_party_call_total{name="ExternalService_getStuff", status="WARN"} 1.0
app_third_party_call_total{name="ExternalService_getStuff", status="FAILED"} 0.0
app_third_party_call_seconds_count{name="ExternalService_getStuff"} 1.0
app_third_party_call_seconds_sum{name="ExternalService_getStuff"} 6.6018E-5
app_third_party_call_seconds_max{name="ExternalService_getStuff"} 6.6018E-5
The idea is that Timed only counts executions overall. What we want in addition is finer granularity, to create better alerts in our alerting rig. By specifying a function with which we classify a call as OK/WARN/FAILED, we can exclude error situations that we want to ignore from alerts reacting to FAILED or to a percentage of FAILED/TOTAL.
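As an illustration, an alert expression on the failure ratio could look like this (the 5m window and 10% threshold are arbitrary examples):
sum by (name) (rate(app_third_party_call_total{status="FAILED"}[5m]))
  /
sum by (name) (rate(app_third_party_call_total[5m]))
  > 0.10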
You can also use a simple exception-mapper function for a boolean OK/FAILED:
TimedThirdPartyCall<String> getStuff = TimedThirdPartyCallDescriptor
    .create("ExternalService", "getStuff", prometheusRegistry)
    .exceptionAsFailure();
String result = getStuff.call(() -> "OK");
For timing void functions you can use NoResultTimedThirdPartyCall, acquired by invoking the noResult() method:
NoResultTimedThirdPartyCall voidFunction = TimedThirdPartyCallDescriptor
    .create("ExternalService", "voidFunction", prometheusRegistry)
    .noResult() // allows timing void function calls
    .exceptionAsFailure();
voidFunction.call(() -> {});
You can also define percentiles (default 0.5, 0.95, 0.99):
TimedThirdPartyCallDescriptor
    .create("ExternalService", "getStuff", prometheusRegistry)
    .callResponseStatus(warnOnSituation, 0.95, 0.99);
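With percentiles configured, the scrape output should also contain quantile series alongside the count/sum/max shown earlier, roughly like the following (the exact rendering depends on the Micrometer version; the values are illustrative):
app_third_party_call_seconds{name="ExternalService_getStuff",quantile="0.95",} 6.6018E-5
app_third_party_call_seconds{name="ExternalService_getStuff",quantile="0.99",} 6.6018E-5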
Often you want to have metrics that might be slow to get. Examples of this are counting rows in a Postgres database or maybe collecting stats from a keystore. Typically you want some kind of worker thread that updates this value on a regular basis. But how do you know that your worker thread is not stuck? For this you can use the MetricsUpdater class. Create an instance of it and specify the number of threads you want, then register a runnable at an interval, as shown below.
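First create the updater; a minimal sketch, assuming the constructor simply takes the number of worker threads (the real signature may differ):
MetricsUpdater metricsUpdater = new MetricsUpdater(2); // assumed constructor: worker thread count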
metricsUpdater.registerAsyncUpdate("count-table", Duration.ofMinutes(10), () -> {
    // Slow count of huge database
});
You can then alert if this is stale:
- alert: AsyncUpdateScrapeErrors
  expr: app_async_update_scrape_errors > 0
  for: 2m
Sometimes you want to have metrics for some event that happens in your application. And sometimes you want some kind of alert or warning when these events occur at a given rate. This implementation is a way to achieve that in a generic way. Your application needs to implement the interface AppBusinessEvent. We usually do that with an enum so that we have easy access to the instance of the event. You can see a complete implementation of this in AppBusinessEventLoggerTest.
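For illustration only, such an enum could look roughly like this; the methods required by AppBusinessEvent are defined by the library (see AppBusinessEventLoggerTest), so the accessor below is an assumption:
public enum MyBusinessEvents implements AppBusinessEvent {
    VIOLATION_WITH_WARN(5);

    private final int warnThresholdPerMinute;

    MyBusinessEvents(int warnThresholdPerMinute) {
        this.warnThresholdPerMinute = warnThresholdPerMinute;
    }

    // Assumed accessor; the real interface may expose the 1-minute warn threshold differently.
    public int getOneMinuteWarnThreshold() {
        return warnThresholdPerMinute;
    }
}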
You can also use the interface AppSensorEvent to add a multiplier score (severity) to an event.
EventLogger eventLogger = new AppBusinessEventLogger(meterRegistry);
eventLogger.log(MyBusinessEvents.VIOLATION_WITH_WARN);
This should produce a Prometheus scrape output like this:
# HELP app_business_events_1min_warn_thresholds
# TYPE app_business_events_1min_warn_thresholds gauge
app_business_events_1min_warn_thresholds{name="VIOLATION_WITH_WARN",} 5.0
# HELP app_business_events_total
# TYPE app_business_events_total counter
app_business_events_total{name="VIOLATION_WITH_WARN",} 1.0
You can then use the gauge app_business_events_1min_warn_thresholds to register alerts with your system:
- alert: MyEvents
  expr: >
    sum by (job,name) (increase(app_business_events_total[5m]))
    >=
    max by (job,name) (app_business_events_1min_warn_thresholds) * 5
  labels:
    severity: warning
  annotations:
    summary: 'High event-count for `{{ $labels.name }}`'
    description: >
      Job: `{{ $labels.job }}`, event: `{{ $labels.name }}`, has 5min count of `{{ $value | printf "%.1f" }}`
The nice thing here is that by doing the sum by (job,name) you will compare only the metrics with the same name. For this example that is VIOLATION_WITH_WARN, which is your unique event name in the system.
Log-event metrics for a specified Logback logger, with dimensions for level and logger.
LogbackLoggerMetrics.forRootLogger().bindTo(meterRegistry);
//or
LogbackLoggerMetrics.forLogger("my.logger.name").bindTo(meterRegistry);
This will produce the following Prometheus scrape output:
# HELP logback_logger_events_total
# TYPE logback_logger_events_total counter
logback_logger_events_total{application="my-application",level="warn",logger="ROOT",} 0.0
logback_logger_events_total{application="my-application",level="error",logger="ROOT",} 0.0
logback_logger_events_total{application="my-application",level="trace",logger="ROOT",} 0.0
logback_logger_events_total{application="my-application",level="info",logger="ROOT",} 18.0
logback_logger_events_total{application="my-application",level="debug",logger="ROOT",} 0.0
If for some reason you don't want log events from a specific logger to be included in the metrics, this can be done:
LogbackLoggerMetrics.forRootLogger()
    .excludeLogger("ignored.logger.name")
    .bindTo(prometheusRegistry);
Metrics for logging-level thresholds can also be created with the methods warnThreshold5min and errorThreshold5min. Example:
LogbackLoggerMetrics.forLogger(Logger.ROOT_LOGGER_NAME)
    .warnThreshold5min(10)
    .errorThreshold5min(5)
    .bindTo(prometheusRegistry);
This will produce the following metrics during Prometheus scraping, in addition to the metrics above:
# HELP logger_events_5min_threshold
# TYPE logger_events_5min_threshold gauge
logger_events_5min_threshold{application="my-application",level="warn",logger="ROOT",} 10.0
logger_events_5min_threshold{application="my-application",level="error",logger="ROOT",} 5.0
These metrics can be used for alerting in combination with the metrics above. Prometheus expression:
sum by (job,level,logger) (increase(logback_logger_events_total[5m]))
>=
max by (job,level,logger) (logger_events_5min_threshold)
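Wrapped in an alert rule in the same style as the earlier examples (the alert name, for duration and severity label are illustrative):
- alert: LogEventThresholdExceeded
  expr: >
    sum by (job,level,logger) (increase(logback_logger_events_total[5m]))
    >=
    max by (job,level,logger) (logger_events_5min_threshold)
  for: 5m
  labels:
    severity: warning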