The objective of this document is to establish a common understanding of the monitoring architecture of the SCS stack. This document refers to various roles. These are defined in the overall role definitions.
The term monitoring is used to describe methods that enable Anomaly Detection, provide Operational Visibility and allow Capacity Planning.
Furthermore, a distinction is made between whitebox monitoring and blackbox monitoring.
Intent | Whitebox Monitoring | Blackbox Monitoring |
---|---|---|
Capacity Planning | x | - |
Anomaly Detection | x | x |
Operational Visibility | x | x |
The term monitoring includes:
- Healthcheck data (state)
- Telemetry (metrics)
- Centralized Log aggregation
Various roles within the SCS scope interact with the monitoring (data) from different viewpoints. While the operator needs a full-stack view, the supporter looks at the monitoring from a slightly different viewpoint.
Role | Capacity Planning | Anomaly Detection | Operational Visibility |
---|---|---|---|
Operator (Provider) | x | x | x |
Supporter (Provider) | - | (x) | x |
Integrator | (x) | x | (x) |
Developer | - | x | - |
Aside from these roles there are further cases that will use data aggregated as part of the monitoring:
- Provider Invoicing will need telemetry on usage data in order to provide billing.
- The SCS vendor will need anonymized usage data on the overall SCS stack adoption.
Whether the source is logs, metrics or health checks, alerts need to be aggregated and transported via the alert routing to the provider-specific alerting engine. This gives each Provider the flexibility to bring its own alerting engine while keeping the alert routing within the SCS stack standardised.
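The routing idea can be illustrated with a minimal Python sketch. It is not part of the SCS stack: the names `Source`, `Alert`, `AlertRouter` and `AlertSink` are hypothetical and chosen only for illustration. The sketch assumes that alerts from all three source types are normalised into one common shape, and that a provider-specific alerting engine can be modelled as a plain callable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Source(Enum):
    """The three monitoring sources named in this document."""
    LOG = "log"
    METRIC = "metric"
    HEALTHCHECK = "healthcheck"


@dataclass(frozen=True)
class Alert:
    """Hypothetical common alert shape shared by all sources."""
    source: Source
    layer: str  # assumed values: "infrastructure" or "container"
    summary: str
    severity: str = "warning"


# Assumption: a provider-specific alerting engine is any callable
# that accepts an Alert.
AlertSink = Callable[[Alert], None]


class AlertRouter:
    """Standardised routing step; only the sinks differ per provider."""

    def __init__(self) -> None:
        self._sinks: List[AlertSink] = []

    def register(self, sink: AlertSink) -> None:
        """Attach a provider-specific alerting engine."""
        self._sinks.append(sink)

    def route(self, alert: Alert) -> int:
        """Forward the alert to every sink; return the number of sinks reached."""
        for sink in self._sinks:
            sink(alert)
        return len(self._sinks)


# Usage: a list stands in for a real alerting engine.
received: List[Alert] = []
router = AlertRouter()
router.register(received.append)

router.route(Alert(Source.LOG, "infrastructure", "disk errors in syslog"))
router.route(Alert(Source.METRIC, "container", "CPU saturation", severity="critical"))
router.route(Alert(Source.HEALTHCHECK, "container", "API endpoint unhealthy"))
```

In this sketch the router never needs to know which engine sits behind a sink, so a Provider could swap its alerting engine without changing how alerts are produced or routed.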
Monitoring happens on two layers: the infrastructure layer and the container layer. This is illustrated in the overview diagram. Each layer has its own set of components to ensure independence.