added SMI OTEL documentation #3038

Open
wants to merge 25 commits into
base: master
ba9e763
added SMI OTEL documentation
ls-spryker Jan 30, 2025
7499017
Create monitoring.md
romansprykee Jan 20, 2025
59a4bb5
Create spryker-monitoring-integration.md
romansprykee Jan 20, 2025
80840db
Update monitoring.md
romansprykee Jan 20, 2025
22a91dd
Update spryker-monitoring-integration.md
romansprykee Jan 20, 2025
6e4697f
Update monitoring.md
romansprykee Jan 20, 2025
2b0b34d
Update monitoring.md
romansprykee Jan 20, 2025
3369b72
Update monitoring.md
romansprykee Jan 20, 2025
d80ad5c
Update monitoring.md
romansprykee Jan 20, 2025
eb1048a
Update monitoring.md
romansprykee Jan 20, 2025
d11905b
Update monitoring.md
romansprykee Jan 20, 2025
ae437e3
Update configure-services.md
romansprykee Jan 20, 2025
a369e29
Update spryker-monitoring-integration.md
romansprykee Jan 20, 2025
f1273cd
Update configure-services.md
romansprykee Jan 20, 2025
38ddd69
Update spryker-monitoring-integration.md
romansprykee Jan 21, 2025
4a800b7
Update spryker-monitoring-integration.md
romansprykee Jan 21, 2025
37bbee6
Update configure-services.md
romansprykee Jan 21, 2025
deddd98
Update configure-services.md
romansprykee Jan 21, 2025
218d09a
Update monitoring.md
romansprykee Jan 22, 2025
78bec14
Update configure-services.md
romansprykee Jan 22, 2025
c5fb032
Update spryker-monitoring-integration.md
romansprykee Jan 30, 2025
eb76858
Update configure-services.md
romansprykee Jan 30, 2025
be8a2dc
formatting
ls-spryker Jan 31, 2025
626ebf4
intergrated health status metrics info into smi page
ls-spryker Jan 31, 2025
fce8711
intergrated health status metrics info into smi page
ls-spryker Jan 31, 2025
18 changes: 18 additions & 0 deletions docs/ca/dev/monitoring.md
@@ -0,0 +1,18 @@
# Monitoring

Effective monitoring is crucial for maintaining the health and performance of your Spryker applications. This page provides access to resources and integrations that enable comprehensive monitoring through logs and application performance metrics.

## Logs with CloudWatch
CloudWatch offers robust logging capabilities, allowing you to track, store, and analyze logs from your Spryker applications and services. Learn more about [working with Logs](/docs/ca/dev/working-with-logs.md).

## Application Performance Monitoring
Application Performance Monitoring (APM) provides near real-time insights into the performance of your applications, helping you quickly identify and resolve issues. For Spryker customers, APM ensures optimal application health, enhancing the user experience by minimizing downtime and performance bottlenecks.
### Spryker Monitoring Integration (OTel)
Integrate Spryker monitoring data into your preferred APM tool using OpenTelemetry for flexible and comprehensive application performance monitoring. Learn more about [Spryker Monitoring Integration](/docs/ca/dev/spryker-monitoring-integration.md).

### New Relic APM
Leverage New Relic’s powerful APM features to monitor and troubleshoot your Spryker applications with ease. Learn more about how to use [New Relic APM with Spryker solutions](/docs/dg/dev/integrate-and-configure/configure-services.html#new-relic).

## Monitoring issues and informing about alerts
This section outlines the process for monitoring issues and managing alerts within the Spryker ecosystem. It provides guidance on configuring alerting systems, responding to incidents, and ensuring smooth communication during operational disruptions. For detailed instructions and best practices, check out the [full guide here](/docs/ca/dev/monitoring-issues-and-informing-about-alerts.md).

25 changes: 25 additions & 0 deletions docs/ca/dev/smi-health-status-metrics.md
@@ -0,0 +1,25 @@
# Overview
As part of Spryker's OpenTelemetry (OTel) initiative, the Spryker Monitoring Integration (SMI) provides a set of service health metrics. These metrics give a high-level view of the health status of enabled services. The level of detail is as follows:

## Metrics Without Dimensions
The following metrics return a gauge with a binary value: 1 (Green) or 0 (Red). They are composites of multiple service-related signals that, after transformations and calculations, yield a single value indicating the service's health.

- **hc_rds**: Reports 0 or 1 for the overall health of the RDS Service.
- **hc_jenkins**: Reports 0 or 1 for the overall health of the Jenkins Service.
- **hc_rabbitmq**: Reports 0 or 1 for RabbitMQ health.
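
As an illustration of how these binary gauges could be read on the consuming side, here is a minimal Python sketch. The helper names and the sample values are hypothetical and not part of SMI; only the metric names and the 1 (Green) / 0 (Red) encoding come from the list above.

```python
# Metric names taken from the list above.
HEALTH_GAUGES = ("hc_rds", "hc_jenkins", "hc_rabbitmq")


def health_status(value: float) -> str:
    """Translate a binary gauge value into the Green/Red status it encodes."""
    if value == 1:
        return "Green"
    if value == 0:
        return "Red"
    raise ValueError(f"unexpected health gauge value: {value!r}")


def unhealthy_services(samples: dict) -> list:
    """Return the health gauges in `samples` that currently report Red."""
    return sorted(
        name
        for name in HEALTH_GAUGES
        if name in samples and health_status(samples[name]) == "Red"
    )


# Made-up sample values, for illustration only.
samples = {"hc_rds": 1, "hc_jenkins": 0, "hc_rabbitmq": 1}
print(unhealthy_services(samples))
```

In practice this kind of check would usually be expressed as an alert rule in the APM tool itself; the sketch only shows the intended interpretation of the values.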

## Metrics with Dimensions
These metrics can be split by the dimensions (labels) listed in square brackets after each description. See the Terminology section for how these terms are used.

- **hc_rabbitmq_message_count_sum**: A count of RabbitMQ messages [dimension_queue, dimension_virtualhost].
- **hc_jenkins_builds_success_build_count_total.count**: A count of successful Jenkins jobs [jenkins_job].
- **hc_jenkins_builds_failed_build_count_total.count**: A count of failed Jenkins jobs [jenkins_job].
- **hc_tasks_cpu_average**: CPU utilization in % for cluster tasks [dimension_clustername, dimension_servicename].
- **hc_tasks_memory_utilization**: Memory utilization in % for cluster tasks [dimension_clustername, dimension_servicename].
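
To illustrate what splitting a metric by a dimension means, the following Python sketch sums made-up samples of `hc_rabbitmq_message_count_sum` per dimension value. The `split_by` helper, the queue and virtual host names, and the numbers are all hypothetical; an APM tool would normally do this grouping for you.

```python
from collections import defaultdict

# Made-up samples as (metric name, dimensions, value) tuples.
samples = [
    ("hc_rabbitmq_message_count_sum",
     {"dimension_queue": "publish", "dimension_virtualhost": "de"}, 120.0),
    ("hc_rabbitmq_message_count_sum",
     {"dimension_queue": "publish", "dimension_virtualhost": "at"}, 30.0),
    ("hc_rabbitmq_message_count_sum",
     {"dimension_queue": "event", "dimension_virtualhost": "de"}, 5.0),
]


def split_by(samples, metric, dimension):
    """Sum the values of one metric, grouped by the value of one dimension."""
    totals = defaultdict(float)
    for name, dimensions, value in samples:
        if name == metric and dimension in dimensions:
            totals[dimensions[dimension]] += value
    return dict(totals)


print(split_by(samples, "hc_rabbitmq_message_count_sum", "dimension_queue"))
print(split_by(samples, "hc_rabbitmq_message_count_sum", "dimension_virtualhost"))
```

The same samples yield a per-queue view or a per-virtual-host view depending on which dimension is chosen.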

## Metric Details
The default metrics resolution is 60 seconds. All metrics can be split by telemetry-data-account.

## Terminology
In this document, the terms labels, dimensions, and attributes are used interchangeably when referring to metrics. Which term applies depends on the customer's monitoring solution.
For example, Grafana uses the term labels, while Dynatrace uses dimensions.
35 changes: 35 additions & 0 deletions docs/ca/dev/spryker-monitoring-integration.md
@@ -0,0 +1,35 @@
# Spryker Monitoring Integration
The Spryker Monitoring Integration is a product capability that gives customers advanced monitoring for their applications and systems. Based on [OpenTelemetry](https://opentelemetry.io/), it forwards telemetry data, including traces and health status metrics, to OpenTelemetry-compatible monitoring platforms. This enables near real-time tracking of application performance and monitoring of system health status.

## What is OpenTelemetry (OTel)
OpenTelemetry is an open-source framework that provides APIs, libraries, and agents for collecting traces and metrics across various applications. It standardizes the instrumentation of software to help developers monitor and improve application performance effectively. OTel makes it possible to provide a vendor-agnostic monitoring experience, so customers can integrate Spryker with their preferred APM solutions while adhering to industry best practices for collecting and analyzing performance data.

## Telemetry data in scope of Spryker Monitoring Integration
The Spryker Monitoring Integration focuses on several key entities to provide comprehensive monitoring:
- **Traces and Spans**: In OpenTelemetry, a **trace** represents the journey of a single request or transaction as it moves through various components of a system, capturing the end-to-end flow. A **span** is a single operation or unit of work within a trace, containing information like the operation name, start and end times, and any relevant metadata. Together, traces and spans provide a detailed view of the interactions and performance of different parts of an application, helping to diagnose issues and optimize performance.
- **Health Status Metrics**: These metrics report the overall health of critical backing services such as the SQL Database, Message Broker, Scheduler, and key SCOS Services. This ensures continuous insight into the stability and performance of the system components. To learn more, see the [Health Status Metrics](/docs/ca/dev/smi-health-status-metrics.md) page.
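
The relationship between a trace and its spans can be sketched as a small data model. This is a plain-Python illustration, not the OpenTelemetry SDK: the `Span` and `Trace` classes, the span names, and the timings are hypothetical, and real spans link to their parent through span IDs rather than names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """A single unit of work: operation name, start/end time, metadata."""
    name: str
    start_ms: int
    end_ms: int
    parent: Optional[str] = None  # simplified: parent referenced by name
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Trace:
    """The end-to-end journey of one request, made up of spans."""
    trace_id: str
    spans: list = field(default_factory=list)

    def root(self) -> Span:
        """The span without a parent covers the whole request."""
        return next(s for s in self.spans if s.parent is None)

    def slowest_child(self) -> Span:
        """The longest-running operation below the root span."""
        children = [s for s in self.spans if s.parent is not None]
        return max(children, key=lambda s: s.duration_ms)


# Made-up request, for illustration only.
trace = Trace("example-trace", [
    Span("GET /cart", 0, 180),
    Span("db.query", 20, 140, parent="GET /cart"),
    Span("cache.get", 145, 150, parent="GET /cart"),
])
print(trace.root().duration_ms)    # total request time in ms
print(trace.slowest_child().name)  # where most of that time was spent
```

Looking at the slowest child span of a slow trace is the typical starting point for diagnosing a performance issue.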

## How do I get it?
### Prerequisites
As a prerequisite, customers need an OpenTelemetry-compatible APM tool, which can be selected from the list of [supported vendors](https://opentelemetry.io/ecosystem/vendors/).

Customers cannot use vendor-specific agents (e.g., the Dynatrace agent) as part of this approach. Instead, Dynatrace and similar platforms ingest telemetry data streamed via the OpenTelemetry Collector, which acts as a vendor-agnostic data pipeline. These platforms often provide additional proprietary features beyond raw data visualization. While vendors like DataDog, Dynatrace, and New Relic offer their own agents that are deeply integrated with their platforms and optimized for data ingestion, our approach uses the OpenTelemetry Collector to remain vendor-neutral and support a wide range of APM solutions.

### How to Request Spryker Monitoring Integration
To request the Spryker Monitoring Integration, customers need to submit a Change Request through the Support Portal. Follow these steps:

- Submit a Change Request: Access the Support Portal and create a new Change Request.
- Provide the following information:
  - Endpoint: The endpoint URL of your APM tool.
  - Token: An API token that Spryker can use to configure and communicate with your APM tool.

Spryker Support will guide you through the setup process once the request is submitted.
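
Before submitting the request, it can help to sanity-check the two values. The following Python sketch is only an illustration: the `check_request_details` helper is hypothetical, and the HTTPS requirement is an assumption rather than a documented Spryker rule, so confirm the expected endpoint format with Spryker Support and your APM vendor.

```python
from urllib.parse import urlparse


def check_request_details(endpoint: str, token: str) -> list:
    """Return a list of problems found in the Change Request details."""
    problems = []
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":  # assumption: endpoint is served over HTTPS
        problems.append("endpoint should be an https:// URL")
    if not parsed.netloc:
        problems.append("endpoint is missing a host name")
    if not token.strip():
        problems.append("token is empty")
    return problems


# Placeholder values, for illustration only.
print(check_request_details("https://apm.example.com/otlp", "example-token"))
print(check_request_details("apm.example.com/otlp", ""))
```

An empty list means the basic shape of both values looks plausible; it does not verify that the endpoint is reachable or that the token is valid.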

### Instrumenting Your Application
To send telemetry data to your APM tool, your application must be instrumented using OpenTelemetry. This process ensures that the necessary data is collected and forwarded to the monitoring system of your choice.
Customers can self-serve the instrumentation by following the [instrumentation guide](/docs/dg/dev/integrate-and-configure/configure-services.md#how-to-instrument), but Spryker also offers expert services to assist with this setup. If you require professional support, please contact your sales representative for further assistance.

> [!NOTE]
> This solution only supports the **OpenTelemetry Collector** for telemetry ingestion. **Proprietary vendor agents (e.g., Dynatrace, DataDog, or New Relic agents) are not supported**. Instead, these platforms ingest telemetry streamed through the OpenTelemetry Collector, ensuring flexibility, interoperability, and vendor neutrality while adhering to industry-standard observability practices.

## Additional information
For more information, check out our [Spryker Service Description](https://spryker.com/ssd/).
20 changes: 20 additions & 0 deletions docs/dg/dev/integrate-and-configure/configure-services.md
@@ -396,6 +396,9 @@ services:

[Blackfire](https://blackfire.io/) is a tool used to profile, test, debug, and optimize the performance of PHP applications. It gathers data about consumed server resources like memory, CPU time, and I/O operations. The data and configuration can be checked through the Blackfire web interface.

> [!IMPORTANT]
> While we recommend Blackfire for PHP code profiling, **Blackfire is currently not compatible with OpenTelemetry (OTel)**. If this changes in the future, we will inform our customers accordingly. In the meantime, for profiling that is compatible with OTel, we suggest using tools like [Tideways](/docs/dg/dev/integrate-and-configure/configure-services.md#tideways), which can integrate with your OpenTelemetry-based monitoring stack.

### Configure Blackfire

To enable Blackfire, follow these steps:
@@ -470,6 +473,23 @@ It is not obligatory to pass all the details as environment variables or define

{% endinfo_block %}

## OpenTelemetry (via Spryker Monitoring Integration)

### Prerequisites
To integrate your Spryker solution with an OTel-compatible APM tool, first complete the steps in [How do I get it?](/docs/ca/dev/spryker-monitoring-integration.md#how-do-i-get-it).

### How to instrument
We've created a comprehensive guide to help you instrument your application using OpenTelemetry. By following these instructions, you can gain valuable insights into your application's performance and ensure a robust monitoring setup. To start instrumenting your application, check out our detailed [OpenTelemetry Instrumentation Guide](/docs/dg/dev/backend-development/opentelemetry/overview.md#integration).

### What is included

#### Application Performance Monitoring (Platform)

#### Health Metrics (with examples) (Cloud)

- Explanation about what I need to do in my APM (e.g. Dynatrace) (Cloud+Platform)
-- Refer to the docs of the APM tool

## New Relic

[New Relic](https://newrelic.com/) is a tool used to track the performance of services and the environment to quickly find and fix issues.