Adding metrics (#203)
* adding inflight discovery metric

* adding metrics instructions and default dashboard

* spelling fixes
dryajov authored Aug 23, 2022
1 parent 3d823dc commit 4bc7016
Showing 7 changed files with 1,288 additions and 1 deletion.
10 changes: 9 additions & 1 deletion codex/blockexchange/engine/discovery.nim
@@ -12,6 +12,7 @@ import std/sequtils
import pkg/chronos
import pkg/chronicles
import pkg/libp2p
import pkg/metrics

import ../protobuf/presence

@@ -27,6 +28,8 @@ import ./pendingblocks
logScope:
topics = "codex discovery engine"

declareGauge(codex_inflight_discovery, "inflight discovery requests")

const
DefaultConcurrentDiscRequests = 10
DefaultConcurrentAdvertRequests = 10
@@ -104,12 +107,15 @@ proc advertiseTaskLoop(b: DiscoveryEngine) {.async.} =
continue

try:
trace "Advertising block", cid = $cid
let request = b.discovery.provide(cid)
b.inFlightAdvReqs[cid] = request
codex_inflight_discovery.set(b.inFlightAdvReqs.len.int64)
trace "Advertising block", cid = $cid, inflight = b.inFlightAdvReqs.len
await request
finally:
b.inFlightAdvReqs.del(cid)
codex_inflight_discovery.set(b.inFlightAdvReqs.len.int64)
trace "Advertised block", cid = $cid, inflight = b.inFlightAdvReqs.len
except CatchableError as exc:
trace "Exception in advertise task runner", exc = exc.msg

@@ -141,6 +147,7 @@ proc discoveryTaskLoop(b: DiscoveryEngine) {.async.} =
.wait(DefaultDiscoveryTimeout)

b.inFlightDiscReqs[cid] = request
codex_inflight_discovery.set(b.inFlightDiscReqs.len.int64)
let
peers = await request

@@ -149,6 +156,7 @@ proc discoveryTaskLoop(b: DiscoveryEngine) {.async.} =
await allFinished(peers.mapIt( b.network.dialPeer(it.data))))
finally:
b.inFlightDiscReqs.del(cid)
codex_inflight_discovery.set(b.inFlightDiscReqs.len.int64)
except CatchableError as exc:
trace "Exception in discovery task runner", exc = exc.msg

43 changes: 43 additions & 0 deletions metrics/README.md
@@ -0,0 +1,43 @@
# Codex Metrics and Dashboard

> This README helps you get started with collecting and visualizing the metrics exposed by the Codex process.

## Metrics

Metrics are collected using the [nim-metrics](https://github.com/status-im/nim-metrics) backend and are enabled with the `--metrics` flag. By default, metrics are exposed at the `localhost:8008/metrics` endpoint.

Use the `--metrics-address` and `--metrics-port` flags to adjust the address and port as necessary.

## General guidelines for adding new metrics

Metrics are useful to monitor the health of the process and should aid in identifying and debugging potential issues that would be hard to notice otherwise.

All Codex metrics should use the `codex_` prefix so they can be told apart from metrics exposed by other subsystems. For example, libp2p metrics are generally prefixed with `libp2p_`.

Metrics can be added on an as-needed basis; however, keep in mind the overhead they might introduce. In particular, be careful with labels, since each distinct combination of label values produces a separate time series for that collector. If a metric or a set of metrics is expensive, it is usually advisable to put it behind a compile-time flag.
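
The guidelines above can be sketched roughly as follows. This is a minimal illustration, not code from the Codex repository: the metric names, the `RequestTracker` type and the `codex_example_label_metrics` define are all hypothetical. It assumes nim-metrics' `declareGauge`/`declareCounter` templates with `set` and `inc`, and that the library is switched on at build time (to my knowledge with `-d:metrics`; without it the calls compile to no-ops).

```nim
import std/tables
import pkg/metrics

# Hypothetical metric names; only the `codex_` prefix comes from the guidelines.
declareGauge(codex_example_inflight_requests,
  "example: requests currently in flight")

# Assumed compile-time switch (-d:codex_example_label_metrics) that keeps the
# labelled, and therefore costlier, counter out of default builds.
when defined(codex_example_label_metrics):
  # One label ("kind") yields one time series per distinct value passed to inc().
  declareCounter(codex_example_requests_total,
    "example: requests started, by kind", labels = ["kind"])

type RequestTracker = object
  inflight: Table[string, string] # request id -> kind

proc start(t: var RequestTracker, id, kind: string) =
  ## Record a new request and publish the updated in-flight count.
  t.inflight[id] = kind
  when defined(codex_example_label_metrics):
    codex_example_requests_total.inc(labelValues = [kind])
  codex_example_inflight_requests.set(t.inflight.len.int64)

proc finish(t: var RequestTracker, id: string) =
  ## Drop a finished request and publish the updated in-flight count.
  t.inflight.del(id)
  codex_example_inflight_requests.set(t.inflight.len.int64)

var tracker: RequestTracker
tracker.start("req-1", "discovery")
tracker.start("req-2", "advertise") # second label value, second series
tracker.finish("req-1")
```

Updating the gauge from the size of the tracking table, rather than calling `inc`/`dec` by hand, mirrors what the discovery engine does in this commit and keeps the gauge from drifting if a code path forgets to decrement.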

## Prometheus and Grafana

The exposed metrics can be aggregated by the [Prometheus](https://prometheus.io/) monitoring system and graphed with [Grafana](https://grafana.com/).

This directory contains both a default `prometheus.yml` config file and a basic `codex-grafana-dashboard.json` file that can be extended with additional panels and metrics as needed.

Additionally, please consider installing the [node_exporter](https://github.com/prometheus/node_exporter) agent to collect machine-level metrics such as overall memory, processes, networking, and disk I/O.

### Using the Grafana dashboard

To use the dashboard, open Grafana and go to `Dashboards`, then click the import button at the top right, next to `New Dashboard` and `New Folder`.

![](assets/main.png)

This will take you to the import page.

![](assets/import.png)

Use any one of the presented methods (upload the JSON file, load it from a URL, or paste the JSON into the text box) to import the `codex-grafana-dashboard.json` file.

Finally, you'll be presented with the following screen where you can change the name and the `UID` of the imported dashboard. This is only necessary if there is already a dashboard with the same name or `UID`.

![](./assets/imported.png)

Once imported, the dashboard should show up on the main dashboard page.
Binary file added metrics/assets/import.png
Binary file added metrics/assets/imported.png
Binary file added metrics/assets/main.png