Add PeerDAS metrics to track subnets without peers #6928
base: unstable
Looks good, just added a comment.
beacon_node/network/src/metrics.rs (outdated)
        "sync_column_subnets_with_zero_peers",
        "Current count of total column subnets with zero peers",
    )
});
Can't we just use sync_peers_per_column_subnet for this, with the query count(sync_peers_per_column_subnet == 0)?
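The suggested query works because a filtering comparison (== 0 without the bool modifier) keeps only the series whose value is 0, and count then tallies the survivors (per instance when grouped by (instance)). The following is a minimal Rust sketch of those semantics over an in-memory list of samples. Sample and count_zero_by_instance are hypothetical names, and the (instance, subnet, peers) shape is a simplification of a scraped gauge, not Lighthouse code.

```rust
use std::collections::BTreeMap;

/// One scraped sample of a per-subnet gauge such as sync_peers_per_column_subnet.
/// This shape is an assumption made for the sketch.
#[allow(dead_code)] // `subnet` only distinguishes series; it is never aggregated on.
struct Sample<'a> {
    instance: &'a str,
    subnet: u64,
    peers: i64,
}

/// Mimics `count(metric == 0) by (instance)`: keep only zero-valued series,
/// then count the survivors per instance. Instances with no zero-valued
/// series are absent from the result, like an empty PromQL group.
fn count_zero_by_instance(samples: &[Sample]) -> BTreeMap<String, usize> {
    let mut out = BTreeMap::new();
    for s in samples.iter().filter(|s| s.peers == 0) {
        *out.entry(s.instance.to_string()).or_insert(0) += 1;
    }
    out
}

fn main() {
    let samples = [
        Sample { instance: "node-a", subnet: 0, peers: 3 },
        Sample { instance: "node-a", subnet: 1, peers: 0 },
        Sample { instance: "node-a", subnet: 2, peers: 0 },
        Sample { instance: "node-b", subnet: 0, peers: 5 },
    ];
    let counts = count_zero_by_instance(&samples);
    // node-a has two zero-peer subnets; node-b does not appear at all.
    println!("{:?}", counts);
}
```

One difference from a dedicated gauge is worth noting. An instance with no zero-peer subnets produces no series from the query (node-b above is absent rather than 0), whereas a gauge like sync_column_subnets_with_zero_peers would report an explicit 0.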
Yeah, I mentioned that in the description: we can skip the metric if we are OK with using a Grafana query instead.
beacon_node/network/src/metrics.rs (outdated)
        "sync_custody_column_subnets_with_zero_peers",
        "Current count of custody column subnets of this node with zero peers",
    )
});
Same as above: could we use sync_column_subnets_with_zero_peers?
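Both reviewer comments rest on the same observation: the zero-peer gauges and the custody gauges can all be derived from a single per-subnet peer count. The following is a minimal Rust sketch that computes all four values in one pass. It makes several assumptions. Peers are identified by plain strings. The subnet count is passed in as a parameter rather than taken from a spec constant. ColumnSubnetMetrics and compute_metrics are hypothetical names, not Lighthouse code.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Values of the four gauges discussed in this PR, computed together.
/// The struct is hypothetical; field names mirror the metric names.
struct ColumnSubnetMetrics {
    /// sync_peers_per_column_subnet, indexed by column subnet id.
    peers_per_column_subnet: Vec<usize>,
    /// sync_peers_per_custody_column_subnet, keyed by custody subnet id.
    peers_per_custody_column_subnet: BTreeMap<u64, usize>,
    /// sync_column_subnets_with_zero_peers.
    column_subnets_with_zero_peers: usize,
    /// sync_custody_column_subnets_with_zero_peers.
    custody_column_subnets_with_zero_peers: usize,
}

/// `peer_subnets` maps a (hypothetical, string) peer id to the column subnets it serves.
fn compute_metrics(
    peer_subnets: &HashMap<&str, BTreeSet<u64>>,
    custody_subnets: &BTreeSet<u64>,
    subnet_count: usize,
) -> ColumnSubnetMetrics {
    // Count peers per subnet; ids outside 0..subnet_count are ignored.
    let mut peers_per_column_subnet = vec![0usize; subnet_count];
    for subnets in peer_subnets.values() {
        for &subnet in subnets {
            if let Some(count) = peers_per_column_subnet.get_mut(subnet as usize) {
                *count += 1;
            }
        }
    }

    // The custody gauge is the same data restricted to this node's custody set.
    let peers_per_custody_column_subnet: BTreeMap<u64, usize> = custody_subnets
        .iter()
        .map(|&subnet| {
            let count = peers_per_column_subnet.get(subnet as usize).copied().unwrap_or(0);
            (subnet, count)
        })
        .collect();

    // The zero-peer gauges are what count(metric == 0) yields in PromQL.
    let column_subnets_with_zero_peers =
        peers_per_column_subnet.iter().filter(|&&c| c == 0).count();
    let custody_column_subnets_with_zero_peers = peers_per_custody_column_subnet
        .values()
        .filter(|&&c| c == 0)
        .count();

    ColumnSubnetMetrics {
        peers_per_column_subnet,
        peers_per_custody_column_subnet,
        column_subnets_with_zero_peers,
        custody_column_subnets_with_zero_peers,
    }
}

fn main() {
    let peers = HashMap::from([
        ("peer-1", BTreeSet::from([0, 1])),
        ("peer-2", BTreeSet::from([1])),
    ]);
    let custody = BTreeSet::from([1, 3]);
    let m = compute_metrics(&peers, &custody, 4);
    // Subnets 2 and 3 have no peers; of the custody subnets, only 3 is empty.
    println!("per subnet: {:?}", m.peers_per_column_subnet);
    println!("per custody subnet: {:?}", m.peers_per_custody_column_subnet);
    println!("zero-peer subnets: {}", m.column_subnets_with_zero_peers);
    println!("zero-peer custody subnets: {}", m.custody_column_subnets_with_zero_peers);
}
```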
Removing from merge queue temporarily while we merge v7 PRs
@mergify dequeue
This pull request has been removed from the queue for the following reason: Pull request #6928 has been dequeued by a You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it. If you want to requeue this pull request, you need to post a comment with the text:
✅ The pull request has been removed from the queue
Proposed Changes
Currently we track a key metric, PEERS_PER_COLUMN_SUBNET, in a getter, good_peers_on_sampling_subnets. Another PR, #6922, deletes that function, so we have to move the metric anyway. This PR moves the metric computation into the spawned metrics task, which refreshes every 5 seconds. I also added a few more useful metrics. The full set and its intended usage:
- sync_peers_per_column_subnet: tracks the overall health of the column subnets as seen by your node.
- sync_peers_per_custody_column_subnet: tracks the health of the subnets your node needs (its custody subnets). We should watch this metric closely in our dashboards with a heatmap and a bar plot.
- sync_column_subnets_with_zero_peers: equivalent to the Grafana query count(sync_peers_per_column_subnet == 0) by (instance). We may prefer to skip it, but I believe it's the most important metric: if sync_column_subnets_with_zero_peers > 0, your node stalls.
- sync_custody_column_subnets_with_zero_peers: equivalent to count(sync_peers_per_custody_column_subnet == 0) by (instance).
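The change described above, recomputing the gauges in the spawned metrics task instead of inside a getter, can be sketched with standard-library threads. This is a minimal sketch under assumptions. Gauge stands in for a Prometheus gauge. spawn_metrics_task and its compute closure are hypothetical names. The loop is bounded by ticks only so the example terminates; the PR's actual task is not shown here and presumably runs for the node's lifetime.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Stand-in for a gauge; a real node would register a Prometheus gauge instead.
#[derive(Default)]
struct Gauge(AtomicI64);

impl Gauge {
    fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed);
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Spawns a background thread that recomputes `gauge` every `interval`,
/// `ticks` times. The PR describes a 5-second refresh; `ticks` is a sketch-only bound.
fn spawn_metrics_task<F>(
    gauge: Arc<Gauge>,
    interval: Duration,
    ticks: u32,
    compute: F,
) -> thread::JoinHandle<()>
where
    F: Fn() -> i64 + Send + 'static,
{
    thread::spawn(move || {
        for _ in 0..ticks {
            // Recompute from current state off the request path, then wait.
            gauge.set(compute());
            thread::sleep(interval);
        }
    })
}

fn main() {
    let gauge = Arc::new(Gauge::default());
    // Hypothetical source of truth the task reads on each tick.
    let zero_peer_subnets = Arc::new(AtomicI64::new(2));
    let source = Arc::clone(&zero_peer_subnets);
    let handle = spawn_metrics_task(Arc::clone(&gauge), Duration::from_millis(10), 3, move || {
        source.load(Ordering::Relaxed)
    });
    handle.join().unwrap();
    println!("sync_column_subnets_with_zero_peers = {}", gauge.get());
}
```

Refreshing on a timer means a dashboard reading can lag the true peer state by up to one interval. In exchange, callers of the peer-selection code no longer pay for updating metrics.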