Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PeerDAS metrics to track subnets without peers #6928

Open
wants to merge 2 commits into
base: unstable
Choose a base branch
from

Conversation

dapplion
Copy link
Collaborator

@dapplion dapplion commented Feb 6, 2025

Proposed Changes

Currently we track a key metric PEERS_PER_COLUMN_SUBNET in a getter good_peers_on_sampling_subnets. Another PR #6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the metrics spawned task which is refreshed every 5 seconds.

I also added a few more useful metrics. The total set and intended usage is:

  • sync_peers_per_column_subnet: Track health of overall subnets in your node
  • sync_peers_per_custody_column_subnet: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
  • sync_column_subnets_with_zero_peers: Is equivalent to the Grafana query count(sync_peers_per_column_subnet == 0) by (instance). We may prefer to skip it, but I believe it's the most important metric as if sync_column_subnets_with_zero_peers > 0 your node stalls.
  • sync_custody_column_subnets_with_zero_peers: count(sync_peers_per_custody_column_subnet == 0) by (instance)

@dapplion dapplion requested a review from jxs as a code owner February 6, 2025 14:55
@dapplion dapplion requested a review from jimmygchen February 6, 2025 14:57
@dapplion dapplion added ready-for-review The code is ready for review das Data Availability Sampling labels Feb 6, 2025
Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just added a comment.

"sync_column_subnets_with_zero_peers",
"Current count of total column subnets with zero peers",
)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just use sync_peers_per_column_subnet for this?

and the query:

count(sync_peers_per_column_subnet == 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I mentioned that in the description, we can skip the metric if we are ok doing a Grafana query

"sync_custody_column_subnets_with_zero_peers",
"Current count of custody column subnets of this node with zero peers",
)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, could use sync_column_subnets_with_zero_peers?

@jimmygchen jimmygchen added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Feb 7, 2025
@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Feb 10, 2025
mergify bot added a commit that referenced this pull request Feb 10, 2025
@michaelsproul
Copy link
Member

Removing from merge queue temporarily while we merge v7 PRs

@michaelsproul michaelsproul added blocked and removed ready-for-merge This PR is ready to merge. labels Feb 10, 2025
@michaelsproul
Copy link
Member

@mergify dequeue

Copy link

mergify bot commented Feb 10, 2025

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #6928 has been dequeued by a dequeue command

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

Copy link

mergify bot commented Feb 10, 2025

dequeue

✅ The pull request has been removed from the queue default

@jimmygchen jimmygchen added ready-for-merge This PR is ready to merge. and removed blocked labels Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
das Data Availability Sampling ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants