Per-chunk metrics #12758
Labels
A-observability
Area: observability
C-enhancement
Category: An issue proposing an enhancement or a PR with one.
Comments
Could you add a high-level estimate of the engineering cost?
I'd say about a week of work to get the basic functionality going. Things like integration with debug-ui or custom aggregators can be added later.
github-merge-queue bot
pushed a commit
that referenced
this issue
Feb 7, 2025
This is the first step towards per-chunk metrics (#12758).

This PR adds a new struct - `ChunkApplyStats` - which keeps information about things that happened during chunk application: for example, how many transactions and receipts there were, what the outgoing limits were, how many receipts were forwarded or buffered, and so on.

For now `ChunkApplyStats` contains mainly data relevant to the bandwidth scheduler. In the future more stats can be added to measure other things that we're interested in. I didn't want to add too much at once, to keep the PR size reasonable.

There was already a struct called `ApplyStats`, but it was used only for the balance checker. I replaced it with `BalanceStats` inside `ChunkApplyStats`.

`ChunkApplyStats` are returned in `ApplyChunkResult` and saved to the database for later use. A new database column is added to keep the chunk application stats. The column is included in the standard garbage collection logic to keep the size of saved data reasonable.

Running `neard view-state chunk-apply-stats` allows a node operator to view chunk application stats for a given chunk. Example output for a mainnet chunk:

<details>
<summary> Click to expand </summary>

```rust
$ ./neard view-state chunk-apply-stats --block-hash GKzyP7DVNw5ctUcBhRRkABMaC2giNSKK5oHCrRc9hnXH --shard-id 0
...
V0(
    ChunkApplyStatsV0 {
        height: 138121896,
        shard_id: 0,
        is_chunk_missing: false,
        transactions_num: 35,
        incoming_receipts_num: 103,
        receipt_sink: ReceiptSinkStats {
            outgoing_limits: {
                0: OutgoingLimitStats {
                    size: 102400,
                    gas: 18446744073709551615,
                },
                1: OutgoingLimitStats {
                    size: 4718592,
                    gas: 300000000000000000,
                },
                2: OutgoingLimitStats {
                    size: 102400,
                    gas: 300000000000000000,
                },
                3: OutgoingLimitStats {
                    size: 102400,
                    gas: 300000000000000000,
                },
                4: OutgoingLimitStats {
                    size: 102400,
                    gas: 300000000000000000,
                },
                5: OutgoingLimitStats {
                    size: 102400,
                    gas: 300000000000000000,
                },
            },
            forwarded_receipts: {
                0: ReceiptsStats {
                    num: 24,
                    total_size: 6801,
                    total_gas: 515985143008901,
                },
                2: ReceiptsStats {
                    num: 21,
                    total_size: 6962,
                    total_gas: 639171080456467,
                },
                3: ReceiptsStats {
                    num: 58,
                    total_size: 17843,
                    total_gas: 1213382619794847,
                },
                4: ReceiptsStats {
                    num: 20,
                    total_size: 6278,
                    total_gas: 235098003759589,
                },
                5: ReceiptsStats {
                    num: 4,
                    total_size: 2089,
                    total_gas: 245101556851946,
                },
            },
            buffered_receipts: {},
            final_outgoing_buffers: {
                0: ReceiptsStats {
                    num: 0,
                    total_size: 0,
                    total_gas: 0,
                },
                2: ReceiptsStats {
                    num: 0,
                    total_size: 0,
                    total_gas: 0,
                },
                3: ReceiptsStats {
                    num: 0,
                    total_size: 0,
                    total_gas: 0,
                },
                4: ReceiptsStats {
                    num: 0,
                    total_size: 0,
                    total_gas: 0,
                },
                5: ReceiptsStats {
                    num: 0,
                    total_size: 0,
                    total_gas: 0,
                },
            },
            is_outgoing_metadata_ready: {
                0: false,
                2: false,
                3: false,
                4: false,
                5: false,
            },
            all_outgoing_metadatas_ready: false,
        },
        bandwidth_scheduler: BandwidthSchedulerStats {
            params: None,
            prev_bandwidth_requests: {},
            prev_bandwidth_requests_num: 0,
            time_to_run_ms: 0,
            granted_bandwidth: {},
            new_bandwidth_requests: {},
        },
        balance: BalanceStats {
            tx_burnt_amount: 4115983319195000000000,
            slashed_burnt_amount: 0,
            other_burnt_amount: 0,
            gas_deficit_amount: 0,
        },
    },
)
```

</details>

The stats are also available in `ChainStore`, making it easy to read them from tests. In the future we could also add an RPC endpoint to make the stats available in `debug-ui`.
The PR is divided into commits for easier review.
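To illustrate how such per-chunk stats might be consumed, here is a minimal sketch that sums the `forwarded_receipts` entries from the example output across target shards. The `ReceiptsStats` field names follow the output above, but the integer types, the `BTreeMap<u64, _>` keyed by shard id, and the helpers `total_forwarded` and `example_forwarded_receipts` are assumptions made for illustration, not the actual nearcore definitions.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the `ReceiptsStats` entries in the output above.
/// Field names match the output; the integer types are assumptions.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ReceiptsStats {
    pub num: u64,
    pub total_size: u64,
    pub total_gas: u128,
}

/// Hypothetical helper: sum the per-shard entries into one total.
pub fn total_forwarded(forwarded: &BTreeMap<u64, ReceiptsStats>) -> ReceiptsStats {
    forwarded
        .values()
        .fold(ReceiptsStats::default(), |acc, s| ReceiptsStats {
            num: acc.num + s.num,
            total_size: acc.total_size + s.total_size,
            total_gas: acc.total_gas + s.total_gas,
        })
}

/// Hypothetical helper: the `forwarded_receipts` map copied from the
/// example mainnet chunk above (target shard id -> stats).
pub fn example_forwarded_receipts() -> BTreeMap<u64, ReceiptsStats> {
    // (shard id, num, total_size, total_gas), as printed in the output.
    let rows: [(u64, u64, u64, u128); 5] = [
        (0, 24, 6801, 515985143008901),
        (2, 21, 6962, 639171080456467),
        (3, 58, 17843, 1213382619794847),
        (4, 20, 6278, 235098003759589),
        (5, 4, 2089, 245101556851946),
    ];
    rows.iter()
        .map(|&(shard, num, total_size, total_gas)| {
            (shard, ReceiptsStats { num, total_size, total_gas })
        })
        .collect()
}
```

For the example chunk, `total_forwarded(&example_forwarded_receipts())` yields 127 forwarded receipts with a combined size of 39973 bytes. A test reading the real stats from `ChainStore` could make the same kind of assertion on a specific chunk.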
marcelo-gonzalez
pushed a commit
to marcelo-gonzalez/nearcore
that referenced
this issue
Feb 11, 2025
nearcore has a lot of metrics which are scraped by Prometheus and presented on Grafana dashboards. These metrics provide some insight into what's going on in the binary, but they are not very precise. All metrics are currently aggregated over 1-minute periods, which is around 60 blocks. Aggregating over this many blocks provides high-level information, but it doesn't give much insight into individual chunks.

It would be great to have detailed metrics for every chunk: how many transactions were processed, how much gas was burned at each stage of chunk application, how many receipts were forwarded, which limits were hit, and so on. I would love to be able to use a neard command or a debug-ui page to view detailed chunk application metrics for any chosen chunk.
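The loss of detail from aggregation can be sketched with made-up numbers. The example below assumes a hypothetical window of 60 per-chunk gas readings (in Tgas) in which a single chunk spikes; the values and the helpers `example_window`, `window_mean` and `window_peak` are invented for illustration and do not come from nearcore.

```rust
/// Hypothetical one-minute window: 60 chunks burning 100 Tgas each,
/// except for a single chunk at index 30 that burns 1000 Tgas.
pub fn example_window() -> Vec<u64> {
    let mut samples = vec![100u64; 60];
    samples[30] = 1000;
    samples
}

/// What an aggregated dashboard would show: the mean over the window.
pub fn window_mean(samples: &[u64]) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    samples.iter().sum::<u64>() / samples.len() as u64
}

/// What per-chunk data additionally allows: locating the heaviest chunk.
/// Returns (index within the window, value), or None for an empty window.
pub fn window_peak(samples: &[u64]) -> Option<(usize, u64)> {
    samples
        .iter()
        .copied()
        .enumerate()
        .max_by_key(|&(_, value)| value)
}
```

In this made-up window the mean is 115 Tgas, only slightly above the typical 100 Tgas, while the per-chunk view shows that the chunk at index 30 burned 1000 Tgas.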
Advantages: