feat: add chunk application stats #12797
Conversation
*block_hash,
shard_uid.shard_id(),
apply_result.stats,
);
Saving chunk stats here means that only chunks applied inside blocks will have their stats saved. Stateless chunk validators will not save any stats. In the future we could change it to save it somewhere else, but it's good enough for the first version.
@@ -462,7 +467,8 @@ impl DBCol {
   | DBCol::StateHeaders
   | DBCol::TransactionResultForBlock
   | DBCol::Transactions
-  | DBCol::StateShardUIdMapping => true,
+  | DBCol::StateShardUIdMapping
+  | DBCol::ChunkApplyStats => true,
I hope that marking this column as `cold` is enough to avoid garbage collection on archival nodes? I think these stats should be kept forever on archival nodes. They are not that big, and it would be nice to be able to view stats for chunks older than three epochs.
/// The stats can be read to analyze what happened during chunk application.
/// - *Rows*: BlockShardId (BlockHash || ShardId) - 40 bytes
/// - *Column type*: `ChunkApplyStats`
ChunkApplyStats,
At first I thought that I could use `ChunkHash` as a key in the database, but that doesn't really work. The same chunk can be applied multiple times when there are missing chunks, and I think chunks created using the same `prev_block` would have the same hash (?).
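For illustration, a minimal sketch of how a 40-byte `BlockHash || ShardId` row key could be assembled. The function name and the little-endian shard id encoding are assumptions for the example, not necessarily the exact encoding nearcore uses.

```rust
// Hypothetical helper: build the 40-byte row key `BlockHash || ShardId`
// documented above (32-byte block hash followed by an 8-byte shard id).
// The exact byte encoding used by nearcore may differ.
fn chunk_apply_stats_key(block_hash: &[u8; 32], shard_id: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(40);
    key.extend_from_slice(block_hash); // 32 bytes of block hash
    key.extend_from_slice(&shard_id.to_le_bytes()); // 8 bytes of shard id
    key
}
```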
@@ -648,6 +648,7 @@ impl<'a> ChainStoreUpdate<'a> {
self.gc_outgoing_receipts(&block_hash, shard_id);
self.gc_col(DBCol::IncomingReceipts, &block_shard_id);
self.gc_col(DBCol::StateTransitionData, &block_shard_id);
self.gc_col(DBCol::ChunkApplyStats, &block_shard_id);
I wonder if we could use some other garbage collection logic to keep the stats for longer than three epochs. Maybe something similar to `LatestWitnesses`, where the last N witnesses are kept in the database? It's annoying that useful data like these stats disappears after three epochs, especially in tests which have to run for a few epochs. Can be changed later.
Agreed that it would be cool to keep those longer and agreed to keep the first version simple.
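As a rough illustration of the keep-the-last-N idea mentioned above, a sketch with entirely hypothetical names; the real `LatestWitnesses` logic in nearcore differs in detail.

```rust
use std::collections::BTreeMap;

// Hypothetical keep-last-N pruning, keyed by height: after saving stats for
// a new height, drop everything older than the retained window. All names
// here are illustrative, not the actual nearcore GC API.
const KEEP_LAST_N: u64 = 10_000;

fn prune_old_stats(stats_by_height: &mut BTreeMap<u64, Vec<u8>>, new_height: u64) {
    if let Some(evict_below) = new_height.checked_sub(KEEP_LAST_N) {
        // `split_off` keeps heights >= evict_below + 1; everything older is dropped.
        let kept = stats_by_height.split_off(&(evict_below + 1));
        *stats_by_height = kept;
    }
}
```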
@@ -336,7 +327,7 @@ impl Runtime {
   apply_state: &ApplyState,
   signed_transaction: &SignedTransaction,
   transaction_cost: &TransactionCost,
-  stats: &mut ApplyStats,
+  stats: &mut ChunkApplyStatsV0,
 ) -> Result<(Receipt, ExecutionOutcomeWithId), InvalidTxError> {
   let span = tracing::Span::current();
   metrics::TRANSACTION_PROCESSED_TOTAL.inc();
Runtime metrics could probably be refactored so that first we collect the stats and at the very end we record all of the stats in the metrics. That would reduce clutter in the runtime code.
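A minimal sketch of that refactor, with placeholder field and metric names (not the actual nearcore metrics): counters are gathered into the stats struct during processing and the metrics are emitted once at the end of chunk application.

```rust
// Illustrative only: the struct is trimmed down and the metric names in the
// comments are placeholders, not the actual nearcore metrics. The idea is to
// update all metrics from the already-collected stats in one place instead of
// scattering metric calls through the runtime.
struct ChunkApplyStatsSketch {
    transactions_num: u64,
    incoming_receipts_num: u64,
}

fn record_metrics_from_stats(stats: &ChunkApplyStatsSketch) {
    // e.g. metrics::TRANSACTIONS_PROCESSED_TOTAL.inc_by(stats.transactions_num);
    // e.g. metrics::INCOMING_RECEIPTS_TOTAL.inc_by(stats.incoming_receipts_num);
    // Stand-in for the real metric updates:
    println!(
        "transactions={} incoming_receipts={}",
        stats.transactions_num, stats.incoming_receipts_num
    );
}
```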
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #12797      +/-   ##
==========================================
+ Coverage   70.39%   70.43%   +0.03%
==========================================
  Files         851      855       +4
  Lines      174187   174827     +640
  Branches   174187   174827     +640
==========================================
+ Hits       122626   123146     +520
- Misses      46318    46445     +127
+ Partials     5243     5236       -7

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
@@ -0,0 +1,218 @@
use std::collections::BTreeMap;
Does this need to be a part of primitives? Isn't there an obvious conceptual "producer" crate which all dependents use that could hold this type?
I initially put it in `node-runtime`, but then I needed the struct in `near-store`, which doesn't depend on `node-runtime`, so I moved the struct to primitives. It's a primitive struct that is used in multiple crates, so that seemed like a good fit.
In the future there might be more crates that make use of these stats, maybe a custom aggregator which downloads stats from multiple nodes and aggregates them somehow. It would be nice to have a small crate that the aggregator can import without importing all of runtime.
If there's a better place for it please let me know.
/// Useful for debugging, metrics and sanity checks.
#[derive(Debug, Clone, BorshSerialize, BorshDeserialize)]
pub enum ChunkApplyStats {
    V0(ChunkApplyStatsV0),
Would it be possible for us to find a way to avoid versioning headaches with this mostly internal data? I don't think it is going to be painful if we make the old data inaccessible when the schema changes; we should take advantage of that.
These stats might be consumed by other services in the future - debug UI, custom stats aggregators, etc. - so I wanted to have a (mostly) stable interface that they could depend on. My first thought was to make it versioned, but maybe there are other ways to go about it.
It looks like the problem of the disk filling up was caused by an unrelated issue (a faulty rocksdb update) combined with insufficient memory on the node. Constant crashes and rocksdb acting up caused too much data to be written to disk. The PR is ready for review.
LGTM
@@ -1084,6 +1098,7 @@ pub struct ChainStoreUpdate<'a> {
add_state_sync_infos: Vec<StateSyncInfo>,
remove_state_sync_infos: Vec<CryptoHash>,
challenged_blocks: HashSet<CryptoHash>,
chunk_apply_stats: HashMap<(CryptoHash, ShardId), ChunkApplyStats>,
Small suggestion: shard uid is the better unique identifier of a shard. That being said, it's often not readily available; in that case don't worry about it.
AFAIU from now on the plan is to add new shard ids instead of increasing UId versions, so it should be unique enough. ShardId is more user-friendly, so I went with that.
@@ -115,6 +116,7 @@ pub struct ApplyChunkResult {
pub bandwidth_scheduler_state_hash: CryptoHash,
/// Contracts accessed and deployed while applying the chunk.
pub contract_updates: ContractUpdates,
pub stats: ChunkApplyStatsV0,
Why the versioned struct instead of the enum?
nit: please add a comment
> Why the versioned struct instead of the enum?

Using a versioned enum is annoying: you have to unwrap it and wrap it again. It's nicer to use the struct. Versioning is only needed for storage and external interfaces; locally we can use the latest version of the struct.
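A trimmed-down sketch of that pattern, with field names simplified for the example: the runtime mutates the concrete latest struct and wraps it into the versioned enum only at the boundary.

```rust
// Simplified sketch of the wrap-at-the-boundary pattern described above;
// the real structs have many more fields.
pub struct ChunkApplyStatsV0 {
    pub transactions_num: u64,
}

pub enum ChunkApplyStats {
    V0(ChunkApplyStatsV0),
}

impl From<ChunkApplyStatsV0> for ChunkApplyStats {
    fn from(stats: ChunkApplyStatsV0) -> Self {
        ChunkApplyStats::V0(stats)
    }
}

fn apply_chunk_sketch() -> ChunkApplyStats {
    // Collect stats with the concrete V0 struct, no enum matching needed...
    let mut stats = ChunkApplyStatsV0 { transactions_num: 0 };
    stats.transactions_num += 1;
    // ...and convert to the versioned enum only when handing it out.
    stats.into()
}
```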
/// Was the chunk applied as a missing chunk (apply_old_chunk)
pub is_chunk_missing: bool,
Perhaps use the same schema as the chunks do - have height_included as a field and a method to check if a chunk is new or old. Not a biggie.
The stats are meant to be human readable, and looking at `is_chunk_missing` is easier than comparing `height_included`; I prefer having a bool.
/// Number of previous bandwidth requests (prev_bandwidth_requests.len()).
pub prev_bandwidth_requests_num: u64,
Given this can be derived, can you remove it and add a method?
I wanted the stats to be something that a human can view to get an overview of what happened inside the chunk. It's hard to count the individual bandwidth requests, so it's useful to have a separate field which shows their number.
I guess a method would be cleaner, but it would have to be supported in all of the places that will be displaying the stats; adding a field is easier and it's not super ugly.
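For comparison, a trimmed-down sketch of the method-based alternative suggested above; the types are placeholders and the real struct stores richer bandwidth request data.

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the real stats struct: the request count is derived
// on demand instead of being stored as a separate `prev_bandwidth_requests_num`
// field. Key/value types are placeholders.
pub struct BandwidthSchedulerStatsSketch {
    pub prev_bandwidth_requests: BTreeMap<(u64, u64), Vec<u64>>,
}

impl BandwidthSchedulerStatsSketch {
    /// Derived count of previous bandwidth requests.
    pub fn prev_bandwidth_requests_num(&self) -> u64 {
        self.prev_bandwidth_requests.len() as u64
    }
}
```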
inner.record_outgoing_buffer_stats();
*stats = inner.stats;
nit: Maybe just return the stats from inner and set directly?
This way means one less line in `validate_apply_state_update`; I think that's better.
@@ -510,6 +533,7 @@ impl ReceiptSinkV2 {
trie: &dyn TrieAccess,
shard_layout: &ShardLayout,
side_effects: bool,
stats: &mut ChunkApplyStatsV0,
nit: Maybe the `ChunkApplyStats` (without version)?
Using the `V0` struct is easier than wrapping and unwrapping an enum.
I mean that you'll need to do it anyway once V1 is introduced, right?
The runtime can always use the latest version of the struct and convert it to the enum at the end; there is no need to deal with enums while collecting stats.
for shard in self.outgoing_buffers.shards() {
    let buffer = self.outgoing_buffers.to_shard(shard);
Sanity check - does this add to state witness size at all?
No it doesn't - I was careful to make sure that collecting the stats doesn't read any extra state. All outgoing buffer indices and metadatas are read at the beginning of chunk application, and after that using them doesn't read any state.
Proving this is more tricky than I'd like, but I don't see any way around it; at some point we have to read the data. Canary nodes should be able to detect any trie read inconsistencies.
}

match self.outgoing_metadatas.get_metadata_for_shard(&shard) {
    Some(metadata) if metadata.total_receipts_num() == buffer.len() => {
What does this `if` do, and why do you need it? Please add a comment.
Added a comment
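For context, an abbreviated, self-contained sketch of the guard being discussed, with the kind of comment the reviewer asked for. The real types live in the receipt sink code and the exact comment wording there may differ.

```rust
// Abbreviated sketch of the guard above; stand-in types, not the real ones.
struct OutgoingMetadataSketch {
    receipts_num: u64,
}

impl OutgoingMetadataSketch {
    fn total_receipts_num(&self) -> u64 {
        self.receipts_num
    }
}

fn outgoing_metadata_ready(metadata: Option<&OutgoingMetadataSketch>, buffer_len: u64) -> bool {
    match metadata {
        // Only treat the metadata as ready when it accounts for every receipt
        // currently in the outgoing buffer; otherwise stats derived from it
        // would be incomplete.
        Some(metadata) if metadata.total_receipts_num() == buffer_len => true,
        _ => false,
    }
}
```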
runtime/runtime/src/lib.rs
Outdated
processing_state.stats.transactions_num =
    transactions.transactions.len().try_into().unwrap();
processing_state.stats.incoming_receipts_num = incoming_receipts.len().try_into().unwrap();
processing_state.stats.is_chunk_missing = !apply_state.is_new_chunk;
nit: rename `is_chunk_missing` to `is_new_chunk`. It's better for consistency, and it's generally good practice to use positive expressions in variable names.
`is_new_chunk` is a bit ambiguous; `new` can have many different meanings. IMO `missing` is clearer - having a missing chunk means one thing, while a new chunk can appear in many different contexts.
`new chunk` is an established term in the codebase and the current convention.
Alright, changed to `is_new_chunk`.
.insert(shard, ReceiptsStats { num: 0, total_size: 0, total_gas: 0 });
self.stats.is_outgoing_metadata_ready.insert(shard, true);
continue;
}
There was a bug here - without the `continue;`, stats for shards that never had any receipts sent to them appeared to have uninitialized metadata because the metadata was `None`.
I also bumped the db version. Older versions of
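A simplified reconstruction of the bug and the fix described above; the names and types are stand-ins for the real receipt sink code.

```rust
use std::collections::BTreeMap;

// Simplified reconstruction of the fixed loop: shards whose outgoing buffer
// never held any receipts have no metadata at all, which is expected, so they
// are marked ready and skipped with `continue`. Without the `continue`, the
// generic check below would see the missing metadata and mark them not ready.
fn record_metadata_readiness(
    buffer_lens: &BTreeMap<u64, u64>,       // shard -> receipts currently buffered
    metadata_receipts: &BTreeMap<u64, u64>, // shard -> receipts covered by metadata
    is_outgoing_metadata_ready: &mut BTreeMap<u64, bool>,
) {
    for (&shard, &buffer_len) in buffer_lens {
        if buffer_len == 0 && !metadata_receipts.contains_key(&shard) {
            is_outgoing_metadata_ready.insert(shard, true);
            continue; // the missing `continue` was the bug described above
        }
        let ready = metadata_receipts.get(&shard) == Some(&buffer_len);
        is_outgoing_metadata_ready.insert(shard, ready);
    }
}
```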
This is the first step towards per-chunk metrics (#12758).

This PR adds a new struct - `ChunkApplyStats` - which keeps information about things that happened during chunk application. For example how many transactions there were, how many receipts, what the outgoing limits were, how many receipts were forwarded, buffered, etc.

For now `ChunkApplyStats` contain mainly data relevant to the bandwidth scheduler; in the future more stats can be added to measure other things that we're interested in. I didn't want to add too much stuff at once to keep the PR size reasonable.

There was already a struct called `ApplyStats`, but it was used only for the balance checker. I replaced it with `BalanceStats` inside `ChunkApplyStats`.

`ChunkApplyStats` are returned in `ApplyChunkResult` and saved to the database for later use. A new database column is added to keep the chunk application stats. The column is included in the standard garbage collection logic to keep the size of saved data reasonable.

Running `neard view-state chunk-apply-stats` allows a node operator to view chunk application stats for a given chunk. Example output for a mainnet chunk:

```rust
$ ./neard view-state chunk-apply-stats --block-hash GKzyP7DVNw5ctUcBhRRkABMaC2giNSKK5oHCrRc9hnXH --shard-id 0
...
V0(
    ChunkApplyStatsV0 {
        height: 138121896,
        shard_id: 0,
        is_chunk_missing: false,
        transactions_num: 35,
        incoming_receipts_num: 103,
        receipt_sink: ReceiptSinkStats {
            outgoing_limits: {
                0: OutgoingLimitStats { size: 102400, gas: 18446744073709551615 },
                1: OutgoingLimitStats { size: 4718592, gas: 300000000000000000 },
                2: OutgoingLimitStats { size: 102400, gas: 300000000000000000 },
                3: OutgoingLimitStats { size: 102400, gas: 300000000000000000 },
                4: OutgoingLimitStats { size: 102400, gas: 300000000000000000 },
                5: OutgoingLimitStats { size: 102400, gas: 300000000000000000 },
            },
            forwarded_receipts: {
                0: ReceiptsStats { num: 24, total_size: 6801, total_gas: 515985143008901 },
                2: ReceiptsStats { num: 21, total_size: 6962, total_gas: 639171080456467 },
                3: ReceiptsStats { num: 58, total_size: 17843, total_gas: 1213382619794847 },
                4: ReceiptsStats { num: 20, total_size: 6278, total_gas: 235098003759589 },
                5: ReceiptsStats { num: 4, total_size: 2089, total_gas: 245101556851946 },
            },
            buffered_receipts: {},
            final_outgoing_buffers: {
                0: ReceiptsStats { num: 0, total_size: 0, total_gas: 0 },
                2: ReceiptsStats { num: 0, total_size: 0, total_gas: 0 },
                3: ReceiptsStats { num: 0, total_size: 0, total_gas: 0 },
                4: ReceiptsStats { num: 0, total_size: 0, total_gas: 0 },
                5: ReceiptsStats { num: 0, total_size: 0, total_gas: 0 },
            },
            is_outgoing_metadata_ready: {
                0: false,
                2: false,
                3: false,
                4: false,
                5: false,
            },
            all_outgoing_metadatas_ready: false,
        },
        bandwidth_scheduler: BandwidthSchedulerStats {
            params: None,
            prev_bandwidth_requests: {},
            prev_bandwidth_requests_num: 0,
            time_to_run_ms: 0,
            granted_bandwidth: {},
            new_bandwidth_requests: {},
        },
        balance: BalanceStats {
            tx_burnt_amount: 4115983319195000000000,
            slashed_burnt_amount: 0,
            other_burnt_amount: 0,
            gas_deficit_amount: 0,
        },
    },
)
```

The stats are also available in `ChainStore`, making it easy to read them from tests. In the future we could also add an RPC endpoint to make the stats available in `debug-ui`.

The PR is divided into commits for easier review.