Data validation on EVM-based chains #67

denisbsu · 2024-09-30T09:13:14Z

There are a number of data validators we should have in the ETH data ingestion system. Some of them are already implemented.

Primary data consistency:

Is it even a chain? block.parentHash == prev_block.hash
Is it even a block? block.hash == hash(block)
Are transactions intact? block.transactionsRoot == MPT(block.transactions).root
Are receipts intact? block.receiptsRoot == MPT(block.receipts).root
Is state transition correct?
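
As an illustration of the first check in this list, here is a minimal sketch of the parent-hash linkage test. It assumes blocks arrive as JSON-RPC-style dicts with hex-string `number`, `hash` and `parentHash` fields, ordered by height; the function name `find_chain_break` is made up for this example and is not part of the existing code.

```python
from typing import Iterable, Mapping, Optional


def find_chain_break(blocks: Iterable[Mapping[str, str]]) -> Optional[int]:
    """Return the number of the first block whose parentHash does not match
    the hash of the block before it, or None if the sequence is linked.

    Assumption: blocks are ordered by height and use hex-string fields.
    """
    prev = None
    for block in blocks:
        if prev is not None:
            # Providers may differ in hex casing, so compare case-insensitively.
            if block["parentHash"].lower() != prev["hash"].lower():
                return int(block["number"], 16)
        prev = block
    return None
```

Returning the offending block number rather than a boolean makes it easier to decide which range has to be refetched.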

Derivative data correctness:

Is "from" transaction field valid? transaction.from == recover_sender(transaction)
Are traces correct?
Are state diffs correct?

Data internal consistency:

Is block bloom correct? block.bloomFilter == OR(block.receipts.bloomFilter)
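
A minimal sketch of this bloom check, assuming bloom filters are available as 256-byte hex strings (the field is called `logsBloom` in JSON-RPC block and receipt objects). The helper names are hypothetical.

```python
from typing import Iterable

BLOOM_BYTES = 256  # an Ethereum logs bloom is 2048 bits wide


def combined_bloom(receipt_blooms: Iterable[str]) -> str:
    """Bitwise-OR all receipt blooms and return the result as a hex string."""
    acc = 0
    for bloom_hex in receipt_blooms:
        acc |= int(bloom_hex, 16)  # int() accepts the "0x" prefix in base 16
    return "0x" + acc.to_bytes(BLOOM_BYTES, "big").hex()


def block_bloom_is_consistent(block_bloom: str, receipt_blooms: Iterable[str]) -> bool:
    """True if the block bloom equals the OR of its receipts' blooms."""
    # Compare as integers so hex casing does not matter.
    return int(block_bloom, 16) == int(combined_bloom(receipt_blooms), 16)
```

A block without receipts yields an all-zero bloom, which is what the block header is expected to contain in that case.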

During the first pass we should identify Primary data fields not covered by consistency checks and Derivative data fields not covered by correctness checks, in order to formulate tasks for the second pass. Some internal consistency checks may be added due to customer requests, and these checks, if they fail, may cause some data rewrites.

Some validations are straightforward and fully described by the asserts above, but others are less obvious and are described here in detail.

How to prove State Transition validity and State Diffs validity

State transition and state diff correctness go hand in hand, as we can prove both simultaneously: we just need to build the total state diff for a block (accounting for transaction order) and then request Merkle proofs for all "before" and "after" values from the data provider. We can then build two partial MPTs (before and after). The roots of these tries should be the state roots of the previous and current blocks respectively, and all differences between them should be explained by the total state diff. This check is very involved and takes a lot of time to write properly, but it relieves us from supporting a full node and calculating the full state MPT for each block.
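
The "total state diff" step could look roughly like the sketch below. It assumes each transaction's diff is a dict mapping a key (for example an `(address, field)` tuple) to a `(before, after)` pair, with `None` meaning "absent". That shape, and the name `total_state_diff`, are assumptions for illustration, not the format of any particular provider. Fetching Merkle proofs and building the partial tries is not covered here.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

Value = Optional[str]         # None means the entry does not exist
Change = Tuple[Value, Value]  # (before, after)
Diff = Dict[Hashable, Change]


def total_state_diff(tx_diffs: Iterable[Diff]) -> Diff:
    """Fold per-transaction diffs, in transaction order, into one block diff.

    For every key the earliest "before" and the latest "after" are kept.
    A ValueError is raised if consecutive changes to a key do not chain.
    """
    total: Diff = {}
    for index, diff in enumerate(tx_diffs):
        for key, (before, after) in diff.items():
            if key in total:
                first_before, last_after = total[key]
                if before != last_after:
                    raise ValueError(
                        f"transaction {index}: 'before' of {key!r} "
                        "does not match the previous 'after'"
                    )
                total[key] = (first_before, after)
            else:
                total[key] = (before, after)
    return total
```

Keys whose value ends up unchanged over the block are kept in the result. They do not alter the trie, but their "before" values may still be useful when the same subtrie is reused elsewhere.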

How to prove Traces validity

Traces are even more involved. Traces can be reproduced by running parts of geth (or any other EVM implementation), but with a twist: we need Merkle proofs not only of written data, but of read data too. So here are the validation steps:

Parse traces to figure out what data was read
Get Merkle Proofs of read data from API
Merge Read Proofs with "before" subtrie from State Transitions/State Diffs validation
Run all transactions, using the partial trie as the initial state
Compare generated trace against API
(Future) To make a ZK proof from this validator, we would need to check the resulting state against the "after" trie from State Transitions/State Diffs validation
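
The first three steps above can be sketched as follows, under heavy simplification. The trace format here is hypothetical: a flat list of steps, each already annotated with the `address` whose storage is accessed and the storage `key`. Real tracer output differs and would need call-context tracking to get there. Only storage reads (`SLOAD`) are considered. Balance, nonce and code reads would need the same treatment.

```python
from typing import Iterable, Mapping, Set, Tuple

Slot = Tuple[str, str]  # (contract address, storage key), both lowercase hex


def missing_read_proofs(
    trace_steps: Iterable[Mapping[str, str]],
    already_proven: Set[Slot],
) -> Set[Slot]:
    """Return storage slots read by the trace that still need a Merkle proof.

    `already_proven` holds the slots covered by the "before" subtrie from the
    state diff validation, so they do not have to be requested again.
    """
    reads = {
        (step["address"].lower(), step["key"].lower())
        for step in trace_steps
        if step["op"] == "SLOAD"
    }
    return reads - already_proven
```

The returned set is what would be requested from the API before merging the new proofs into the "before" subtrie.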

dzhelezov assigned tmcgroul Sep 30, 2024
