-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block availability data enum #6866
base: unstable
Are you sure you want to change the base?
Conversation
beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
Outdated
Show resolved
Hide resolved
let num_received_blobs = self.num_received_blobs(); | ||
let num_received_columns = self.num_received_data_columns(); | ||
if spec.is_peer_das_enabled_for_epoch(block.epoch()) { | ||
if self.verified_data_columns.len() >= custody_column_count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed about this before - we don't allow for duplicates when adding columns to the available cache, so if len()
is greater than custody column count, we likely have added non-custody column to the cache, and this would result in:
- inconsistent columns in DB
- incorrectly identifying a block as available where it isn't (getting some non custody columns, but custody columns may be missing)
With the current approach, we must be able to detect if there's a bug here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a TODO in the line below to fix this case later. We were not handling this case before and it's out of scope for this PR IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other comment, our code on unstable
right now requires an exact match for either blobs or columns, why change this?
"received_columns" => num_received_columns, | ||
"expected_columns" => expected_columns_msg, | ||
); | ||
// TODO(das) get only the columns that we need, filter the rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this filter necessary with the current approach?
Is this TODO
to account for peer sampling?
Feels like it's inconsistent in the context of subnet sampling though - because we could have enough verified_data_columns
but end up storing less than custody_column_count
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what to do in this case. Erroring here is bad as it will prevent block import. This should not happen and points to an internal error. I would rather think how to handle the >
cases in a separate PR.
beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
Show resolved
Hide resolved
} else { | ||
// Before PeerDAS, blobs | ||
let num_received_blobs = self.verified_blobs.iter().flatten().count(); | ||
if num_received_blobs >= num_expected_blobs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's definitely a bug if >
happens, I'm not sure if we just make it available and store the the "uncommitted blobs"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous handling is good - and if there's unexpected blob there should be an error AvailabilityCheckError::Unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we did not handle the >
case, we had a .take()
on the blobs iter, we would just ignore the extra blobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The take()
is unnecessary I believe, because we check for duplicates AND exact match before this PR
lighthouse/beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
Lines 240 to 244 in a1b7d61
let all_blobs_received = block_kzg_commitments_count_opt | |
.is_some_and(|num_expected_blobs| num_expected_blobs == num_received_blobs); | |
let all_columns_received = expected_columns_opt | |
.is_some_and(|num_expected_columns| num_expected_columns == num_received_columns); |
"block_root" => %block_root, | ||
"count" => blobs.len(), | ||
); | ||
Ok(Some(StoreOp::PutBlobs(block_root, blobs))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the is_empty
check for both blobs and columns so we dont't store empty values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AvailableBlockData::Blobs
enum by definition must have more than zero blobs. If we do a check here we should error, instead of ignoring the Op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, very elegant solution, and much cleaner.
I've added some comments, the main questions I have are the DA checker is_available
checks and edge case handling.
22692ee
to
5358dbb
Compare
This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏 |
.unwrap_or("unknown".to_string()); | ||
/// Returns Some if the block has received all its required data for import. The return value | ||
/// must be persisted in the DB along with the block. | ||
pub fn is_available( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit on convention: the name is_available
implies an immutable method that does not mutate self
and return a bool - although I can't think of a better name.
Issue Addressed
PeerDAS has undergone multiple refactors + the blending with the get_blobs optimization has generated technical debt.
A function signature like this
lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs
Lines 7171 to 7178 in f008b84
Allows at least the following combination of states:
In reality, we don't have that many possible states, only:
NoData
: pre-deneb, pre-PeerDAS with 0 blobs or post-PeerDAS with 0 blobsBlobs(BlobSidecarList<E>)
: post-Deneb pre-PeerDAS with > 0 blobsDataColumns(DataColumnSidecarList<E>)
: post-PeerDAS with > 0 blobsDataColumnsRecv(oneshot::Receiver<DataColumnSidecarList<E>>)
: post-PeerDAS with > 0 blobs, but we obtained the columns via reconstruction^ this are the variants of the new
AvailableBlockData
enumSo we go from 2^5 states to 4 well-defined. Downstream code benefits nicely from this clarity and I think it makes the whole feature much more maintainable.
Currently
is_available
returns a bool, and then we construct the available block inmake_available
. In a way the availability condition is duplicated in both functions. Instead, this PR constructsAvailableBlockData
inis_available
so the availability conditions are written once