Deserialize branch #374

Merged
merged 3 commits into from
Nov 29, 2023
99 changes: 7 additions & 92 deletions firewood/src/merkle/node.rs
@@ -340,13 +340,17 @@ use type_id::NodeTypeId;

impl Storable for Node {
fn deserialize<T: CachedStore>(addr: usize, mem: &T) -> Result<Self, ShaleError> {
let mut offset = addr;

let meta_raw =
mem.get_view(addr, Meta::SIZE as u64)
mem.get_view(offset, Meta::SIZE as u64)
.ok_or(ShaleError::InvalidCacheView {
offset: addr,
offset,
size: Meta::SIZE as u64,
})?;

offset += Meta::SIZE;

let attrs = NodeAttributes::from_bits_retain(meta_raw.as_deref()[TRIE_HASH_LEN]);

let root_hash = if attrs.contains(NodeAttributes::ROOT_HASH_VALID) {
@@ -368,96 +372,7 @@ impl Storable for Node {

match meta_raw.as_deref()[TRIE_HASH_LEN + 1].try_into()? {
NodeTypeId::Branch => {
// TODO: add path
Collaborator

nit: This TODO still seems relevant, but it was deleted.

Contributor Author

Will go away shortly

// TODO: figure out what this size is?
let branch_header_size = BranchNode::MAX_CHILDREN as u64 * 8 + 4;
let node_raw = mem.get_view(addr + Meta::SIZE, branch_header_size).ok_or(
ShaleError::InvalidCacheView {
offset: addr + Meta::SIZE,
size: branch_header_size,
},
)?;

let mut cur = Cursor::new(node_raw.as_deref());
let mut chd = [None; BranchNode::MAX_CHILDREN];
let mut buff = [0; 8];

for chd in chd.iter_mut() {
cur.read_exact(&mut buff)?;
let addr = usize::from_le_bytes(buff);
if addr != 0 {
*chd = Some(DiskAddress::from(addr))
}
}

cur.read_exact(&mut buff[..4])?;

let raw_len = u32::from_le_bytes(buff[..4].try_into().expect("invalid slice"));

let value = if raw_len == u32::MAX {
None
} else {
let raw_len = raw_len as u64;

Some(Data(
mem.get_view(addr + Meta::SIZE + branch_header_size as usize, raw_len)
.ok_or(ShaleError::InvalidCacheView {
offset: addr + Meta::SIZE + branch_header_size as usize,
size: raw_len,
})?
.as_deref(),
))
};

let mut chd_encoded: [Option<Vec<u8>>; BranchNode::MAX_CHILDREN] =
Default::default();

let offset = if raw_len == u32::MAX {
addr + Meta::SIZE + branch_header_size as usize
} else {
addr + Meta::SIZE + branch_header_size as usize + raw_len as usize
};

let mut cur_encoded_len = 0;

for chd_encoded in chd_encoded.iter_mut() {
let mut buff = [0_u8; 1];
let len_raw = mem.get_view(offset + cur_encoded_len, 1).ok_or(
ShaleError::InvalidCacheView {
offset: offset + cur_encoded_len,
size: 1,
},
)?;

cur = Cursor::new(len_raw.as_deref());
cur.read_exact(&mut buff)?;

let len = buff[0] as u64;
cur_encoded_len += 1;

if len != 0 {
let encoded_raw = mem.get_view(offset + cur_encoded_len, len).ok_or(
ShaleError::InvalidCacheView {
offset: offset + cur_encoded_len,
size: len,
},
)?;

let encoded: Vec<u8> = encoded_raw.as_deref()[0..].to_vec();
*chd_encoded = Some(encoded);
cur_encoded_len += len as usize
}
}

let inner = NodeType::Branch(
BranchNode {
// path: vec![].into(),
children: chd,
value,
children_encoded: chd_encoded,
}
.into(),
);
let inner = NodeType::Branch(Box::new(BranchNode::deserialize(offset, mem)?));

Ok(Self::new_from_hash(
root_hash,
103 changes: 94 additions & 9 deletions firewood/src/merkle/node/branch.rs
@@ -4,13 +4,13 @@
use super::{Data, Encoded, Node};
use crate::{
merkle::{PartialPath, TRIE_HASH_LEN},
shale::ShaleStore,
shale::{DiskAddress, Storable},
shale::{ShaleError, ShaleStore},
};
use bincode::{Error, Options};
use std::{
fmt::{Debug, Error as FmtError, Formatter},
io::{Cursor, Write},
io::{Cursor, Read, Write},
mem::size_of,
ops::Deref,
};
@@ -232,13 +232,98 @@ impl Storable for BranchNode {
}

fn deserialize<T: crate::shale::CachedStore>(
_addr: usize,
_mem: &T,
) -> Result<Self, crate::shale::ShaleError>
where
Self: Sized,
{
todo!()
mut addr: usize,
mem: &T,
) -> Result<Self, crate::shale::ShaleError> {
const DATA_LEN_SIZE: usize = size_of::<DataLen>();
const BRANCH_HEADER_SIZE: u64 =
BranchNode::MAX_CHILDREN as u64 * DiskAddress::MSIZE + DATA_LEN_SIZE as u64;

let node_raw =
mem.get_view(addr, BRANCH_HEADER_SIZE)
.ok_or(ShaleError::InvalidCacheView {
offset: addr,
size: BRANCH_HEADER_SIZE,
})?;

addr += BRANCH_HEADER_SIZE as usize;

let mut cursor = Cursor::new(node_raw.as_deref());
let mut children = [None; BranchNode::MAX_CHILDREN];
let mut buf = [0u8; DiskAddress::MSIZE as usize];

for child in &mut children {
cursor.read_exact(&mut buf)?;
*child = Some(usize::from_le_bytes(buf))
.filter(|addr| *addr != 0)
.map(DiskAddress::from);
}

let raw_len = {
let mut buf = [0; DATA_LEN_SIZE];
cursor.read_exact(&mut buf)?;
Collaborator @rkuris, Nov 29, 2023

nit (DRY): identical code could be refactored into a macro, maybe:

macro_rules! read_data {
    ($cursor:expr, $type:ty) => {{
        let mut buf = [0u8; std::mem::size_of::<$type>()];
        $cursor.read_exact(&mut buf)?;
        <$type>::from_le_bytes(buf)
    }};
}

Then, at the callsite, you can do all the manipulation:

            *child = Some(read_data!(cursor, usize))
                .filter(|addr| *addr != 0)
                .map(DiskAddress::from);
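
A minimal, self-contained sketch of how the macro suggested above might be exercised, for illustration only. `read_children` is a hypothetical helper that is not part of firewood, and it decodes `u64` rather than `usize` so the byte width does not depend on the target. Slots equal to 0 decode to `None`, mirroring the `.filter(|addr| *addr != 0)` call site.

```rust
use std::io::{Cursor, Read};

// The reviewer's macro, reproduced so that this sketch compiles on its own.
// It must be expanded inside a function whose error type accepts io::Error.
macro_rules! read_data {
    ($cursor:expr, $type:ty) => {{
        let mut buf = [0u8; std::mem::size_of::<$type>()];
        $cursor.read_exact(&mut buf)?;
        <$type>::from_le_bytes(buf)
    }};
}

/// Hypothetical helper: decodes `count` little-endian u64 child slots,
/// treating a zero slot as "no child".
fn read_children(bytes: &[u8], count: usize) -> std::io::Result<Vec<Option<u64>>> {
    let mut cursor = Cursor::new(bytes);
    let mut children = Vec::with_capacity(count);
    for _ in 0..count {
        let child = Some(read_data!(cursor, u64)).filter(|addr| *addr != 0);
        children.push(child);
    }
    Ok(children)
}
```

Because the macro body uses `?`, a short buffer surfaces as an `Err` from the enclosing function instead of a panic.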

Some(DataLen::from_le_bytes(buf))
.filter(|len| *len != DataLen::MAX)
.map(|len| len as u64)
};

let value = match raw_len {
Some(len) => {
let data = mem
.get_view(addr, len)
.ok_or(ShaleError::InvalidCacheView {
offset: addr,
size: len,
})?;

addr += len as usize;

Some(Data(data.as_deref()))
}
None => None,
};

let mut children_encoded: [Option<Vec<u8>>; BranchNode::MAX_CHILDREN] = Default::default();

for child in &mut children_encoded {
const ENCODED_CHILD_LEN_SIZE: u64 = size_of::<EncodedChildLen>() as u64;

let len = mem
.get_view(addr, ENCODED_CHILD_LEN_SIZE)
.ok_or(ShaleError::InvalidCacheView {
offset: addr,
size: ENCODED_CHILD_LEN_SIZE,
})?
.as_deref()[0] as u64;
Collaborator

nit: this indexing can panic, so it will trigger a clippy lint once we add the clippy rules. Maybe:

Suggested change
.as_deref()[0] as u64;
.as_deref().get(0).ok_or(...)? as u64;
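
One caveat on the suggestion as written: `.get(0)` yields an `Option<&u8>`, and a `&u8` cannot be cast with `as u64`, so a `.copied()` (or a dereference) would be needed before the cast. Below is a minimal sketch of a non-panicking read, assuming the view dereferences to a byte slice. `InvalidView` is a hypothetical stand-in for `ShaleError::InvalidCacheView`, and `read_len_prefix` is an illustrative name, not firewood code.

```rust
/// Hypothetical stand-in for `ShaleError::InvalidCacheView`.
#[derive(Debug, PartialEq)]
struct InvalidView {
    offset: usize,
    size: u64,
}

/// Reads a one-byte length prefix without indexing, so an empty view
/// becomes an error instead of a panic.
fn read_len_prefix(view: &[u8], offset: usize) -> Result<u64, InvalidView> {
    view.first()
        .copied()
        .map(u64::from)
        .ok_or(InvalidView { offset, size: 1 })
}
```

`u64::from` is used instead of `as u64` because the widening conversion from `u8` is lossless and avoids a cast.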

Contributor Author

🤔 This could also be solved with a macro. We should cast to an array and use from_be_bytes so that the length can be any valid numeric type. I'll add a comment to #396.
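
A rough sketch of that idea, under stated assumptions: the macro name `le_prefix!` and the two wrapper functions are hypothetical, and the sketch decodes little-endian to match the `from_le_bytes` calls in this diff rather than the `from_be_bytes` mentioned in the comment. The widths (`u32` for the data length, `u8` for the encoded child length) are inferred from the code this PR removes.

```rust
/// Hypothetical macro: decodes a little-endian integer of type `$ty` from
/// the front of `$view`, yielding `None` when the view is too short.
macro_rules! le_prefix {
    ($view:expr, $ty:ty) => {{
        let view: &[u8] = $view;
        view.get(..std::mem::size_of::<$ty>())
            .and_then(|bytes| <[u8; std::mem::size_of::<$ty>()]>::try_from(bytes).ok())
            .map(<$ty>::from_le_bytes)
    }};
}

/// Data length prefix, assumed here to be a u32.
fn data_len(view: &[u8]) -> Option<u32> {
    le_prefix!(view, u32)
}

/// Encoded child length prefix, assumed here to be a u8, widened to u64.
fn child_len(view: &[u8]) -> Option<u64> {
    le_prefix!(view, u8).map(u64::from)
}
```

Changing the width of either prefix then only requires changing the type named at the call site.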

addr += ENCODED_CHILD_LEN_SIZE as usize;

if len == 0 {
Collaborator

nit: clearer if this and the prior statement were reversed. Perhaps increase addr before setting and testing len.

Contributor Author

To go along with your comment about the macro, every time we get a view, we should automatically advance the address. I created #396 to address this.
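
As an illustration of the "advance on every read" idea, here is a minimal sketch over a plain byte slice. `ViewReader`, `take`, and `InvalidView` are hypothetical names chosen for this example. They are not firewood's API and do not represent whatever #396 ends up proposing.

```rust
/// Hypothetical stand-in for `ShaleError::InvalidCacheView`.
#[derive(Debug, PartialEq)]
struct InvalidView {
    offset: usize,
    size: u64,
}

/// Hypothetical reader over an in-memory byte slice that tracks the
/// current address so callers never adjust it by hand.
struct ViewReader<'a> {
    mem: &'a [u8],
    addr: usize,
}

impl<'a> ViewReader<'a> {
    fn new(mem: &'a [u8], addr: usize) -> Self {
        Self { mem, addr }
    }

    /// Returns the `size` bytes at the current address and advances past
    /// them. On failure the address is left unchanged.
    fn take(&mut self, size: u64) -> Result<&'a [u8], InvalidView> {
        let (mem, offset) = (self.mem, self.addr);
        let view = usize::try_from(size)
            .ok()
            .and_then(|len| offset.checked_add(len))
            .and_then(|end| mem.get(offset..end))
            .ok_or(InvalidView { offset, size })?;
        self.addr = offset + view.len();
        Ok(view)
    }
}
```

With a helper of this shape, the ordering question raised above disappears, since reading a length and stepping over it happen in one call.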

continue;
}

let encoded = mem
.get_view(addr, len)
.ok_or(ShaleError::InvalidCacheView {
offset: addr,
size: len,
})?
.as_deref();

addr += len as usize;

*child = Some(encoded);
}

let node = BranchNode {
// TODO: add path
// path: Vec::new().into(),
Collaborator

nit: Is this comment wrong? This assumes that all branch nodes have no data.

Contributor Author

Will go away shortly

Contributor

nit: this line can be removed with the TODO above?

Contributor Author

Will go away shortly

children,
value,
children_encoded,
};

Ok(node)
}
}
