-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Non-Human-Readable Deserialization for Felt Using Serde #76
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a few tests ?
I added deserialization tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length of the sequence is always 32? is it enforced somewhere ? if not can you add a test where the length is < 32 and one where it is > 32 ?
I've implemented error handling for sequence lengths that are not exactly 32 bytes during deserialization. To ensure we provide accurate and efficient error messages, I considered two different approaches. I would appreciate your feedback on which approach is better. Approach 1:if let Some(extra) = seq.size_hint() {
if extra > 0 {
return Err(de::Error::invalid_length(32 + extra, &"32 bytes"));
}
}
Approach 2:if seq.next_element::<u8>()?.is_some() {
let mut count = 33;
while seq.next_element::<u8>()?.is_some() {
count += 1;
}
return Err(de::Error::invalid_length(count, &"32 bytes"));
}
Which approach do you think is more suitable for this case? |
why do we want to enforce 32 bytes ? I think we should allow |
if we allow |
What would little endian add here ? |
for (i, byte) in bytes.iter_mut().enumerate() { | ||
*byte = seq | ||
.next_element()? | ||
.ok_or_else(|| de::Error::invalid_length(i, &"32 bytes"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For extra compactness, can't we accept sequences with fewer elements than 32?
Eg: Felt::ZERO could be serialized as [0]
, rather than [0; 32]
.
And all values under 255 on a single element, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are kinda doing this already with the human-readable form. Felt::ZERO is 0x0
not 0x000000000..0000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I can see @LucasLvy already suggested this.
I agree with him
Token::U8(0), | ||
Token::U8(0), | ||
Token::Seq { len: Some(2) }, | ||
Token::U8(1 | 0x80), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this first element containing len and the compressed flag?
can't we just encode Felt::ONE
as
&[
Token::Seq { len: Some(1) },
Token::U8(1),
]
And know that because we are using big endian repr, missing bytes are 0
s and just be happy with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialization can be:
let be_bytes = self.to_bytes_be();
let mut i = match be_bytes.iter().find(|b| b != 0) {
Some(i) => i,
None => { // handle Felt::ZERO and return }
}
let mut seq = serializer.serialize_seq(Some(32 - i))?;
while i < 32 {
seq.serialize_element(be_bytes[i])?;
i += 1;
}
seq.end();
And deserialization:
let mut buffer = [0; 32];
let mut i = 32;
while i > 0 {
if let Some(b) = seq.next_element()? {
buffer[i-1] = b;
} else {
break;
}
i -= 1;
}
if i == 0 && seq.next_element()?.is_some() {
// Return error invalid len
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't this way simpler impl work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using seq.next_element()?
to infer lengths and handle missing bytes as zeros isn’t reliable for all serializers. Some serializers may not support this implicit handling, leading to incorrect deserialization. Explicitly encoding the length and compressed flag ensures that the deserializer knows exactly how many bytes to expect and process, making the format self-descriptive and robust across different implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't what we are already doing with Some(32 - i)
?
And how exactly could it fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here we had with @jbcaron was to make a binary format that never exceeds 32 bytes in the worse case (abitrary hash), and compresses it down to a few bytes in the best cases (small felt)
To do that, we think of the input as a byte stream and we use the felt unused bits in the first byte to encode a flag telling us if the felt is compressed or not
When the felt is compressed we treat that first byte as the length, so we read len
more bytes from the wire -
when it's not compressed, we treat it as the first byte of the hash and complete it by reading 31 mores bytes.
However, we found out that there is an issue: using serde serialize_seq
(or serialize_bytes
), bincode will prepend the length of the sequence before the felt, and we end up with >32 bytes felts :(
This is not bincode's problem though it's fundamental to how serde works
Taking inspiration from fixed-sized arrays implementation in serde, they are serialized using serialize_tuple
, which requires the length to be known beforehand... and we don't know the length before the first byte.
That gave me an idea though - which I tried here https://gist.github.com/cchudant/9bbb025c60a9fe23ca56b895fdcff13c
It works by treating the Felt as a tuple of (u8, <rest>) with <rest> being another tuple that is DeserializeSeed
ed with the length
Nesting tuples like this has no effect on the actual bytes on the wire at the end, so we are free to do that
The resulting encoding never exceeds 32 bytes as intended and compresses stuff correctly
@tdelabro what do you think of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I understand better.
I will redo my review keeping this in mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, why do you need the flag for?
Get rid of it and if you have 32 elements, it's not compressed. If you have less it is compressed.
The flag doesn't add anything there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the byte that contain flag+len only when you serialize [Felt].
Then, because the size in bytes of a Felt can change you have to do:
[flag+len, byte, .., byte, flag+len, byte, flag+len, byte, .., byte, ...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my recomandation would be:
- impl Serde on Felt the way I described
- Create public method
serialize_sequence_of_felt_packed()
anddeserialize_sequence_of_felt_packed()
that can be called inside#[serde(de/serialize_with = "path")]
and implement your self descriptive packed representation here
This way values that are a single felt will always be between 1 and 32 bytes.
Value that are a sequence of felt will be serialized as:
- by default:
[
SerdeLen, // Number of felt in sequence
[SerdeLen, u8, u8, ...], // felt 1
[SerdeLen, u8, u8, ...], // felt 2
[SerdeLen, u8, u8, ...], // felt 3
]
- using our custom methods:
[
SerdeLen, // Number of felt in sequence
OurLenAndFlagOrUnflaggedFirstByte // First Byte of the first felt
u8,
...,
u8, // Last byte of the first felt
OurLenAndFlagOrUnflaggedFirstByte // First Byte of the second felt
...,
]
This makes you spare one byte in the case of felts that are 32 bytes long and nothing for all the others. Not gonna lie, it's not a huge optimization but may still be worth for some people.
let first_significant_byte = bytes.iter().position(|&b| b != 0).unwrap_or(31); | ||
if first_significant_byte > 1 { | ||
let len = 32 - first_significant_byte; | ||
let mut seq = serializer.serialize_seq(Some(len + 1))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are already storing the size. We should call serialize_eq
with None
in order to spare the extra byte which contains the len at the beginning.
Otherway there is no use to store the len a second time ourself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes our worst case 33 bytes and our best 3 bytes (value, our len, their len). We want 32 and 2.
let mut seq = serializer.serialize_seq(Some(32))?; | ||
for b in self.to_bytes_be() { | ||
seq.serialize_element(&b)?; | ||
let bytes = self.to_bytes_be(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid the if else nesting of if first_significant_byte > 1
doing the following:
let be_bytes = self.to_bytes_be();
let mut i = match be_bytes.iter().find(|b| b != 0) {
Some(i) => i,
None => { // handle Felt::ZERO and return }
}
let mut seq = serializer.serialize_seq(None)?;
if i != 0 {
seq.serialize_element(32 - i | COMPRESSED)?;
}
while i < 32 {
seq.serialize_element(be_bytes[i])?;
i += 1;
}
seq.end();
Token::U8(0), | ||
Token::U8(0), | ||
Token::Seq { len: Some(2) }, | ||
Token::U8(1 | 0x80), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, why do you need the flag for?
Get rid of it and if you have 32 elements, it's not compressed. If you have less it is compressed.
The flag doesn't add anything there
Co-authored-by: cchudant <[email protected]>
311747f
to
fd2299f
Compare
Add Non-Human-Readable Deserialization for Felt Using Serde
Pull Request type
What is the current behavior?
Currently, the
Felt
type does not support custom serialization and deserialization forbincode
and other formats that differentiate between human-readable and binary forms.What is the new behavior?
Deserialize
forFelt
to correctly parse hex strings and sequences of 32 bytes, ensuring compatibility withbincode
.Does this introduce a breaking change?
No
Other information