Re-visiting the serialization #214

Open

minghuaw opened this issue Oct 26, 2023 · 6 comments

@minghuaw
Owner

The current to_vec() method creates the output buffer with Vec::new(), and according to [1]

A new, empty Vec created by the common means (vec![] or Vec::new or Vec::default) has a length and capacity of zero

This inevitably leads to re-allocation, and likely repeated re-allocation if the object is large. However, given that we already have a SizeSerializer that can estimate the serialized size in bytes, pre-allocating the output buffer with that estimate could reduce the number of re-allocations.

[1] https://nnethercote.github.io/perf-book/heap-allocations.html?highlight=borrow
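A rough sketch of how to_vec() could pre-allocate, assuming a hypothetical serialized_size() helper built on top of the existing SizeSerializer (the names, constructors, and error type here are illustrative rather than the crate's actual API):

```rust
use serde::Serialize;

// Hypothetical helper built on the crate's SizeSerializer; the real way to
// obtain the estimate may look different.
fn serialized_size<T: Serialize + ?Sized>(value: &T) -> Result<usize, Error> {
    let mut sizer = SizeSerializer::new();
    value.serialize(&mut sizer)?;
    Ok(sizer.size())
}

pub fn to_vec<T: Serialize + ?Sized>(value: &T) -> Result<Vec<u8>, Error> {
    // Pre-allocate with the estimate instead of starting from Vec::new(),
    // so a large object does not trigger repeated re-allocations.
    let mut buf = Vec::with_capacity(serialized_size(value)?);
    let mut serializer = Serializer::new(&mut buf); // assumed writer-based constructor
    value.serialize(&mut serializer)?;
    Ok(buf)
}
```

If the estimate is exact, the buffer never re-allocates; if it is only an upper bound, the cost is a single slightly oversized allocation.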

@minghuaw changed the title from Would pre-allocation improve serialization perf? to Re-visiting the serialization on Oct 26, 2023
@minghuaw
Owner Author

In addition, there are places where a temporary buffer is created during serialization. Is it possible to apply a similar technique there? Or, even better, can these temporary buffers be removed entirely? Most of them exist only because the serialized format requires one or more size bytes to be prepended to the actual data.
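One way the temporary buffers might be avoided (a self-contained sketch, assuming a 4-byte big-endian size field for illustration; the real format's width and encoding may differ): reserve placeholder bytes for the size, serialize directly into the main buffer, then patch the size in afterwards.

```rust
// Sketch: write the payload straight into the output buffer and back-patch
// the size prefix, instead of serializing into a temporary Vec and copying
// it over. The 4-byte big-endian prefix is an assumption for illustration.
fn write_size_prefixed<F>(buf: &mut Vec<u8>, write_payload: F)
where
    F: FnOnce(&mut Vec<u8>),
{
    // Reserve placeholder bytes for the size field.
    let size_pos = buf.len();
    buf.extend_from_slice(&[0u8; 4]);

    // Serialize the payload directly into the same buffer.
    let payload_start = buf.len();
    write_payload(buf);
    let payload_len = (buf.len() - payload_start) as u32;

    // Patch the placeholder with the actual payload length.
    buf[size_pos..size_pos + 4].copy_from_slice(&payload_len.to_be_bytes());
}
```

For example, write_size_prefixed(&mut buf, |b| b.extend_from_slice(&data)) would emit the length-prefixed data without an intermediate allocation.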

@minghuaw
Owner Author

minghuaw commented Nov 3, 2023

Or, even better, can these temporary buffers be removed entirely? Most of them exist only because the serialized format requires one or more size bytes to be prepended to the actual data.

It might be better if this is introduced in a breaking update.

@minghuaw
Owner Author

minghuaw commented Nov 3, 2023

Initial experiments show that this quite significantly degrades serialization performance for primitive types like u8, bool, i8, and char, which are only one or two bytes long. A very big improvement was observed for types of medium length (4 B to 1 kB). Surprisingly, for long strings/binary (>= 1 MB), the performance seems to remain the same.
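For reference, a comparison across those size categories could be measured with a criterion harness along these lines (the serde_amqp::to_vec import and the benchmark names are assumptions; swap in the crate's actual entry point):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Assumed entry point under test; adjust the path to the actual crate.
use serde_amqp::to_vec;

fn bench_sizes(c: &mut Criterion) {
    let small: u8 = 42;                 // one-byte primitive
    let medium = vec![0u8; 1024];       // medium payload (~1 kB)
    let large = vec![0u8; 1024 * 1024]; // large payload (~1 MB)

    c.bench_function("to_vec/u8", |b| b.iter(|| to_vec(black_box(&small))));
    c.bench_function("to_vec/1kB", |b| b.iter(|| to_vec(black_box(&medium))));
    c.bench_function("to_vec/1MB", |b| b.iter(|| to_vec(black_box(&large))));
}

criterion_group!(benches, bench_sizes);
criterion_main!(benches);
```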

@minghuaw
Owner Author

minghuaw commented Nov 5, 2023

Or, even better, can these temporary buffers be removed entirely? Most of them exist only because the serialized format requires one or more size bytes to be prepended to the actual data.

Reserving capacity in the buffer somehow negatively impacts serializing Vec<u64>.

@lsunsi

lsunsi commented May 5, 2024

This is interesting; I can't imagine why it would decrease performance in this way. Just noting here that I'd also expect pre-allocation to only improve performance.

@minghuaw
Owner Author

minghuaw commented May 6, 2024

This is interesting; I can't imagine why it would decrease performance in this way. Just noting here that I'd also expect pre-allocation to only improve performance.

That was my expectation as well. However, I haven't had enough time to investigate further.
