
Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317

Open

wants to merge 8 commits into base: master
Conversation

kitlith
Collaborator

@kitlith kitlith commented Feb 18, 2025

Fixes #298.

The core new API here is BinRead::read_options_count, where the implementation for Vec<T> will delegate to T::read_options_count. While I was implementing it, I noticed an inconsistency with the count_with helper, which can be fixed separately in another PR if this approach is rejected (or is taking a long time to solidify).

While I was at it, I decided to try applying the optimization to arrays, and got what is likely a functional result, but I'm not a big fan of it. Are there any tweaks we can make to the read_options_count signature to make the array implementation nicer? Do we want to keep the optimization for arrays, or do we want to keep it as it currently is?
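For readers skimming the thread, the core delegation idea can be sketched roughly like this. This is a simplification with made-up names (`BinReadLike`, `read_one`, `read_count`); the real PR threads endian, args, and a container abstraction through the signature:

```rust
use std::io::{Read, Seek};

// Hedged sketch, not binrw's actual signatures: a default trait method that
// loops, which primitives override with a single bulk read. Vec<T> delegates
// to T's method, so no 'static bound is needed anywhere.
trait BinReadLike: Sized {
    fn read_one<R: Read + Seek>(reader: &mut R) -> std::io::Result<Self>;

    // Default: naive per-element loop. A type only has to override this one
    // method to make Vec<T> (and the count helper) fast.
    fn read_count<R: Read + Seek>(reader: &mut R, n: usize) -> std::io::Result<Vec<Self>> {
        (0..n).map(|_| Self::read_one(reader)).collect()
    }
}

impl BinReadLike for u8 {
    fn read_one<R: Read + Seek>(reader: &mut R) -> std::io::Result<Self> {
        let mut b = [0u8; 1];
        reader.read_exact(&mut b)?;
        Ok(b[0])
    }

    // Override: one bulk read_exact instead of n single-byte reads.
    fn read_count<R: Read + Seek>(reader: &mut R, n: usize) -> std::io::Result<Vec<Self>> {
        let mut buf = vec![0u8; n];
        reader.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```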

Alternatives

Secondary Trait

Use another trait for this (bikeshed: BinReadMulti) which has the same default impl as read_options_count, and which the derive macro automatically implements using the default impl. i.e.

pub trait BinReadMulti: BinRead {
    fn read_options_count(/* ... */) -> /* ... */ { /*...*/ }
}

struct A;
// derive macro:
impl BinReadMulti for A {}

Along with an attribute to disable the automatic trait generation, this would allow someone to use the derive macro and manually write the multiple-item implementation. I don't think this will happen very often, though.

Use dtolnay's typeid crate

https://github.com/dtolnay/typeid allows for obtaining a TypeId without 'static. However, it would require us to write some unsafe code within this crate, or get downcasting implemented in that crate. (There's also a safety requirement that I'm a little dubious about, but one we probably wouldn't trip over in this library.)
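The shape of fake specialization via TypeId comparison looks something like the sketch below. With `core::any::TypeId` this needs `T: 'static`, which is exactly the bound the typeid crate removes; the function name and return values here are hypothetical, not binrw API:

```rust
use core::any::TypeId;

// Sketch of TypeId-based "fake specialization": branch on the element type
// at runtime (the compiler typically folds the comparison away). The 'static
// bound here is what the typeid crate would let us drop.
fn read_strategy<T: 'static>() -> &'static str {
    if TypeId::of::<T>() == TypeId::of::<u8>() {
        // fast path: a single bulk read_exact into the output buffer
        "bulk read_exact"
    } else {
        // fallback: read each element through its own read implementation
        "per-element loop"
    }
}
```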

Status Quo: Wait for specialization

Whether specialization or something like this is preferred depends on what type you consider the optimization to be a property of, I guess. Is "you can read integers fast" a property of the integer, or the container(s) of that integer? This PR takes the former viewpoint: we can read u8 fast as a property of u8, not as a property of Vec<u8>, although Vec<u8> benefits as a result.

Test is written to "what a user would expect", this currently fails.
We should either move the optimization to the count helper, or fix
it in some other way.
Fixes jam1garner#298

This implementation chooses to delegate reading a known number of
items to a new method `read_options_count` with a default,
naive implementation. The `Vec<T>` impl and the `count` helper
delegate to `T::read_options_count`.

The `count_with` helper is de-optimized into a naive loop,
for correctness' sake.
(This fixes the failing test introduced in the previous commit.)

One nit remains: Arrays don't have this optimization applied.
Not a big fan of this implementation. Not sure that there's a way
to proceed without compromising on ergonomics either.

Let's see if anyone else has ideas on how to proceed.
@kitlith kitlith added the enhancement New feature or request label Feb 18, 2025
@kitlith kitlith self-assigned this Feb 18, 2025
Owner

@jam1garner jam1garner left a comment

This looks great. Solid solution to that issue, code all looks good after a couple looks over it

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

Aaaand I forgot to run tests after adding the array impl. Why doesn't Rust like this. Sigh.

@jam1garner
Owner

jam1garner commented Feb 18, 2025

I'm assuming the tests in CI were failing before this PR? They look mostly unrelated at a quick glance.

Nm, kinda just looks like you broke some no_std stuff.

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

AH. Yeah, no_std won't like that Vec sneaking into the core implementation either. That's... ugh, I'm not sure what the best way to handle that is, but if we figure it out it probably makes the array impls nicer as well.

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

Nightly has slightly different output for a couple of UI tests. Go figure. Everything's fine on latest stable though, and I'm trying to get CI to tell me what 1.66 looks like, but it's looking fine there too.

In other news, I have potentially over-complicated the implementation, but it handles both arrays and Vecs nicely.

There's plenty of names here to bikeshed (i have no particular attachment to any of the current names), and proper documentation for the container trait still needs to be written.

@kitlith kitlith requested a review from jam1garner February 18, 2025 08:38
Collaborator Author

@kitlith kitlith left a comment

Couple of TODOs, a bit of commentary. Oops, I missed a spot for u8.

@@ -0,0 +1,121 @@
//! container module
Collaborator Author

TODO: everything in here needs proper documentation.

Comment on lines +586 to +591
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
// Error::Io(io::Error::new(
// io::ErrorKind::UnexpectedEof,
// "not enough bytes in reader",
// ))
// }
Collaborator Author

TODO: either remove this or start using it again.

Comment on lines +48 to +62
C::new_smart(count, |mut buf| {
reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(
&mut buf,
))?;

if core::mem::size_of::<Self>() != 1
&& ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
|| (cfg!(target_endian = "little") && endian == crate::Endian::Big))
{
for value in buf.iter_mut() {
*value = value.swap_bytes();
}
}
Ok(())
})
Collaborator Author

This was previously done in two passes: incrementally read the whole buffer, then byteswap the whole buffer.

The current construction reads each portion of the buffer, swaps the bytes of that portion, then reads the next portion, etc. Does this make a meaningful performance difference, and should we consider splitting into a "read phase" and a "finalize" phase?

A finalize phase seems ripe for misuse, though. Would it be possible to prevent someone from ignoring the read phase and instead doing the read during the finalization phase? I suppose a by-value map would do it.

C::new_smart(
    count,
    |mut buf| {
        reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(&mut buf))?;
        Ok(())
    },
    |value| {
        if core::mem::size_of::<Self>() != 1
            && ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
                || (cfg!(target_endian = "little") && endian == crate::Endian::Big))
        {
            value.swap_bytes()
        } else {
            value
        }
    },
)

I feel like this is haring off further in the direction of over-complication.

use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
Collaborator Author

This IntoIterator bound isn't technically necessary. There was an earlier iteration where I was playing around with IntoIterator and FromIterator until I realized that FromIterator doesn't play nicely with arrays, and it was nice to reuse the Item associated type.

Comment on lines +29 to +37
// whee it's a Functor
/// Type Constructor
type HigherSelf<T>: Container<Count = Self::Count, HigherSelf<Self::Item> = Self>
+ IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
where
Fun: FnMut(Self::Item) -> T;
Collaborator Author

Hello, having to invent an ecosystem around a new trait so that people can do things that are maybe useful.

This uses the Functor pattern from https://varkor.github.io/blog/2019/03/28/idiomatic-monads-in-rust.html (specifically, "attempt 2", before they start inventing (generic) associated traits to try to express further patterns).

Goal here was to allow users to delegate to other implementations, then map the values into another container "of the same form." I'm effectively trying to keep an eye out for if we ever support not having alloc around: parsing arrays will only ever use arrays, and not alloc::vec::Vec
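As a concrete illustration of that "same form" mapping, a minimal standalone sketch using a GAT in the role of HigherSelf. Trait and method names here are mine for illustration, not the PR's actual API:

```rust
// A Functor-style container trait: HigherSelf<T> is "the same container
// shape, holding T instead of Item". Arrays map to arrays (no alloc),
// Vec maps to Vec.
trait Mappable: IntoIterator + Sized {
    type HigherSelf<T>: Mappable + IntoIterator<Item = T>;

    fn map_items<T, F: FnMut(Self::Item) -> T>(self, f: F) -> Self::HigherSelf<T>;
}

impl<A> Mappable for Vec<A> {
    type HigherSelf<T> = Vec<T>;

    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> Vec<T> {
        self.into_iter().map(f).collect()
    }
}

impl<A, const N: usize> Mappable for [A; N] {
    type HigherSelf<T> = [T; N];

    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> [T; N] {
        // array::map keeps the fixed size; no allocation, so this stays
        // usable without alloc.
        self.map(f)
    }
}
```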

Comment on lines +100 to +117
// This extra impl for `u8` makes it faster than
// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
fn read_options_count<'a, R, C>(
reader: &mut R,
_endian: Endian,
_args: Self::Args<'a>,
count: C::Count,
) -> BinResult<C>
where
R: Read + Seek,
Self::Args<'a>: Clone,
C: crate::container::Container<Item = Self>,
{
C::new_smart(count, |buf| reader.read_exact(buf).map_err(Into::into))
}
Collaborator Author

Oops. This comment was copied over and is now inaccurate, because I wasn't paying close enough attention to how I was implementing this version of the specialization. (Notably, we can only get this version of the specialization when using Vec, and I've completely abstracted access to that Vec away here. This is a performance regression.)

Ideally we'd be using the current implementation for arrays, and read_to_end for Vec. Funnily enough, we might be able to use fake specialization that requires 'static here to switch between the two implementations, since we only need to care about u8 (and anything that might implement Container... should we be sealing Container?).

Collaborator Author

Actually, no, that doesn't work. That would require us to bound C: Container<Item=Self> + 'static, and if user generated items are not static, the Container wrapping those items probably won't be static either.

Comment on lines +19 to +27
/// smart
///
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
where
Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
Self::Item: Default + Clone;
Collaborator Author

If features read_buf and core_io_borrowed_buf were stable, passing a BorrowedCursor to the closure instead of &mut [Self::Item], and using Read::read_buf_exact would almost get us to not needing to initialize our buffers. But, with the current state of that API, we'd have no way to own the filled values (without unsafe) -- we would only get a reference to a slice of the filled values. (This doesn't mean that such an API won't eventually be added, only that it doesn't currently exist.)

Collaborator

@csnover csnover left a comment

As I mentioned in a line-review I’m not feeling confident about exposing this as a public API, but I’m not opposed to this approach if everyone else finds it agreeable enough. My biggest concrete concern is how count_with runs through Vec and how that seems like it will be adding unnecessary memory overhead that didn’t exist before.

I feel like we are running in circles where optimisations started in Vec, then a generic helper function was created to read into any collection, then optimisation for contiguous primitives was moved into the helper, and now optimisation is getting pushed back out into types again. Are we just doomed to keep making differently suboptimal trade-offs based on whatever seems most important right now until specialisation? Since this is all about code performance I would be biased toward just allowing unsafe things, since that would also allow to eliminate the overhead from using resize.

/// }
/// }
/// ```
fn read_options_count<'a, R, C>(
Collaborator

Name: read_collection?

I’m not feeling confident that this should be part of the public API if this is really all just to find a better way to handle specialisation of contiguous primitive types. There are already two ways to read into collections (newtypes and helper functions), this adds a third one, and if with directive gets implemented then there will be four. What does the decision tree look like about when to use each of these different ways? What about the write-API?

Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

Collaborator Author

Re: being a public API: This is also a way for people who have a contiguous slice of bytemuck/zerocopy types to add the optimization themselves (unless we want to add that to binrw under feature flags), and that would be a less contrived example of the feature.

Overall, I chose the approach that I did due to a combination of making the main library less special (currently it's the only one able to define specializations like this), not requiring the use of unsafe, and (if I've pulled things off right) staying semver compatible.

That said, this design isn't perfect. I don't like that I'm exposing Container as a public API. That means that users may expect more than bare functionality out of it. (If they care at all. If they don't, then exposing this as public did nothing.)

Collaborator Author

Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

The standard library is in a similar bind with io::Read::read_buf and io::Read::read. They can be implemented in terms of each other, io::Read::read existed first, and it is more optimal to implement io::Read::read_buf.

They've decided as a prerequisite that they'll need a way to tell the compiler that you only need to implement one or the other. If that ever stabilizes, we can do the same.

// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
Collaborator

I’m not sure this comment makes sense any more, since it had been contextualised by surrounding code which no longer exists?

@@ -10,7 +10,9 @@ use core::num::{
};

macro_rules! binread_impl {
($($type_name:ty),*$(,)?) => {
// `$(some_lit $(__unused $is_lit_present:tt))?` allows us to match on the presence of a literal
// using `$($($is_lit_present)? fn do_whatever() {})?`
Collaborator

I’m not sure the benefits outweigh the costs of using a weirdo macro pattern like this instead of just using a second macro and sending the extra function in a tt (binread_impl + binread_impl_int)?
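For reference, the pattern under discussion can be sketched like this (macro, struct, and method names are hypothetical): the outer `$(...)?` in the expansion is gated on whether the optional flag was matched, while the inner `$($p)?` always expands to nothing because `$p` never captures tokens.

```rust
// Presence-matching macro trick: emit an extra method only when the caller
// writes the `int` flag. `$p` exists solely so the outer repetition in the
// expansion has a metavariable to key off; it never binds any tokens.
macro_rules! impl_reader {
    ($name:ident $(, int $(@ $p:tt)?)?) => {
        impl $name {
            fn kind(&self) -> &'static str { "generic" }
            $(
                $($p)? // expands to nothing; gates this block on `int`
                fn kind_fast(&self) -> &'static str { "int fast path" }
            )?
        }
    };
}

struct Plain;
struct Fast;
impl_reader!(Plain);      // only kind()
impl_reader!(Fast, int);  // kind() and kind_fast()
```

The alternative the reviewer suggests, a second macro taking the extra function as a `tt`, avoids this indirection at the cost of some duplication.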

{
count_with(n, T::read_options)
move |reader, endian, arg| {
let res: Vec<_> = T::read_options_count(reader, endian, arg, n)?;
Collaborator

So this will use N*2 memory now in all cases, or will the optimiser elide the Vec-Iter-Vec loop so it will only use N*2 memory for non-Vec return types?

I’m concerned that the utility of this helper function is diminished to the point of uselessness by the uncontrollable memory overhead since one of the main purposes having FromIterator as the return type is to avoid this kind of memory overhead. Otherwise, couldn’t someone just map a Vec?

Collaborator Author

The optimizer appears to be able to catch it at -Copt-level=2. I'm not 100% sure what it's doing at -Copt-level=1. So, it can catch it (and will catch it in release mode), but it's somewhat non-trivial.

It might be possible to turn this into an API that uses iterators with a fixed size overhead. (reads into fixed-size chunks, then provides each item one at a time.) It may also be possible to add a backdoor into the iterator to preserve the current behavior for Vec (chunks that grow along with the backing memory storage. i.e. exponentially)

I think with such an API we can also say "good riddance" to the Collection trait. Let's see if I can workshop this out.
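For what it's worth, the fixed-overhead idea can be sketched for the u8 case as an iterator that refills a bounded chunk buffer with bulk reads and hands items out one at a time. All names here are hypothetical, not part of this PR:

```rust
use std::io::Read;

const CHUNK: usize = 4096;

// Iterator with fixed memory overhead: one bulk read_exact per CHUNK items
// instead of one read per item. A FromIterator-based collector can consume
// this without ever holding a second full-size buffer.
struct ChunkedBytes<'r, R: Read> {
    reader: &'r mut R,
    remaining: usize, // items still expected from the reader
    buf: Vec<u8>,     // current chunk
    pos: usize,       // next index within buf
}

impl<'r, R: Read> ChunkedBytes<'r, R> {
    fn new(reader: &'r mut R, count: usize) -> Self {
        Self { reader, remaining: count, buf: Vec::new(), pos: 0 }
    }
}

impl<'r, R: Read> Iterator for ChunkedBytes<'r, R> {
    type Item = std::io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos == self.buf.len() {
            if self.remaining == 0 {
                return None;
            }
            // Refill: bulk-read the next chunk.
            let n = self.remaining.min(CHUNK);
            self.buf.resize(n, 0);
            self.pos = 0;
            self.remaining -= n;
            if let Err(e) = self.reader.read_exact(&mut self.buf) {
                // Stop iterating after reporting the error.
                self.remaining = 0;
                self.buf.clear();
                return Some(Err(e));
            }
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        Some(Ok(b))
    }
}
```

The "backdoor" for Vec would be letting the chunk grow with the backing storage instead of staying fixed at CHUNK, which is where this sketch stops being trivial.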

Collaborator Author

If I didn't care about being able to backdoor the current reading-into-Vec behavior, this would be trivial. Trying to retain the ability to do so makes it really tough. It would be possible with type alias impl trait in trait (I think), which isn't stable yet.

}

use vec_fast_int;
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
Collaborator

Please delete commented out code :-)

use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
Collaborator

To make sure I understand, this trait is just to be able to support both Vec<T> and [T; N]?

Is there some other collection type that would be supportable in the future?

Should this be sealed?

Should this be hidden?

+ IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
Collaborator

Why is this part of this API (beyond the example in the read_options_count docs, which seems pretty contrived to me)? Is this also the only reason why this trait requires IntoIterator?

Collaborator Author

This part of the API exists very much because of that example: when I went to write it, I reached for this pattern and realized that if I wanted to use it, I would need to add support for it.

type Count = ();
type HigherSelf<X> = [X; N];

fn new_naive<Fun, Error>(_count: (), mut f: Fun) -> Result<Self, Error>
Collaborator

Use Self::Count instead of the literal type (throughout)?

/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Collaborator

Name: new_contiguous? new_fast?

/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_naive<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Collaborator

Name: new? new_slow?

If the ‘smart’ constructor was filling a MaybeUninit so that Default constraint was not required, would it still be necessary for this to exist at all?

Collaborator Author

hmmmm... I suppose, technically, it wouldn't, because we could just iterate over the slice provided by read_smart and perform reads one by one, and that would be morally equivalent, even if the underlying strategies are different.

@csnover
Collaborator

csnover commented Feb 19, 2025

Sorry, I started this review yesterday and didn’t see the self-review so some of my questions are redundant :-)

Successfully merging this pull request may close these issues.

Issue with lifetimes and Vec (lifetime may not live long enough requires that 'bar must outlive 'static)
3 participants