
Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317

Open

wants to merge 8 commits into base: master
Conversation

kitlith
Collaborator

@kitlith kitlith commented Feb 18, 2025

Fixes #298.

The core new API here is BinRead::read_options_count, where the implementation for Vec<T> will delegate to T::read_options_count. While I was implementing it, I noticed an inconsistency with the count_with helper, which can be fixed separately in another PR if this approach is rejected (or is taking a long time to solidify).

While I was at it, I decided to try applying the optimization to arrays, and got what is likely a functional result, but I'm not a big fan of it. Are there any tweaks we can make to the read_options_count signature to make the array implementation nicer? Do we want to keep the optimization for arrays, or do we want to keep it as it currently is?
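For readers skimming the thread, the core delegation idea can be sketched roughly like this. This is a simplification with made-up names (`BinReadLike`, `read_one`, `read_count`); the real PR threads endian, args, and a container abstraction through the signature:

```rust
use std::io::{Read, Seek};

// Hedged sketch, not binrw's actual signatures: a default trait method that
// loops, which primitives override with a single bulk read. Vec<T> delegates
// to T's method, so no 'static bound is needed anywhere.
trait BinReadLike: Sized {
    fn read_one<R: Read + Seek>(reader: &mut R) -> std::io::Result<Self>;

    // Default: naive per-element loop. A type only has to override this one
    // method to make Vec<T> (and the count helper) fast.
    fn read_count<R: Read + Seek>(reader: &mut R, n: usize) -> std::io::Result<Vec<Self>> {
        (0..n).map(|_| Self::read_one(reader)).collect()
    }
}

impl BinReadLike for u8 {
    fn read_one<R: Read + Seek>(reader: &mut R) -> std::io::Result<Self> {
        let mut b = [0u8; 1];
        reader.read_exact(&mut b)?;
        Ok(b[0])
    }

    // Override: one bulk read_exact instead of n single-byte reads.
    fn read_count<R: Read + Seek>(reader: &mut R, n: usize) -> std::io::Result<Vec<Self>> {
        let mut buf = vec![0u8; n];
        reader.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```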

Alternatives

Secondary Trait

Use another trait for this (bikeshed: BinReadMulti) which has the same default impl as read_options_count, and which the derive macro automatically implements using the default impl. i.e.

pub trait BinReadMulti: BinRead {
    fn read_options_count(/* ... */) -> /* ... */ { /*...*/ }
}

struct A;
// derive macro:
impl BinReadMulti for A {}

Along with an attribute to disable the automatic trait generation, this would allow someone to use the derive macro and manually write the multiple-item implementation. I don't think this will happen very often, though.

Use dtolnay's typeid crate

https://github.com/dtolnay/typeid allows for obtaining a TypeId without 'static. However, it would require us to write some unsafe code within this crate, or get downcasting implemented in that crate. (There's also a safety requirement that I'm a little dubious about, but one we probably wouldn't trip over in this library.)
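The shape of fake specialization via TypeId comparison looks something like the sketch below. With `core::any::TypeId` this needs `T: 'static`, which is exactly the bound the typeid crate removes; the function name and return values here are hypothetical, not binrw API:

```rust
use core::any::TypeId;

// Sketch of TypeId-based "fake specialization": branch on the element type
// at runtime (the compiler typically folds the comparison away). The 'static
// bound here is what the typeid crate would let us drop.
fn read_strategy<T: 'static>() -> &'static str {
    if TypeId::of::<T>() == TypeId::of::<u8>() {
        // fast path: a single bulk read_exact into the output buffer
        "bulk read_exact"
    } else {
        // fallback: read each element through its own read implementation
        "per-element loop"
    }
}
```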

Status Quo: Wait for specialization

Whether specialization or something like this is preferred depends on what type you consider the optimization to be a property of, I guess. Is "you can read integers fast" a property of the integer, or the container(s) of that integer? This PR takes the former viewpoint: we can read u8 fast as a property of u8, not as a property of Vec<u8>, although Vec<u8> benefits as a result.

Test is written to "what a user would expect", this currently fails.
We should either move the optimization to the count helper, or fix
it in some other way.
Fixes jam1garner#298

This implementation chooses to delegate reading a known number of
items to a new method `read_options_count` with a default,
naive implementation. The `Vec<T>` impl and the `count` helper
delegate to `T::read_options_count`.

The `count_with` helper is de-optimized into a naive loop,
for correctness' sake.
(This fixes the failing test introduced in the previous commit.)

One nit remains: Arrays don't have this optimization applied.
Not a big fan of this implementation. Not sure that there's a way
to proceed without compromising on ergonomics either.

Let's see if anyone else has ideas on how to proceed.
@kitlith kitlith added the enhancement New feature or request label Feb 18, 2025
@kitlith kitlith self-assigned this Feb 18, 2025
Owner

@jam1garner jam1garner left a comment

This looks great. Solid solution to that issue, code all looks good after a couple looks over it

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

Aaaand I forgot to run tests after adding the array impl. Why doesn't Rust like this. Sigh.

@jam1garner
Owner

jam1garner commented Feb 18, 2025

I'm assuming the tests in CI were failing before this PR? They look mostly unrelated at a quick glance.

Nm, kinda just looks like you broke some no_std stuff.

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

AH. Yeah, no_std won't like that Vec sneaking into the core implementation either. That's... ugh, I'm not sure what the best way to handle that is, but if we figure it out it probably makes the array impls nicer as well.

@kitlith
Collaborator Author

kitlith commented Feb 18, 2025

Nightly has slightly different output for a couple of UI tests. Go figure. Everything's fine on latest stable though, and I'm trying to get CI to tell me what 1.66 looks like, but it's looking fine there too.

In other news, I have potentially over-complicated the implementation, but it handles both arrays and Vecs nicely.

There's plenty of names here to bikeshed (i have no particular attachment to any of the current names), and proper documentation for the container trait still needs to be written.

@kitlith kitlith requested a review from jam1garner February 18, 2025 08:38
Collaborator Author

@kitlith kitlith left a comment

Couple of TODOs, a bit of commentary. Oops, I missed a spot for u8.

@@ -0,0 +1,121 @@
//! container module
Collaborator Author

TODO: everything in here needs proper documentation.

Comment on lines +586 to +591
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
// Error::Io(io::Error::new(
// io::ErrorKind::UnexpectedEof,
// "not enough bytes in reader",
// ))
// }
Collaborator Author

TODO: either remove this or start using it again.

Comment on lines +48 to +62
C::new_smart(count, |mut buf| {
reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(
&mut buf,
))?;

if core::mem::size_of::<Self>() != 1
&& ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
|| (cfg!(target_endian = "little") && endian == crate::Endian::Big))
{
for value in buf.iter_mut() {
*value = value.swap_bytes();
}
}
Ok(())
})
Collaborator Author

This was previously done in two passes: incrementally read the whole buffer, then byteswap the whole buffer.

The current construction reads each portion of the buffer, swaps the bytes of that portion, then reads the next portion, etc. Does this make a meaningful performance difference, and should we consider splitting into a "read phase" and a "finalize" phase?

A finalize phase seems ripe for misuse, though. Would it be possible to prevent someone from ignoring the read phase and instead doing the read during the finalization phase? I suppose a by-value map would do it.

C::new_smart(
    count,
    |mut buf| {
        reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(&mut buf))?;
        Ok(())
    },
    |value| {
        if core::mem::size_of::<Self>() != 1
            && ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
                || (cfg!(target_endian = "little") && endian == crate::Endian::Big))
        {
            value.swap_bytes()
        } else {
            value
        }
    },
)

I feel like this is haring off further in the direction of over-complication.

use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
Collaborator Author

This IntoIterator bound isn't technically necessary. There was an earlier iteration where I was playing around with IntoIterator and FromIterator until I realized that FromIterator doesn't play nicely with arrays, and it was nice to reuse the Item associated type.

Comment on lines +29 to +37
// whee it's a Functor
/// Type Constructor
type HigherSelf<T>: Container<Count = Self::Count, HigherSelf<Self::Item> = Self>
+ IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
where
Fun: FnMut(Self::Item) -> T;
Collaborator Author

Hello, having to invent an ecosystem around a new trait so that people can do things that are maybe useful.

This uses the Functor pattern from https://varkor.github.io/blog/2019/03/28/idiomatic-monads-in-rust.html (specifically, "attempt 2", before they start inventing (generic) associated traits to try to express further patterns).

Goal here was to allow users to delegate to other implementations, then map the values into another container "of the same form." I'm effectively trying to keep an eye out for if we ever support not having alloc around: parsing arrays will only ever use arrays, and not alloc::vec::Vec
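As a concrete illustration of that "same form" mapping, a minimal standalone sketch using a GAT in the role of HigherSelf. Trait and method names here are mine for illustration, not the PR's actual API:

```rust
// A Functor-style container trait: HigherSelf<T> is "the same container
// shape, holding T instead of Item". Arrays map to arrays (no alloc),
// Vec maps to Vec.
trait Mappable: IntoIterator + Sized {
    type HigherSelf<T>: Mappable + IntoIterator<Item = T>;

    fn map_items<T, F: FnMut(Self::Item) -> T>(self, f: F) -> Self::HigherSelf<T>;
}

impl<A> Mappable for Vec<A> {
    type HigherSelf<T> = Vec<T>;

    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> Vec<T> {
        self.into_iter().map(f).collect()
    }
}

impl<A, const N: usize> Mappable for [A; N] {
    type HigherSelf<T> = [T; N];

    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> [T; N] {
        // array::map keeps the fixed size; no allocation, so this stays
        // usable without alloc.
        self.map(f)
    }
}
```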

Comment on lines +100 to +117
// This extra impl for `u8` makes it faster than
// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
fn read_options_count<'a, R, C>(
reader: &mut R,
_endian: Endian,
_args: Self::Args<'a>,
count: C::Count,
) -> BinResult<C>
where
R: Read + Seek,
Self::Args<'a>: Clone,
C: crate::container::Container<Item = Self>,
{
C::new_smart(count, |buf| reader.read_exact(buf).map_err(Into::into))
}
Collaborator Author

Oops. This comment was copied over and is now inaccurate, because I wasn't paying close enough attention to how I was implementing this version of the specialization. (Notably, we can only get this version of the specialization when using Vec, and I've completely abstracted access to that Vec away here. This is a performance regression.)

Ideally we'd be using the current implementation for arrays, and read_to_end for Vec. Funnily enough, we might be able to use fake specialization that requires 'static here to switch between the two implementations, since we only need to care about u8 (and anything that might implement Container... should we be sealing Container?).

Collaborator Author

Actually, no, that doesn't work. That would require us to bound C: Container<Item=Self> + 'static, and if user generated items are not static, the Container wrapping those items probably won't be static either.

Comment on lines +19 to +27
/// smart
///
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
where
Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
Self::Item: Default + Clone;
Collaborator Author

If features read_buf and core_io_borrowed_buf were stable, passing a BorrowedCursor to the closure instead of &mut [Self::Item], and using Read::read_buf_exact would almost get us to not needing to initialize our buffers. But, with the current state of that API, we'd have no way to own the filled values (without unsafe) -- we would only get a reference to a slice of the filled values. (This doesn't mean that such an API won't eventually be added, only that it doesn't currently exist.)

Collaborator

@csnover csnover left a comment

As I mentioned in a line-review I’m not feeling confident about exposing this as a public API, but I’m not opposed to this approach if everyone else finds it agreeable enough. My biggest concrete concern is how count_with runs through Vec and how that seems like it will be adding unnecessary memory overhead that didn’t exist before.

I feel like we are running in circles where optimisations started in Vec, then a generic helper function was created to read into any collection, then optimisation for contiguous primitives was moved into the helper, and now optimisation is getting pushed back out into types again. Are we just doomed to keep making differently suboptimal trade-offs based on whatever seems most important right now until specialisation? Since this is all about code performance I would be biased toward just allowing unsafe things, since that would also allow to eliminate the overhead from using resize.

/// }
/// }
/// ```
fn read_options_count<'a, R, C>(
Collaborator

Name: read_collection?

I’m not feeling confident that this should be part of the public API if this is really all just to find a better way to handle specialisation of contiguous primitive types. There are already two ways to read into collections (newtypes and helper functions), this adds a third one, and if with directive gets implemented then there will be four. What does the decision tree look like about when to use each of these different ways? What about the write-API?

Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

Collaborator Author

Re: being a public API: This is also a way for people who have a contiguous slice of bytemuck/zerocopy types to add the optimization themselves (unless we want to add that to binrw under feature flags), and that would be a less contrived example of the feature.

Overall, I chose the approach that I did due to a combination of making the main library less special (currently it's the only one able to define specializations like this), not requiring the use of unsafe, and (if I've pulled things off right) staying semver compatible.

That said, this design isn't perfect. I don't like that I'm exposing Container as a public API. That means that users may expect more than bare functionality out of it. (If they care at all. If they don't, then exposing this as public did nothing.)

Collaborator Author

Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

The standard library is in a similar bind with io::Read::read_buf and io::Read::read. They can be implemented in terms of each other, io::Read::read existed first, and it is more optimal to implement io::Read::read_buf.

They've decided as a prerequisite that they'll need a way to tell the compiler that you only need to implement one or the other. If that ever stabilizes, we can do the same.

// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
Collaborator

I’m not sure this comment makes sense any more, since it had been contextualised by surrounding code which no longer exists?

@@ -10,7 +10,9 @@ use core::num::{
};

macro_rules! binread_impl {
($($type_name:ty),*$(,)?) => {
// `$(some_lit $(__unused $is_lit_present:tt))?` allows us to match on the presence of a literal
// using `$($($is_lit_present)? fn do_whatever() {})?`
Collaborator

I’m not sure the benefits outweigh the costs of using a weirdo macro pattern like this instead of just using a second macro and sending the extra function in a tt (binread_impl + binread_impl_int)?
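For reference, the pattern under discussion can be sketched like this (macro, struct, and method names are hypothetical): the outer `$(...)?` in the expansion is gated on whether the optional flag was matched, while the inner `$($p)?` always expands to nothing because `$p` never captures tokens.

```rust
// Presence-matching macro trick: emit an extra method only when the caller
// writes the `int` flag. `$p` exists solely so the outer repetition in the
// expansion has a metavariable to key off; it never binds any tokens.
macro_rules! impl_reader {
    ($name:ident $(, int $(@ $p:tt)?)?) => {
        impl $name {
            fn kind(&self) -> &'static str { "generic" }
            $(
                $($p)? // expands to nothing; gates this block on `int`
                fn kind_fast(&self) -> &'static str { "int fast path" }
            )?
        }
    };
}

struct Plain;
struct Fast;
impl_reader!(Plain);      // only kind()
impl_reader!(Fast, int);  // kind() and kind_fast()
```

The alternative the reviewer suggests, a second macro taking the extra function as a `tt`, avoids this indirection at the cost of some duplication.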

{
count_with(n, T::read_options)
move |reader, endian, arg| {
let res: Vec<_> = T::read_options_count(reader, endian, arg, n)?;
Collaborator

So this will use N*2 memory now in all cases, or will the optimiser elide the Vec-Iter-Vec loop so it will only use N*2 memory for non-Vec return types?

I’m concerned that the utility of this helper function is diminished to the point of uselessness by the uncontrollable memory overhead since one of the main purposes having FromIterator as the return type is to avoid this kind of memory overhead. Otherwise, couldn’t someone just map a Vec?

Collaborator Author

The optimizer appears to be able to catch it at -Copt-level=2. I'm not 100% sure what it's doing at -Copt-level=1. So, it can catch it (and will catch it in release mode), but it's somewhat non-trivial.

It might be possible to turn this into an API that uses iterators with a fixed size overhead. (reads into fixed-size chunks, then provides each item one at a time.) It may also be possible to add a backdoor into the iterator to preserve the current behavior for Vec (chunks that grow along with the backing memory storage. i.e. exponentially)

I think with such an API we can also say "good riddance" to the Collection trait. Let's see if I can workshop this out.
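For what it's worth, the fixed-overhead idea can be sketched for the u8 case as an iterator that refills a bounded chunk buffer with bulk reads and hands items out one at a time. All names here are hypothetical, not part of this PR:

```rust
use std::io::Read;

const CHUNK: usize = 4096;

// Iterator with fixed memory overhead: one bulk read_exact per CHUNK items
// instead of one read per item. A FromIterator-based collector can consume
// this without ever holding a second full-size buffer.
struct ChunkedBytes<'r, R: Read> {
    reader: &'r mut R,
    remaining: usize, // items still expected from the reader
    buf: Vec<u8>,     // current chunk
    pos: usize,       // next index within buf
}

impl<'r, R: Read> ChunkedBytes<'r, R> {
    fn new(reader: &'r mut R, count: usize) -> Self {
        Self { reader, remaining: count, buf: Vec::new(), pos: 0 }
    }
}

impl<'r, R: Read> Iterator for ChunkedBytes<'r, R> {
    type Item = std::io::Result<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos == self.buf.len() {
            if self.remaining == 0 {
                return None;
            }
            // Refill: bulk-read the next chunk.
            let n = self.remaining.min(CHUNK);
            self.buf.resize(n, 0);
            self.pos = 0;
            self.remaining -= n;
            if let Err(e) = self.reader.read_exact(&mut self.buf) {
                // Stop iterating after reporting the error.
                self.remaining = 0;
                self.buf.clear();
                return Some(Err(e));
            }
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        Some(Ok(b))
    }
}
```

The "backdoor" for Vec would be letting the chunk grow with the backing storage instead of staying fixed at CHUNK, which is where this sketch stops being trivial.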

Collaborator Author

If I didn't care about being able to backdoor the current reading-into-Vec behavior, this would be trivial. Trying to retain the ability to do so makes it really tough. It would be possible with type alias impl trait in trait (I think), which isn't stable yet.

}

use vec_fast_int;
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
Collaborator

Please delete commented out code :-)

use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
Collaborator

To make sure I understand, this trait is just to be able to support both Vec<T> and [T; N]?

Is there some other collection type that would be supportable in the future?

Should this be sealed?

Should this be hidden?

+ IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
Collaborator

Why is this part of this API (beyond the example in the read_options_count docs, which seems pretty contrived to me)? Is this also the only reason why this trait requires IntoIterator?

Collaborator Author

This part of the API exists very much because of that example: when I went to write it, I reached for this pattern and realized that if I wanted to use it, I would need to add support for it.

type Count = ();
type HigherSelf<X> = [X; N];

fn new_naive<Fun, Error>(_count: (), mut f: Fun) -> Result<Self, Error>
Collaborator

Use Self::Count instead of the literal type (throughout)?

/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Collaborator

Name: new_contiguous? new_fast?

/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_naive<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Collaborator

Name: new? new_slow?

If the ‘smart’ constructor was filling a MaybeUninit so that Default constraint was not required, would it still be necessary for this to exist at all?

Collaborator Author

hmmmm... I suppose, technically, it wouldn't, because we could just iterate over the slice provided by read_smart and perform reads one by one, and that would be morally equivalent, even if the underlying strategies are different.

@csnover
Collaborator

csnover commented Feb 19, 2025

Sorry, I started this review yesterday and didn’t see the self-review so some of my questions are redundant :-)

Successfully merging this pull request may close these issues.

Issue with lifetimes and Vec (lifetime may not live long enough requires that 'bar must outlive 'static)
3 participants