Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317
base: master
Conversation
The test is written to "what a user would expect"; it currently fails. We should either move the optimization to the count helper, or fix it in some other way.
Fixes jam1garner#298. This implementation chooses to delegate reading a known number of items to a new method `read_options_count` with a default, naive implementation. The `Vec<T>` impl and the `count` helper delegate to `T::read_options_count`. The `count_with` helper is de-optimized into a naive loop, for correctness' sake. (This fixes the failing test introduced in the previous commit.) One nit remains: arrays don't have this optimization applied.
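The delegation described above can be sketched with simplified stand-in signatures. `ReadItem`, `read_one`, and `read_count` are illustrative names, not binrw's API; the real method also takes an `Endian` and arguments:

```rust
// Minimal sketch of the delegation pattern, with simplified signatures.
// `read_one` stands in for `read_options`; `read_count` for `read_options_count`.
trait ReadItem: Sized {
    fn read_one(src: &mut &[u8]) -> Option<Self>;

    // Default, naive implementation: read `n` items one at a time.
    // Types like `u8` could override this with a single bulk read.
    fn read_count(src: &mut &[u8], n: usize) -> Option<Vec<Self>> {
        (0..n).map(|_| Self::read_one(src)).collect()
    }
}

impl ReadItem for u16 {
    fn read_one(src: &mut &[u8]) -> Option<Self> {
        let (head, rest) = src.split_first_chunk::<2>()?;
        *src = rest;
        Some(u16::from_le_bytes(*head))
    }
}

fn main() {
    let data = [0x01u8, 0x00, 0x02, 0x00];
    let mut src = &data[..];
    let items = u16::read_count(&mut src, 2).unwrap();
    assert_eq!(items, vec![1, 2]);
}
```

A container impl (like the one for `Vec<T>` in this PR) then only has to call `T::read_count` instead of looping itself, so per-type fast paths apply everywhere.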
Not a big fan of this implementation. Not sure that there's a way to proceed without compromising on ergonomics either. Let's see if anyone else has ideas on how to proceed.
This looks great. Solid solution to that issue, code all looks good after a couple looks over it
aaaand i forgot to run tests after adding the array impl. why doesn't rust like this. sigh
Nm kinda just looks like you broke some no_std stuff
AH. Yeah no_std won't like that Vec sneaking into the core implementation either. That's... ugh, i'm not sure what the best way to handle that is, but if we figure it out it probably makes the array impls nicer as well.
Force-pushed from 07d220b to 4317a3a.
Oops, the new Container trait is also a Functor. Still TODO: add documentation to the new Container stuffs.
nightly has slightly different output for a couple ui tests. go figure. everything's fine on latest stable though, and I'm trying to get CI to tell me what 1.66 looks like, but it's looking fine there too. in other news, i have potentially over-complicated the implementation, but it handles both arrays and vecs nicely. There's plenty of names here to bikeshed (i have no particular attachment to any of the current names), and proper documentation for the container trait still needs to be written.
Couple of TODOs, a bit of commentary. Oops, I missed a spot for u8.
@@ -0,0 +1,121 @@
//! container module
TODO: everything in here needs proper documentation.
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
//     Error::Io(io::Error::new(
//         io::ErrorKind::UnexpectedEof,
//         "not enough bytes in reader",
//     ))
// }
TODO: either remove this or start using it again.
C::new_smart(count, |mut buf| {
    reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(
        &mut buf,
    ))?;

    if core::mem::size_of::<Self>() != 1
        && ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
            || (cfg!(target_endian = "little") && endian == crate::Endian::Big))
    {
        for value in buf.iter_mut() {
            *value = value.swap_bytes();
        }
    }
    Ok(())
})
This was previously done in two passes: incrementally read the whole buffer, then byteswap the whole buffer.
The current construction reads each portion of the buffer, swaps the bytes of that portion, then reads the next portion, etc. Does this make a meaningful performance difference, and should we consider splitting into a "read phase" and a "finalize" phase?
A finalize phase seems ripe for misuse, though. would it be possible to prevent someone from ignoring the read phase and instead doing the read during the finalization phase? I suppose a by-value map would do it.
C::new_smart(
count,
|mut buf| {
reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(&mut buf))?;
Ok(())
},
|value| {
if core::mem::size_of::<Self>() != 1
&& ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
|| (cfg!(target_endian = "little") && endian == crate::Endian::Big))
{
value.swap_bytes()
} else {
value
}
},
)
i feel like this is haring off further in the direction of over-complication.
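For reference, the conditional byte-swap both variants rely on can be exercised standalone. This is a minimal sketch; `read_u32s_le` is my name, and it uses `chunks_exact` in place of the `bytemuck` cast used in the PR:

```rust
// Standalone sketch of the byte-order handling above: reinterpret bytes
// as native-endian u32 values, then swap only when the host byte order
// differs from the requested (little-endian) data order.
fn read_u32s_le(bytes: &[u8]) -> Vec<u32> {
    let mut out: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_ne_bytes(c.try_into().unwrap()))
        .collect();
    // On a big-endian host, the native reinterpretation of little-endian
    // data is byte-reversed, so fix each value up.
    if cfg!(target_endian = "big") {
        for v in &mut out {
            *v = v.swap_bytes();
        }
    }
    out
}

fn main() {
    let data = [0x01, 0x00, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00];
    assert_eq!(read_u32s_le(&data), vec![1, 255]);
}
```

The single-pass and two-pass variants discussed above differ only in whether the swap loop runs per-chunk inside the read closure or once over the whole buffer afterwards.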
use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
This IntoIterator bound isn't technically necessary. There was an earlier iteration where I was playing around with IntoIterator and FromIterator until I realized that FromIterator doesn't play nicely with arrays, and it was nice to reuse the Item associated type.
// whee it's a Functor
/// Type Constructor
type HigherSelf<T>: Container<Count = Self::Count, HigherSelf<Self::Item> = Self>
    + IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
where
    Fun: FnMut(Self::Item) -> T;
Hello, having to invent an ecosystem around a new trait so that people can do things that are maybe useful.
This uses the Functor pattern from https://varkor.github.io/blog/2019/03/28/idiomatic-monads-in-rust.html (specifically, "attempt 2", before they start inventing (generic) associated traits to try to express further patterns).
The goal here was to allow users to delegate to other implementations, then map the values into another container "of the same form." I'm effectively keeping an eye out for the possibility that we ever support building without alloc: parsing arrays will only ever use arrays, never alloc::vec::Vec.
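The GAT-based functor shape being described can be sketched on stable Rust (1.65+). The names here (`Mappable`, `map_items`) are illustrative, not binrw's, and the round-trip constraint from the real trait is omitted for brevity:

```rust
// A minimal "functor" via a generic associated type: a container that can
// be mapped into "the same shape" holding a different item type.
trait Mappable: IntoIterator + Sized {
    type HigherSelf<T>: Mappable<Item = T>;

    fn map_items<T, F: FnMut(Self::Item) -> T>(self, f: F) -> Self::HigherSelf<T>;
}

impl<A> Mappable for Vec<A> {
    type HigherSelf<T> = Vec<T>;
    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> Vec<T> {
        self.into_iter().map(f).collect()
    }
}

impl<A, const N: usize> Mappable for [A; N] {
    type HigherSelf<T> = [T; N];
    fn map_items<T, F: FnMut(A) -> T>(self, f: F) -> [T; N] {
        self.map(f) // array::map keeps the fixed size; no allocation needed
    }
}

fn main() {
    let v = vec![1u8, 2, 3].map_items(|x| x as u16 * 10);
    assert_eq!(v, vec![10u16, 20, 30]);
    let a = [1u8, 2, 3].map_items(|x| x + 1);
    assert_eq!(a, [2u8, 3, 4]);
}
```

The key property is that mapping an array yields an array and mapping a Vec yields a Vec, which is exactly what lets an alloc-free build avoid Vec entirely.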
// This extra impl for `u8` makes it faster than
// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
fn read_options_count<'a, R, C>(
    reader: &mut R,
    _endian: Endian,
    _args: Self::Args<'a>,
    count: C::Count,
) -> BinResult<C>
where
    R: Read + Seek,
    Self::Args<'a>: Clone,
    C: crate::container::Container<Item = Self>,
{
    C::new_smart(count, |buf| reader.read_exact(buf).map_err(Into::into))
}
Oops. This comment was copied over and is now inaccurate because I wasn't paying close enough attention to how I was implementing this version of the specialization. (Notably, we can only get this version of the specialization when using Vec, and I've completely abstracted access to that Vec away here. This is a performance regression.)
Ideally we'd be using the current implementation for arrays, and read_to_end for Vec. Funnily enough, we might be able to use fake specialization that requires 'static here to switch between the two implementations, since we only need to care about u8 (and anything that might implement Container... should we be sealing Container?) instead of .
Actually, no, that doesn't work. That would require us to bound C: Container<Item = Self> + 'static, and if user-generated items are not 'static, the Container wrapping those items probably won't be 'static either.
/// smart
///
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
where
    Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
    Self::Item: Default + Clone;
If the read_buf and core_io_borrowed_buf features were stable, passing a BorrowedCursor to the closure instead of &mut [Self::Item], and using Read::read_buf_exact, would almost get us to not needing to initialize our buffers. But, with the current state of that API, we'd have no way to own the filled values (without unsafe); we would only get a reference to a slice of the filled values. (This doesn't mean that such an API won't eventually be added, only that it doesn't currently exist.)
As I mentioned in a line review, I'm not feeling confident about exposing this as a public API, but I'm not opposed to this approach if everyone else finds it agreeable enough. My biggest concrete concern is how count_with runs through Vec, and how that seems like it will add unnecessary memory overhead that didn't exist before.
I feel like we are running in circles: optimisations started in Vec, then a generic helper function was created to read into any collection, then the optimisation for contiguous primitives was moved into the helper, and now the optimisation is getting pushed back out into types again. Are we just doomed to keep making differently suboptimal trade-offs based on whatever seems most important right now until specialisation? Since this is all about code performance I would be biased toward just allowing unsafe things, since that would also allow eliminating the overhead from using resize.
/// }
/// }
/// ```
fn read_options_count<'a, R, C>(
Name: read_collection?
I'm not feeling confident that this should be part of the public API if this is really all just to find a better way to handle specialisation of contiguous primitive types. There are already two ways to read into collections (newtypes and helper functions), this adds a third, and if the with directive gets implemented there will be four. What does the decision tree look like for when to use each of these different ways? What about the write API?
Something seems weird to me about this change meaning read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).
Re: being a public API: this is also a way for people who have a contiguous slice of bytemuck/zerocopy types to add the optimization themselves (unless we want to add that to binrw under feature flags), and that would be a less contrived example of the feature.
Overall, I chose the approach that I did due to a combination of making the main library less special (no longer the only one able to define specializations like this), not requiring the use of unsafe, and (if I've pulled things off right) being semver compatible.
That said, this design isn't perfect. I don't like that I'm exposing Container as a public API. That means that users may expect more than bare functionality out of it. (If they care at all. If they don't, then exposing this as public did nothing.)
Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).
The standard library is in a similar bind with io::Read::read_buf and io::Read::read. They can be implemented in terms of each other, io::Read::read existed first, and it is more optimal to implement io::Read::read_buf.
They've decided as a prerequisite that they'll need a way to tell the compiler that you only need to implement one or the other. If that ever stabilizes, we can do the same.
// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.
I’m not sure this comment makes sense any more, since it had been contextualised by surrounding code which no longer exists?
@@ -10,7 +10,9 @@ use core::num::{
};

macro_rules! binread_impl {
    ($($type_name:ty),*$(,)?) => {
        // `$(some_lit $(__unused $is_lit_present:tt))?` allows us to match on the presence of a literal
        // using `$($($is_lit_present)? fn do_whatever() {})?`
I’m not sure the benefits outweigh the costs of using a weirdo macro pattern like this instead of just using a second macro and sending the extra function in a tt (binread_impl + binread_impl_int)?
{
    count_with(n, T::read_options)
    move |reader, endian, arg| {
        let res: Vec<_> = T::read_options_count(reader, endian, arg, n)?;
So this will use N*2 memory now in all cases, or will the optimiser elide the Vec-Iter-Vec loop so it will only use N*2 memory for non-Vec return types?
I’m concerned that the utility of this helper function is diminished to the point of uselessness by the uncontrollable memory overhead, since one of the main purposes of having FromIterator as the return type is to avoid this kind of memory overhead. Otherwise, couldn’t someone just map a Vec?
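The double-buffering concern can be illustrated with two collection shapes. The function names are mine; `collect_via_vec` mirrors the helper as changed in this PR, with a `Vec` staging buffer between the source and the final collection:

```rust
use std::collections::BTreeSet;

// Direct collection: items flow straight from the iterator into the
// target via FromIterator, with no intermediate buffer.
fn collect_direct(n: u32) -> BTreeSet<u32> {
    (0..n).collect()
}

// Buffered collection: read everything into a Vec first, then move it
// into the target. Peak memory holds both the staging Vec and the
// growing set at once.
fn collect_via_vec(n: u32) -> BTreeSet<u32> {
    let staging: Vec<u32> = (0..n).collect();
    staging.into_iter().collect()
}

fn main() {
    assert_eq!(collect_direct(5), collect_via_vec(5));
    assert_eq!(
        collect_direct(3).into_iter().collect::<Vec<_>>(),
        vec![0, 1, 2]
    );
}
```

Both produce identical results; the question raised here is whether the optimizer reliably removes the staging buffer when the final target is itself a `Vec`.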
The optimizer appears to be able to catch it at -Copt-level=2. I'm not 100% sure what it's doing at -Copt-level=1. So it can catch it (and will catch it in release mode), but it's somewhat non-trivial.
It might be possible to turn this into an API that uses iterators with a fixed size overhead. (reads into fixed-size chunks, then provides each item one at a time.) It may also be possible to add a backdoor into the iterator to preserve the current behavior for Vec (chunks that grow along with the backing memory storage. i.e. exponentially)
I think with such an API we can also say "good riddance" to the Collection trait. Let's see if I can workshop this out.
If I didn't care about being able to backdoor the current reading into vec behavior, this would be trivial. Trying to retain the ability to do so makes it really tough. It would be possible with type alias impl trait in trait (i think), which isn't stable yet.
}

use vec_fast_int;
// pub(crate) fn not_enough_bytes<T>(_: T) -> Error {
Please delete commented out code :-)
use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {
To make sure I understand, this trait is just to be able to support both Vec<T> and [T; N]?
Is there some other collection type that would be supportable in the future?
Should this be sealed? Should this be hidden?
    + IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
Why is this part of this API (beyond the example in the read_options_count docs, which seems pretty contrived to me)? Is this also the only reason why this trait requires IntoIterator?
This part of the API very much directly exists because of that example: when I went to write it, I reached for this pattern and realized that if I wanted to use it I would need to add support for it.
type Count = ();
type HigherSelf<X> = [X; N];

fn new_naive<Fun, Error>(_count: (), mut f: Fun) -> Result<Self, Error>
Use Self::Count instead of the literal type (throughout)?
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Name: new_contiguous? new_fast?
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_naive<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
Name: new? new_slow?
If the ‘smart’ constructor was filling a MaybeUninit so that the Default constraint was not required, would it still be necessary for this to exist at all?
hmmmm... I suppose, technically, it wouldn't, because we could just iterate over the slice provided by read_smart and perform reads one by one, and that would be morally equivalent, even if the underlying strategies are different.
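For context, the Default + Clone path the ‘smart’ constructor relies on looks like this in safe code (the function name is illustrative); the up-front fill is exactly the cost MaybeUninit could remove:

```rust
// Safe-Rust shape of a "smart" constructor for Vec: pre-fill a buffer
// with Default values, then let the closure overwrite it in place.
// The `T::default()` fill (a zero-fill for integer types) is wasted
// work that a MaybeUninit-based variant would avoid.
fn new_smart_vec<T, E, F>(count: usize, mut f: F) -> Result<Vec<T>, E>
where
    T: Default + Clone,
    F: FnMut(&mut [T]) -> Result<(), E>,
{
    let mut buf = vec![T::default(); count];
    f(&mut buf)?; // closure fills the whole buffer, e.g. via read_exact
    Ok(buf)
}

fn main() {
    let out: Result<Vec<u8>, ()> = new_smart_vec(4, |buf| {
        buf.copy_from_slice(&[9, 8, 7, 6]);
        Ok(())
    });
    assert_eq!(out.unwrap(), vec![9, 8, 7, 6]);
}
```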
Sorry, I started this review yesterday and didn’t see the self-review so some of my questions are redundant :-)
Fixes #298.
The core new API here is BinRead::read_options_count, where the implementation for Vec<T> will delegate to T::read_options_count. While I was implementing it, I noticed an inconsistency with the count_with helper, which can be fixed separately in another PR if this approach is rejected (or is taking a long time to solidify).
While I was at it, I decided to try applying the optimization to arrays, and got what is likely a functional result, but I'm not a big fan of it. Are there any tweaks we can make to the read_options_count signature to make the array implementation nicer? Do we want to keep the optimization for arrays, or do we want to keep things as they currently are?
Alternatives
Secondary Trait
Use another trait for this (bikeshed: BinReadMulti) which has the same default impl as read_options_count, and which the derive macro automatically implements using the default impl. Alongside an attribute to disable the automatic trait generation, this would allow someone to use the derive macro and manually write the multiple-item implementation. I don't think this will happen very often, though.
Use dtolnay's typeid crate
https://github.com/dtolnay/typeid allows for obtaining a type id without 'static. However, it would require us to write some unsafe code within this crate, or get downcasting implemented in that crate. (There's also a safety requirement that I'm a little dubious about, but that we probably wouldn't trip over for this library.)
Status Quo: Wait for specialization
Whether specialization or something like this is preferred depends on what type you consider the optimization to be a property of, I guess. Is "you can read integers fast" a property of the integer, or of the container(s) of that integer? This PR takes the former viewpoint: we can read u8 fast as a property of u8, not as a property of Vec<u8>, although Vec<u8> benefits as a result.