The parallel `eachobs` implementation is not deterministic: observations are returned as soon as they are loaded, so they may come back out of order. This is fast, and fine for use cases like training, where the data should be shuffled anyway.
An option for deterministic iteration would still be helpful in many use cases, though.
This could be implemented as a wrapper around an existing iterator that does the following:
- instead of iterating over `data` with the wrapped iterator, iterate over `(1:nobs(data), data)` to preserve ordering information
- collect returned observations, stripping the index
- return an observation only if all previous (by index) observations have been returned
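The steps above could be sketched roughly as follows. This is only a minimal illustration, not an existing API: `OrderedIterator` is a hypothetical name, and it assumes the wrapped iterator yields `(index, observation)` pairs whose indices cover `1:n` exactly once, in arbitrary order.

```julia
# Hypothetical wrapper (illustrative name, not part of any package).
# Assumes `inner` yields (index, observation) pairs with indices 1:n,
# each exactly once, in whatever order they finish loading.
struct OrderedIterator{I}
    inner::I
end

# The length is not known up front, so let `collect` grow its result.
Base.IteratorSize(::Type{<:OrderedIterator}) = Base.SizeUnknown()

# State: (inner started?, inner state, out-of-order buffer, next index to emit)
function Base.iterate(it::OrderedIterator,
                      state = (false, nothing, Dict{Int,Any}(), 1))
    started, innerstate, pending, next = state
    # Pull from the wrapped iterator until the next expected index has arrived.
    while !haskey(pending, next)
        res = started ? iterate(it.inner, innerstate) : iterate(it.inner)
        res === nothing && return nothing   # wrapped iterator is exhausted
        (idx, obs), innerstate = res
        started = true
        pending[idx] = obs                  # buffer until it is this index's turn
    end
    # Strip the index and return the observation in order.
    return pop!(pending, next), (started, innerstate, pending, next + 1)
end

# Simulated out-of-order arrival from a parallel loader.
arrival = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
ordered = collect(OrderedIterator(arrival))  # ["a", "b", "c", "d", "e"]
```

In this sketch, `pending` holds every observation that arrived ahead of its turn, so if the first observation is the last to load, the buffer grows to nearly the full dataset. Bounding that would presumably require coordinating with the loader's buffer rather than wrapping it from the outside.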
I am unsure how much this will affect performance and memory usage, or how it interacts with `buffersize`. Are there alternative approaches to this implementation?
I believe FFCV has the notion of a traversal order which we might want to look into. Apparently the quasi-random variant increases performance too, so there may be a third option between random and deterministic that we want to include here.
Do you know what they are doing there specifically? I thought that was just pre-shuffling and then storing into contiguous memory, but from that page it seems that the quasi-random order is only relevant when the data is not in memory.