RAM has a memory access delay of >100 ns, whereas a cache hit takes less than 60 ns. This is a bit vague in the README.
On top of that comes the hardware prefetcher, which detects linear access patterns, plus some dark magic from speculative execution with speculative cache prefetching.
Usually Linux serves memory-mapped reads of smaller files from the kernel page cache, whose hot pages naturally sit in L3 cache (when available) — which is also not exactly formulated in the README.
Since the algorithm used is a streaming one with a sliding window for vectorization: is there a specific reason why you do not use a circular buffer, the fastest data structure available for this?
Otherwise, it would be helpful to have a separate benchmark comparing allocator implementations for file reading alone.
matu3ba changed the title from *clarify RAM <-> cache behavior* to *clarify README* on Mar 4, 2021