How to structure and design experimental scripts #18
In Rust, it should be very easy to set up a CLI with clap. Should input to the benchmarks be read from standard input or generated inside the program? I think some algorithms might also have optimisations that should optimally be tuned at compile-time. For example, FiBA can be compiled with different compile-time parameters. I'm also wondering, would it be worth bridging the implementations with, for example, https://github.com/dtolnay/cxx?
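A minimal sketch of what such a CLI could look like. To keep it dependency-free this uses `std::env::args` rather than clap (with clap it would be a `#[derive(Parser)]` struct), and the flag names and defaults (`--window`, `--iterations`) are hypothetical, chosen to match the example invocation discussed below:

```rust
use std::env;

/// Hypothetical benchmark parameters; the real flags and defaults
/// would come from the experiment design, not from this sketch.
#[derive(Debug, PartialEq)]
struct Config {
    window: usize,
    iterations: u64,
}

/// Parse "--window N --iterations M" style arguments.
fn parse_config(args: &[String]) -> Result<Config, String> {
    let mut window = 1024; // assumed default
    let mut iterations = 10_000_000; // assumed default
    let mut i = 0;
    while i < args.len() {
        match args[i].as_str() {
            "--window" => {
                i += 1;
                window = args.get(i).ok_or("missing value for --window")?
                    .parse().map_err(|e| format!("--window: {e}"))?;
            }
            "--iterations" => {
                i += 1;
                iterations = args.get(i).ok_or("missing value for --iterations")?
                    .parse().map_err(|e| format!("--iterations: {e}"))?;
            }
            other => return Err(format!("unknown argument: {other}")),
        }
        i += 1;
    }
    Ok(Config { window, iterations })
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_config(&args) {
        Ok(cfg) => println!("{cfg:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```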
Using a library for the CLI is better than my current ad-hoc approach. :) I should probably pull in a Boost library rather than the hackish way I'm currently doing it.
I think that depends on the kind of benchmark. For some of what we have implemented, the data generation is synthetic. In that case, only the parameters of the benchmark are read from the command line (but not standard in). For the data-driven benchmarks, we have so far read a filename from the command line, then opened that file during the run instead of reading from standard in. We have also had to treat the data-driven benchmarks differently. For the synthetic benchmarks, it's okay to issue a call such as a single run of the Two-Stacks algorithm with a window of size 1024 for 10,000,000 iterations. The benchmark then self-reports the core running time (eliding the warmup phase). A command-line call per data point is fine with synthetic benchmarks. But for data-driven benchmarks, reading the data file alone can take 5 minutes. For those benchmarks, in order to manage the amount of time it takes to run all experiments, we had to read the file once, cache it, then re-use that for many runs. (See DataGenerators.h for our implementation of the file reading and caching.) But that then means that when we call the benchmark, we must also tell it to run many different experiments, which we do through a separate input file. (See the data_benchmark for what this looks like.)
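The self-reporting pattern described here (time only the core loop, elide warmup) can be sketched in Rust as follows; the warmup/measurement split and the toy workload are assumptions for illustration, not the repo's actual driver:

```rust
use std::time::{Duration, Instant};

/// Run `op` for `warmup` iterations (untimed), then `iters` iterations
/// (timed), and report only the core measured time. This mirrors the
/// "self-reports the core running time, eliding warmup" idea.
fn time_core<F: FnMut(u64)>(mut op: F, warmup: u64, iters: u64) -> Duration {
    for i in 0..warmup {
        op(i); // warmup phase: warm caches and branch predictors, not timed
    }
    let start = Instant::now();
    for i in 0..iters {
        op(i);
    }
    start.elapsed()
}

fn main() {
    // Toy workload standing in for a SWAG insert/evict/query cycle.
    let mut sum: u64 = 0;
    let core = time_core(|i| { sum = sum.wrapping_add(i); }, 1_000, 1_000_000);
    println!("core time: {:?} (sum = {})", core, sum);
}
```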
We solved this problem by giving them different names.
Interesting - that could allow one binary driver for both C++ and Rust. I hadn't even thought of going in that direction.
I will try to set up a benchmark in Criterion. Do you have a link for downloading the datasets? Also, is it possible to generate the synthetic data from C++ and use it as input to Rust, or should I generate it directly in Rust?
Aah, never mind, I executed the benchmark and it downloaded everything.
For the synthetic data, we just increment an integer.

We got the DEBS 2012 Grand Challenge data from: https://debs.org/grand-challenges/2012/. Since we used this dataset in FIFO experiments, we ran a pre-processing script which removes a small number of out-of-order elements. I'll look into getting this script added to the repo.

The NYC Citi Bike data (used in the FiBA paper) is from: https://www.citibikenyc.com/system-data. We used August through December, 2018.

We'll want some standardized way of getting these datasets. Probably a script of some kind.
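The synthetic generation described above ("just increment an integer") can be sketched as a plain iterator; the element type (`u64`) is an assumption:

```rust
/// Synthetic data source: a monotonically increasing integer stream,
/// as described above. Using u64 here is an assumption about the type.
fn synthetic_data(n: u64) -> impl Iterator<Item = u64> {
    0..n
}

fn main() {
    // First few synthetic elements: 0, 1, 2, 3, 4
    let sample: Vec<u64> = synthetic_data(5).collect();
    println!("{sample:?}");
}
```

Because the stream is this trivial, generating it independently in C++ and in Rust should yield identical input, which sidesteps the question of piping data between the two.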
Ok, cool 👍 I have some questions:
No, but it's also not your "responsibility" as the implementer of the SWAG. What I mean is that the complexity guarantees of a SWAG always count invocations of the aggregation operation. So when we say, for example, that a SWAG's operations are O(1), it means that we only call the aggregation operation a constant number of times per operation. That does not preclude the aggregation operation itself from taking more than constant time. The classic example of a non-constant aggregation operation is collect, which returns a list of all elements in the window; that is inherently an O(n) operation.
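To make the distinction concrete: even if the SWAG invokes its combine operation O(1) times per window operation, the combine itself can be linear. A minimal sketch, using a hypothetical trait that stands in for the repo's actual SWAG interface:

```rust
/// A binary aggregation operation, monoid-style. This trait is a
/// hypothetical stand-in for the repo's actual SWAG interface.
trait Aggregate {
    fn identity() -> Self;
    fn combine(&self, other: &Self) -> Self;
}

/// Constant-time aggregation: integer sum. One combine call is O(1).
impl Aggregate for u64 {
    fn identity() -> u64 { 0 }
    fn combine(&self, other: &u64) -> u64 { self + other }
}

/// The classic non-constant example: `collect`, which keeps every
/// element. Each combine concatenates lists, so a single combine
/// call is O(n) in the number of elements aggregated so far.
#[derive(Clone, Debug, PartialEq)]
struct Collect(Vec<u64>);

impl Aggregate for Collect {
    fn identity() -> Collect { Collect(Vec::new()) }
    fn combine(&self, other: &Collect) -> Collect {
        let mut v = self.0.clone(); // O(n) copy: combine is not O(1)
        v.extend(other.0.iter().copied());
        Collect(v)
    }
}

fn main() {
    let a = Collect(vec![1, 2]);
    let b = Collect(vec![3]);
    println!("{:?}", a.combine(&b)); // Collect([1, 2, 3])
}
```

A SWAG parameterized over either implementation makes the same number of `combine` calls; only the cost per call differs.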
It doesn't. :) In C/C++, overflows can silently happen. I believe we prevent them by limiting the range of our input data.
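For contrast with the silent wraparound in C/C++: Rust release builds also wrap silently on plain arithmetic (debug builds panic), so a port would face the same question unless it opts into explicit checked or wrapping operations. A small sketch:

```rust
fn main() {
    let x: u32 = u32::MAX;

    // checked_add makes overflow explicit: it returns None instead of wrapping.
    assert_eq!(x.checked_add(1), None);

    // wrapping_add opts into C-style modular arithmetic deliberately.
    assert_eq!(x.wrapping_add(1), 0);

    // Plain `x + 1` would panic in debug builds and wrap in release builds,
    // so limiting the range of the input data, as done for the C++
    // benchmarks, remains a reasonable strategy for a Rust port too.
    println!("ok");
}
```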
No, but all of the benchmarks from the DEBS 2017 and VLDB 2019 papers are here. We don't have the experiments from VLDB 2015. The C++ benchmarks are declared in a single header file; the rest of that header file consists of the helper functions used to call them from the drivers. Because some of our SWAGs had slightly different template parameters, there was a combinatorial explosion of helpers.
The experimental scripts were written with the C++ implementations in mind. Comparing across language implementations would also be interesting, but it would require some changes to those scripts.