How to structure and design experimental scripts #18

scotts opened this issue Jul 10, 2020 · 7 comments
scotts commented Jul 10, 2020

The experimental scripts were written with the C++ implementations in mind. Comparing across language implementations would also be interesting, but this means that:

  1. The scripts need to have a concept of language implementation.
  2. Each language will need a benchmark driver that performs equivalent experiments, ideally exposing the same command-line interface.
scotts added the question label on Jul 10, 2020
segeljakt commented Jul 12, 2020

In Rust, it should be very easy to set up a CLI with clap. Should input to the benchmarks be read from standard input or generated inside the program?
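A minimal sketch of what that could look like with clap's derive API (clap 4 syntax; the field names are just placeholders mirroring the positional arguments of the C++ driver discussed below):

```rust
// Sketch only: assumes clap with the "derive" feature enabled.
use clap::Parser;

/// Hypothetical benchmark driver arguments.
#[derive(Parser, Debug)]
#[command(name = "benchmark_driver")]
struct Args {
    /// Which SWAG to run, e.g. "two_stacks"
    swag: String,
    /// Window size in elements
    window_size: usize,
    /// Number of iterations
    iterations: u64,
}

fn main() {
    let args = Args::parse();
    println!(
        "running {} with window {} for {} iterations",
        args.swag, args.window_size, args.iterations
    );
}
```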

I think some algorithms might also have optimisations that are best tuned at compile time. For example, FiBA can be compiled with different MAX_ARITY values. In Rust this can be handled through environment variables.
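One way that could work is a small build script that turns the environment variable into a constant (a sketch only; the variable name follows the C++ MAX_ARITY, and the generated file name is made up):

```rust
// build.rs -- sketch: bake MAX_ARITY into the crate at build time,
// e.g. `MAX_ARITY=8 cargo build --release`.
use std::{env, fs, path::Path};

fn main() {
    // Fall back to 4 when the variable is not set.
    let arity = env::var("MAX_ARITY").unwrap_or_else(|_| "4".to_string());
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("arity.rs");
    fs::write(&dest, format!("pub const MAX_ARITY: usize = {};", arity)).unwrap();
    // Recompile when the variable changes.
    println!("cargo:rerun-if-env-changed=MAX_ARITY");
}
```

The crate would then pull the constant in with `include!(concat!(env!("OUT_DIR"), "/arity.rs"));`.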

I'm also wondering, would it be worth bridging the implementations with for example https://github.com/dtolnay/cxx?

scotts commented Jul 15, 2020

Using a library for the CLI is better than my current ad-hoc approach. :) I should probably pull in some Boost library rather than the hackish way I'm currently doing it.

Should input to the benchmarks be read from standard input or generated inside the program?

I think that depends on the kind of benchmark. For some of what we have implemented, the data generation is synthetic. In that case, only the parameters of the benchmark are read from the command line (but not standard in). For the data-driven benchmark, we have so far read a filename from the command line, then opened that file during the run instead of reading from standard in.

We have also had to treat the data-driven benchmarks differently. For the synthetic benchmarks, it's okay to issue a call such as:

$ benchmark_driver two_stacks 1024 10000000

That is, it's doing a single run of the Two-Stacks algorithm with a window of size 1024 for 10000000 iterations. The benchmark then self-reports the core running time (eliding warmup phase). A command-line-call-per-data-point is fine with synthetic benchmarks. But for data-driven benchmarks, reading the data file alone can take 5 minutes. For those benchmarks, in order to manage the amount of time it takes to run all experiments, we had to read the file once, cache it, then re-use that for many runs. (See DataGenerators.h for our implementation of the file reading and caching.) But that then means when we call the benchmark, we must also tell it to run many different experiments, which we do through a separate input file. (See the data_benchmark for what this looks like.)
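Roughly, that read-once / run-many pattern could look like the sketch below (this is not the DataGenerators.h logic; the file name, the Experiment struct, and run_experiment are placeholders):

```rust
// Sketch: parse the expensive data file once, then reuse it for many runs.
use std::fs;

struct Experiment {
    swag: String,
    window_size: usize,
}

fn run_experiment(swag: &str, window_size: usize, records: &[&str]) {
    // Placeholder: the real driver would time the core loop here.
    println!("{} window={} over {} records", swag, window_size, records.len());
}

fn main() -> std::io::Result<()> {
    // Expensive step, done exactly once.
    let raw = fs::read_to_string("data/input.txt")?;
    let records: Vec<&str> = raw.lines().collect();

    // Cheap step: in the real driver these would come from the experiment file.
    let experiments = vec![
        Experiment { swag: "two_stacks".into(), window_size: 1024 },
        Experiment { swag: "fiba4".into(), window_size: 1024 },
    ];
    for e in &experiments {
        run_experiment(&e.swag, e.window_size, &records);
    }
    Ok(())
}
```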

I think some algorithms might also have optimisations that are best tuned at compile time.

We solved this problem by giving them different names. That is, fiba is not actually a name recognized by our benchmarks, but it does recognize fiba4, fiba8, etc. (See, for example, where we determine which SWAG to use.) I never liked how that turned out, as I had to write dozens of helper functions because of all of the different ways the SWAGs were template-parameterized. I have some hope that your clean interfaces, and Rust's macro support, can allow a much cleaner implementation of a similar idea.
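For what it's worth, here is a rough idea of how a macro could keep that name-to-instantiation table in one place (a sketch, assuming a hypothetical Fiba type with a const-generic arity; not the actual implementation):

```rust
// Sketch: dispatch a runtime name like "fiba8" to a compile-time instantiation.
struct Fiba<const ARITY: usize>;

impl<const ARITY: usize> Fiba<ARITY> {
    fn run(window_size: usize, iterations: u64) {
        println!("FiBA arity={} window={} iters={}", ARITY, window_size, iterations);
    }
}

macro_rules! dispatch_swag {
    ($name:expr; $( $key:literal => $body:expr ),+ $(,)?) => {
        match $name {
            $( $key => $body, )+
            other => panic!("unknown SWAG: {}", other),
        }
    };
}

fn main() {
    let (name, window, iters) = ("fiba8", 1024usize, 10_000_000u64);
    dispatch_swag!(name;
        "fiba2" => Fiba::<2>::run(window, iters),
        "fiba4" => Fiba::<4>::run(window, iters),
        "fiba8" => Fiba::<8>::run(window, iters),
    );
}
```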

I'm also wondering, would it be worth bridging the implementations with for example https://github.com/dtolnay/cxx?

Interesting - that could allow one binary driver for both C++ and Rust. I hadn't even thought of going in that direction.

segeljakt commented
I will try to set up a benchmark in criterion. Do you have a link for downloading the datasets? Also, is it possible to generate the synthetic data from C++ and use it as input to Rust, or should I generate it directly in Rust?
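Something like this skeleton (benches/swag.rs) is what I have in mind; the workload below is just a stand-in, not one of the actual SWAGs:

```rust
// Sketch: a criterion benchmark with a placeholder workload.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in for the real SWAG loop: fold an incrementing counter.
fn synthetic_sum(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x))
}

fn bench_synthetic(c: &mut Criterion) {
    c.bench_function("synthetic_sum_10k", |b| {
        b.iter(|| synthetic_sum(black_box(10_000)))
    });
}

criterion_group!(benches, bench_synthetic);
criterion_main!(benches);
```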

segeljakt commented
Aah, never mind, I executed the benchmark and it downloaded everything.

scotts commented Jul 16, 2020

For the synthetic data, we just increment an integer (I think a uint64_t), so there's no need to generate it in one language and feed it to the other. That's probably what you ran; I don't think we automatically download the real datasets, since they are so large.
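In Rust terms, that generator is nothing more than (just for illustration):

```rust
// Sketch: the synthetic input is just an incrementing 64-bit counter.
fn synthetic_stream(n: u64) -> impl Iterator<Item = u64> {
    0..n
}

fn main() {
    for value in synthetic_stream(5) {
        println!("{}", value);
    }
}
```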

We got the DEBS 2012 Grand Challenge data from https://debs.org/grand-challenges/2012/. Since we used this dataset in FIFO experiments, we ran a pre-processing script which removes a small number of out-of-order elements. I'll look into getting this script added to the repo.

The NYC Citi Bike data (used in the FiBA paper) is from https://www.citibikenyc.com/system-data. We used August through December 2018.

We'll want some standardized way of getting these datasets. Probably a script of some kind.

segeljakt commented Jul 16, 2020

Ok, cool 👍 I have some questions:

  • Is it in general ok to assume aggregation operators have a constant execution time?
  • How does C++ deal with overflows/underflows with aggregation operators like sum?
  • Are the benchmarks here 1:1 with the ones in the papers?

scotts commented Jul 17, 2020

Is it in general ok to assume aggregation operators have a constant execution time?

No, but it's also not your "responsibility" as the implementer of the SWAG. What I mean is that the complexity guarantees of a SWAG always count invocations of the aggregation operation. So when we say, for example, that a SWAG's operations are O(1), it means that we only call the aggregation operation a constant number of times for each operation. That does not preclude the aggregation operation itself from being non-constant. The classic example of a non-constant aggregation operation is collect, which returns a list of all elements in the window. That is inherently an O(n) operation.
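To make the collect example concrete, its binary operator might look like this (illustrative types only, not from the repo):

```rust
// Sketch: a collect-style aggregation whose combine step is O(n),
// even if the SWAG only calls it a constant number of times per operation.
#[derive(Clone, Default)]
struct Collect(Vec<u64>);

impl Collect {
    // Lift a single element into a partial aggregate.
    fn lift(x: u64) -> Self {
        Collect(vec![x])
    }

    // The binary aggregation operator: O(n) in the size of the partial results.
    fn combine(mut self, mut other: Self) -> Self {
        self.0.append(&mut other.0);
        self
    }
}

fn main() {
    let window = Collect::lift(1).combine(Collect::lift(2)).combine(Collect::lift(3));
    assert_eq!(window.0, vec![1, 2, 3]);
}
```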

How does C++ deal with overflows/underflows with aggregation operators like sum?

It doesn't. :) In C/C++, overflows can silently happen. I believe we prevent them by limiting the range of our input data.
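For contrast, a Rust port could make the overflow policy for sum explicit; this is just an illustration, not what the benchmarks currently do:

```rust
// Sketch: the overflow policies Rust lets you pick between explicitly.
fn main() {
    let a: u64 = u64::MAX;
    let b: u64 = 1;

    // Wrap around, like unsigned overflow in C++.
    assert_eq!(a.wrapping_add(b), 0);

    // Detect the overflow instead of letting it happen silently.
    assert_eq!(a.checked_add(b), None);

    // Clamp at the maximum value.
    assert_eq!(a.saturating_add(b), u64::MAX);
}
```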

Are the benchmarks here 1:1 with the ones in the papers?

No, but all of the benchmarks from the DEBS 2017 and VLDB 2019 papers are here. We don't have the experiments from VLDB 2015. The C++ benchmarks are:

The rest of that header file consists of the helper functions to call them from the drivers. Because some of our SWAGs had slightly different template parameters, there was a combinatorial explosion in helpers.
