Synthetic data models for benchmarking #111

KingMob · 2024-05-20T14:27:29Z

This adds a namespace for generating synthetic data and models. It also adds some small QoL improvements to the main performance ns.

Future PRs in the stack:

- Iterating over our first-pass parameter set
- Automatically generating the queries from the synthetic data/models
- Creating a command-line interface to run all this

Closes GDB-5

codecov · 2024-05-20T14:32:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.61%. Comparing base (dd2d851) to head (3a7243a).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #111   +/-   ##
=======================================
  Coverage   76.61%   76.61%           
=======================================
  Files          30       30           
  Lines        1531     1531           
  Branches       64       64           
=======================================
  Hits         1173     1173           
  Misses        294      294           
  Partials       64       64


Schaechtle · 2024-05-21T19:07:29Z

This draft looks good to me. Early on, we will want to vary the CRP alphas. So maybe make the following inputs to generate-db?

(def ^:dynamic *default-local-alpha* 0.01)
(def ^:dynamic *default-global-alpha* 0.01)

One point that might be worth thinking about is what code can be reused when all the random choices get exported from gen.clj traces in the future. The target format, i.e., ClojureCat, is not expected to change soon.

Other than the above, the code strongly smells right for unblocking us for benchmarking. Note that I haven't done a line-by-line review yet.

KingMob · 2024-05-22T13:05:13Z

> This draft looks good to me. Early on, we will want to vary the CRP alphas. So maybe make the following input to generate-db?
> (def ^:dynamic *default-local-alpha* 0.01)
> (def ^:dynamic *default-global-alpha* 0.01)

These are just defaults. All three alpha values are overridable when calling generate-db (and the other generate-* fns). I named the keys categorical-alpha, local-alpha, and global-alpha, but lmk if there are better names.
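A minimal sketch of how the dynamic defaults and per-call overrides might combine, assuming the options reach generate-db as a keyword map using the three key names above. `resolve-alphas` and `*default-categorical-alpha*` (including its value) are hypothetical illustrations, not the PR's actual code:

```clojure
;; Hypothetical sketch: not the actual implementation behind generate-db.
;; Dynamic defaults, as suggested in the review comment above.
(def ^:dynamic *default-local-alpha* 0.01)
(def ^:dynamic *default-global-alpha* 0.01)
;; Assumed: a third default for the categorical alpha (value is illustrative).
(def ^:dynamic *default-categorical-alpha* 0.01)

(defn resolve-alphas
  "Returns the three alpha values, letting any of the three alpha keys
  present in `opts` override the dynamic defaults. Other keys are ignored."
  [opts]
  (merge {:categorical-alpha *default-categorical-alpha*
          :local-alpha       *default-local-alpha*
          :global-alpha      *default-global-alpha*}
         (select-keys opts [:categorical-alpha :local-alpha :global-alpha])))

;; Per-call override of a single alpha:
(resolve-alphas {:global-alpha 1.0})
;; => {:categorical-alpha 0.01, :local-alpha 0.01, :global-alpha 1.0}

;; Rebinding a default for a whole block of calls:
(binding [*default-local-alpha* 0.5]
  (resolve-alphas {}))
;; => {:categorical-alpha 0.01, :local-alpha 0.5, :global-alpha 0.01}
```

In this sketch, explicit keys win for a single call, while `binding` changes the defaults for every call made inside its dynamic scope, which may be convenient when sweeping one alpha across many generated databases.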

> One point that might be worth thinking about is what code can be reused when all the random choices get exported from gen.clj traces in the future. The target format, i.e., ClojureCat, is not expected to change soon.

Do you mean, reuse gen.clj code to make random choices for the synthetic data/model? I agree that would be preferable in the long run.


KingMob · 2024-09-03T12:52:30Z

Merging. Unclear what the nix-build problem is.

KingMob force-pushed the synthetic-data-models-for-perf branch from 663ddf5 to 1ecae6c May 21, 2024 15:41

KingMob force-pushed the synthetic-data-models-for-perf branch 3 times, most recently from 3fc24e1 to 9607e2d May 22, 2024 14:07

KingMob changed the title from "Synthetic data models for perf" to "Synthetic data models for benchmarking" May 22, 2024

KingMob marked this pull request as ready for review May 22, 2024 14:20

KingMob requested review from zane and Schaechtle May 22, 2024 15:09

KingMob force-pushed the synthetic-data-models-for-perf branch from 9607e2d to f30252d May 27, 2024 12:52

KingMob force-pushed the synthetic-data-models-for-perf branch from f30252d to d8aeed0 June 5, 2024 09:17

KingMob force-pushed the synthetic-data-models-for-perf branch 3 times, most recently from 19c6253 to 639e15e June 21, 2024 18:28

KingMob added 2 commits August 9, 2024 19:40

feat: Add QoL improvements to benchmarking

62051b5

- Better docstring
- Support for using quick-benchmark
- Better println control

feat: Add synthetic data and model generation for performance testing

3a7243a

KingMob force-pushed the synthetic-data-models-for-perf branch from 639e15e to 3a7243a August 9, 2024 12:41

KingMob added 4 commits September 3, 2024 19:20

feat: Generate benchmark query suite from synthetic data/model

2b64ebb

feat: Add CLI for performance benchmarking

7861ddf

feat: Save CLI benchmark results to files

823c5e2

feat: Add initial Vega-lite outputs for benchmarking results

ba8607a

KingMob merged commit b7b23a4 into main Sep 3, 2024
4 of 5 checks passed

KingMob deleted the synthetic-data-models-for-perf branch September 3, 2024 12:52
