Synthetic data models for benchmarking #111

Merged: KingMob merged 6 commits into main from synthetic-data-models-for-perf on Sep 3, 2024


KingMob (Contributor) commented May 20, 2024

This adds a namespace for generating synthetic data and models. It also adds some small quality-of-life (QoL) improvements to the main performance ns.

Future PRs in the stack:

  1. Iterating over our first-pass parameter set
  2. Automatically generating the queries from the synthetic data/models
  3. Creating a command-line interface to run all this

Closes GDB-5


codecov bot commented May 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.61%. Comparing base (dd2d851) to head (3a7243a).

@@           Coverage Diff           @@
##             main     #111   +/-   ##
=======================================
  Coverage   76.61%   76.61%           
=======================================
  Files          30       30           
  Lines        1531     1531           
  Branches       64       64           
=======================================
  Hits         1173     1173           
  Misses        294      294           
  Partials       64       64           


KingMob force-pushed the synthetic-data-models-for-perf branch from 663ddf5 to 1ecae6c on May 21, 2024 15:41

Schaechtle (Contributor) commented:

This draft looks good to me. Early on, we will want to vary the CRP alphas, so maybe make the following inputs to generate-db?

(def ^:dynamic *default-local-alpha* 0.01)
(def ^:dynamic *default-global-alpha* 0.01)

One point that might be worth thinking about is what code can be reused when all the random choices get exported from gen.clj traces in the future. The target format, i.e., ClojureCat, is not expected to change soon.

Other than the above, the code strongly smells right for unblocking us for benchmarking. Note that I haven't done a line-by-line review yet.

KingMob (Contributor, Author) commented May 22, 2024

> This draft looks good to me. Early on, we will want to vary the CRP alphas. So maybe make the following input to generate-db?
>
> (def ^:dynamic *default-local-alpha* 0.01)
> (def ^:dynamic *default-global-alpha* 0.01)

These are just defaults. All three alpha values can be overridden when calling generate-db (and the other generate-* fns). I named the keys categorical-alpha, local-alpha, and global-alpha, but let me know if there are better names.
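A minimal sketch of the override pattern described above, assuming the alphas are passed as keyword keys in an options map. The helper `resolve-alphas`, the var `*default-categorical-alpha*`, and its 0.01 value are hypothetical; only the local/global default vars and the three key names come from this thread, and the actual generate-db signature may differ.

```clojure
(ns example.alpha-defaults)

;; Defaults as proposed in review; *default-categorical-alpha* is hypothetical.
(def ^:dynamic *default-categorical-alpha* 0.01)
(def ^:dynamic *default-local-alpha* 0.01)
(def ^:dynamic *default-global-alpha* 0.01)

(defn resolve-alphas
  "Hypothetical helper: returns the three alphas to use, letting any
  value present in opts take precedence over the dynamic defaults."
  [opts]
  (merge {:categorical-alpha *default-categorical-alpha*
          :local-alpha       *default-local-alpha*
          :global-alpha      *default-global-alpha*}
         (select-keys opts [:categorical-alpha :local-alpha :global-alpha])))

;; Per-call override via the options map.
(resolve-alphas {:local-alpha 1.0})
;; => {:categorical-alpha 0.01, :local-alpha 1.0, :global-alpha 0.01}

;; Scoped override of a default via binding.
(binding [*default-global-alpha* 0.5]
  (resolve-alphas {}))
;; => {:categorical-alpha 0.01, :local-alpha 0.01, :global-alpha 0.5}
```

Per-call keys suit one-off parameter sweeps, while `binding` changes a default for everything evaluated inside that dynamic scope.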

> One point that might be worth thinking about is what code can be re-used when all the random choices get exported from gen.clj traces in the future. The target format, i.e. ClojureCat is not expected to change soon.

Do you mean reusing gen.clj code to make the random choices for the synthetic data/models? I agree that would be preferable in the long run.

KingMob force-pushed the synthetic-data-models-for-perf branch 3 times, most recently from 3fc24e1 to 9607e2d on May 22, 2024 14:07
KingMob changed the title from "Synthetic data models for perf" to "Synthetic data models for benchmarking" on May 22, 2024
KingMob marked this pull request as ready for review on May 22, 2024 14:20
KingMob requested review from zane and Schaechtle on May 22, 2024 15:09
KingMob force-pushed the synthetic-data-models-for-perf branch from 9607e2d to f30252d on May 27, 2024 12:52
KingMob force-pushed the synthetic-data-models-for-perf branch from f30252d to d8aeed0 on June 5, 2024 09:17
KingMob force-pushed the synthetic-data-models-for-perf branch 3 times, most recently from 19c6253 to 639e15e on June 21, 2024 18:28
KingMob force-pushed the synthetic-data-models-for-perf branch from 639e15e to 3a7243a on August 9, 2024 12:41
KingMob (Contributor, Author) commented Sep 3, 2024

Merging. Unclear what the nix-build problem is.

KingMob merged commit b7b23a4 into main on Sep 3, 2024 (4 of 5 checks passed)
KingMob deleted the synthetic-data-models-for-perf branch on September 3, 2024 12:52
2 participants