persist: Add more testing around stats #27117
5dfb8b4 to 8eb7cef
Very cool! @bkirwi mind taking the review on this one?
@ggevay nice! Glad y'all can also find a use for the proptest impls here!
Nice!
This PR adds some more testing around our `Part` statistics. Specifically, it adds two new tests:

1. A `proptest` for correctness. We generate arbitrary `ColumnType`s, and use each `ColumnType` to generate an arbitrary `Vec<Row>`, then we calculate stats on that collection of `Row`s and assert that every `Row` would be included in the stats.
2. A `proptest` with a constant seed that generates 1,000 instances of `RelationDesc`s with at most 4 columns, then a collection of at most 8 `Row`s for each of these `RelationDesc`s. We generate statistics for all 1,000 scenarios and then take a JSON snapshot of the stats. This test helps us track whether any changes occur to our statistics generation.

I'm curious what folks' thoughts are on the second test. I'm more than happy to not merge it and use it only to validate #27009 if we don't think it provides a ton of signal.
Motivation
Protect against stats breaking, e.g. in changes like #27009
Tips for reviewer
The PR is broken up into 2 commits:
`Datum`s from a `ColumnType`, and adding the first test.
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.