
Diagnose slow hypothesis tests #301

Open · pschanely opened this issue Aug 19, 2024 · 8 comments

@pschanely (Owner)

We're actually running low on important semantic issues, so I'd probably turn next to performance issues - there are some very slow tests that I've just skipped because they take several minutes, or hours, or don't finish at all within the 6h Actions time limit. Current commit with those is HypothesisWorks/hypothesis@bb43210

Originally posted by @Zac-HD in #292 (comment)

@pschanely (Owner, Author)

@Zac-HD Done with one round of investigations and fixes! (hypothesis-crosshair 0.0.13 and crosshair-tool 0.0.68) Breaking up the tests into categories:

minimize() tests - should we never be running these with CrossHair? I see we try to avoid running it here, although some of the tests that are timing out appear to have `_current_build_context.value == None` under the crosshair testing profile (see the sketch after this list). Affected tests:

  • test_can_generate_both_zeros_when_in_interval
  • test_mutual_recursion
  • test_hashable_type_unhashable_value
  • many tests in tests/nocover/test_simple_numbers.py
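
A minimal sketch of the guard in question, assuming the internal `hypothesis.control._current_build_context` variable keeps its current shape; since the slow tests above observe a `None` value, a check like this never fires for them:

```python
from hypothesis.control import _current_build_context

# Hedged sketch: the existing skip only applies when minimal() runs
# inside an active @given build context. The tests listed above see a
# None value under the crosshair profile, so they fall through and run.
def inside_given() -> bool:
    return _current_build_context.value is not None
```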

Nested tests (used to time out, but now CrossHair crashes fast):

  • test_nesting_with_control_passes_health_check

Just plain better, due to recent updates:

  • test_drawing_many_near_boundary (though still slowish)
  • test_lookup_knows_about_all_core_strategies
  • test_fullmatch_generates_example

This gets a solver unsat here; ideally it would raise hypothesis.errors.BackendCannotProceed once that's available, and Hypothesis would keep trying (sketch below).

  • test_hard_to_find_single_example
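
A hedged sketch of what that could look like; `BackendCannotProceed` had not shipped when this was written, and the scope string below follows the API Hypothesis later released:

```python
from hypothesis.errors import BackendCannotProceed

def on_solver_verdict(verdict: str) -> None:
    """Illustrative only: 'verdict' stands in for a z3 check() result."""
    if verdict == "unsat":
        # No input can satisfy the remaining path constraints, so hand
        # control back to Hypothesis instead of spinning until timeout.
        raise BackendCannotProceed("exhausted")
```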

For me to continue to investigate:

  • tests in tests/nocover/test_health_checks.py - not sure I'll be able to get too_large_strategy to work, but I'll poke at it a little. The other tests complete after a few minutes.
  • tests in tests/nocover/test_type_lookup_forward_ref.py no longer time out, but now yield a strange `SimpleDict object is not callable` error.

@Zac-HD (Contributor) commented Aug 20, 2024

Sweet! I'll update shortly and see how it goes 😁

> minimize() tests - should we never be running these with CrossHair? I see we try to avoid running it here, although some of the tests that are timing out appear to have `_current_build_context.value == None` under the crosshair testing profile.

That skip is specific to running `minimal(...)` *inside* @given(), i.e. instead of crashing with the nested-tests error we just skip that test. On investigation there's a very obvious guess as to why it's super slow, too: `minimal()` has to find a counterexample before it can shrink that, and if Crosshair happens to not find one we can spend an awfully long time going through the `max_examples=50_000` (!!!) iterations.
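
To make the failure mode concrete, here is a hedged illustration using the public `hypothesis.find()` API, which the suite's internal `minimal()` helper resembles: the search phase has to succeed before any shrinking happens.

```python
from hypothesis import find, settings, strategies as st

# find() first searches for any satisfying example, then shrinks it.
# If the backend never stumbles on one, the search phase alone can
# consume all 50_000 iterations before giving up.
smallest = find(
    st.integers(),
    lambda x: x >= 100,
    settings=settings(max_examples=50_000),
)
assert smallest == 100  # shrinking drives the example down to the boundary
```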

My action items:

  • turn that down globally, maybe even lower for crosshair (see the sketch after this list), and mark affected tests with Why.undiscovered to investigate later
  • unskip tests we expect are fixed now, update deps, and trigger CI
  • maybe hack together a spike for hypothesis.errors.BackendCannotProceed?
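
A minimal sketch of that first item, with illustrative numbers rather than whatever the CI profiles actually use:

```python
from hypothesis import settings

# Register a Crosshair-specific profile with a much lower iteration cap
# than the regular backend gets; the values here are placeholders.
settings.register_profile(
    "crosshair",
    backend="crosshair",
    max_examples=100,  # down from 50_000 for minimal()-style searches
    deadline=None,     # symbolic execution makes per-example deadlines noisy
)
settings.load_profile("crosshair")
```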

@pschanely (Owner, Author)

> That skip is specific to running `minimal(...)` *inside* @given(), i.e. instead of crashing with the nested-tests error we just skip that test. On investigation there's a very obvious guess as to why it's super slow, too: `minimal()` has to find a counterexample before it can shrink that, and if Crosshair happens to not find one we can spend an awfully long time going through the `max_examples=50_000` (!!!) iterations.

Ah! I get it now. So I'll continue to look at those too.

@Zac-HD (Contributor) commented Aug 20, 2024

Very hacky, but... done. Well, maybe, I'll see what CI thinks after I sleep 😄

@Zac-HD (Contributor) commented Aug 20, 2024

OK, there are quite a few failures but I did mess around with the internals a bit and overall it's looking like an improvement. Quick triage:

  • (Zac) pull out and merge the cleanups to `minimal()` etc.: Various test cleanups for Crosshair integration (HypothesisWorks/hypothesis#4090)
  • (Zac) extract and test the `BackendCannotProceed` changes in a separate PR
  • (Phillip) fix `TypeError: unhashable type: 'tzutc'` (`'Rule'`, etc.) regressions, failing on the Crosshair backend but not Hypothesis
  • (Phillip) `Unsatisfiable: Could not find any examples from floats() that satisfied lambda x: x + 1 == x and not math.isinf(x)`
  • (???) strategy reprs fail to omit default values under crosshair, e.g. here; unclear where that should be fixed. Might be me in `LazyStrategy.__repr__` due to `if k not in sig.parameters or v is not sig.parameters[k].default`?

@pschanely (Owner, Author)

I got sidetracked on a few subtle-but-important issues this week, but this is still a big priority! I'll be looking at these in the next few days.

@pschanely (Owner, Author)

0.0.70 fixes that ugly regression with unhashables!

The following two issues are going to take me a while, but, honestly, might legitimately be the next things to look at.

>   • (Phillip) `Unsatisfiable: Could not find any examples from floats() that satisfied lambda x: x + 1 == x and not math.isinf(x)`

I'll need to tackle the issue of real float semantics for this one. High value, ~high effort.
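
For context, a concrete instance of why this needs genuine IEEE-754 semantics rather than an exact real-number encoding:

```python
import math

# The classic absorbing case: at 2**53 the gap between adjacent doubles
# is 2, and the tie 2**53 + 1 rounds back down to 2**53 (ties-to-even).
# Under exact real arithmetic x + 1 == x has no finite solution, which
# is why a real-number encoding reports Unsatisfiable.
x = 2.0 ** 53
assert x + 1 == x and not math.isinf(x)
```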

>   • (???) strategy reprs fail to omit default values under crosshair, e.g. here; unclear where that should be fixed. Might be me in `LazyStrategy.__repr__` due to `if k not in sig.parameters or v is not sig.parameters[k].default`?

Your guess is right - the failing int bound is a symbolic. I have a branch on the backburner for more accurate identity semantics. I hadn't thought about faithful integer intern'ing as part of that (the biggest wins relate to avoiding path forks on enums and optional values), but I think I could make it happen.

As for whether the identity check on the hypothesis side is objectively correct, I'm unsure. Integer intern'ing isn't guaranteed across implementations. But in order for your test to fail, I think integer intern'ing would have to switch up mid-execution, which seems pretty unlikely.
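
A hedged illustration of the identity check at issue (`SymbolicInt` below is a stand-in, not a real Crosshair class): CPython interns small integers, so an explicitly passed default is usually the very same object as the signature's default, while any wrapper object never is.

```python
import inspect

def integers_stub(min_value=0):
    """Hypothetical strategy-like signature, for illustration only."""

default = inspect.signature(integers_stub).parameters["min_value"].default

passed = 0
assert passed is default  # small ints are interned, so identity holds

class SymbolicInt(int):
    """Stand-in for a Crosshair symbolic integer."""

assert SymbolicInt(0) is not default  # identity fails; the repr keeps the arg
```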

@Zac-HD (Contributor) commented Aug 25, 2024

For the identity check, I'm not (deliberately) checking interned integers, but rather whether the value is the default argument, so that we show even the default value if it was explicitly passed. Integers usually defeat that, but after a quick investigation I've classified this as "not expected to pass under crosshair", so we're done.
