test_benchmarking_parallel_simulation fails on CI for one configuration #1111

Baschdl · 2024-03-25T12:03:43Z

The test test_benchmarking_parallel_simulation fails for sim_batch_size=1, num_workers=10 but works for all other configurations on CI:
https://github.com/sbi-dev/sbi/actions/runs/8418222476/job/23048287384?pr=1063#step:8:4939

FAILED tests/multiprocessing_test.py::test_benchmarking_parallel_simulation[1-10] - assert 12.29891848564148 <= (5.03387713432312 * 1.1)

The most likely reasons for this failure are either the overhead of creating 10 workers (on a machine with only 2 cores) is not worth it if we only have 1 sample per batch or problems with running this test with -n auto. I think the first is more likely because all other variants work.

This was noticed in PR #1063 when running the slow tests on CI.

The text was updated successfully, but these errors were encountered:

Baschdl · 2024-03-25T12:21:13Z

I can reproduce this issue locally on a machine with 10 cores, so the number of cores in the CI environment is not the issue. Indeed @famura, running this test with -n auto causes the problem (or not, see two comments down: #1111 (comment)).
I will quickly start a CI run on our slow tests to see if it's worth to run them with -n auto or if we should just get rid of it. They take 33min with parallel execution (see failed run).

Baschdl · 2024-03-25T12:26:07Z

a115191 runs here: https://github.com/sbi-dev/sbi/actions/runs/8420009653/job/23053830420

Baschdl · 2024-03-25T13:14:50Z

This run took 43min. So running the slow tests in parallel saves 10min.

But the above mentioned tests also fails when sequentially executed: https://github.com/sbi-dev/sbi/actions/runs/8420009653/job/23053830420#step:7:3555

janfb · 2024-03-25T14:00:02Z

Maybe a quicker solution: Do we really need this test?

It seems I thought so back then, but maybe we only have to test the API here and not the fact that parallel simulation should be faster than sequential simulation? (this should be obvious, and of course there is a trade off, but we don't have to do this benchmarking in our tests).

what do the others think, e.g., @michaeldeistler @manuelgloeckler?

Baschdl · 2024-03-28T17:55:22Z

I talked with @michaeldeistler today and he was in favor of changing the test to make it work instead of completely kicking it out. He also mentioned that the test failed on his old machine.

We currently run it with num_workers=10 and num_workers=-2 (all cpus but one). num_workers=10 doesn't really make sense on a machine with only two cores (Github Actions), so I would propose to change it to num_workers=-1. On a machine with a bunch of cores this would still be similar to num_workers=10. The downside is that the results could differ from machine to machine but that's also currently the case with num_workers=-2.

This change works on Github Actions: https://github.com/sbi-dev/sbi/actions/runs/8471258924/job/23210817060.

We may have to introduce the additional pytest mark sequential discussed elsewhere to make it work on all machines when people use -n auto.

Baschdl · 2024-03-28T18:27:51Z

num_workers=-1 together with -n auto works on Github Actions if we allow joblib to be a little bit slower than the 10% we currently have in the test. I would open a PR for that to get #1063 merged.

Tested here: https://github.com/sbi-dev/sbi/actions/runs/8471478141/job/23211524576

famura · 2024-03-28T18:51:54Z

I got a really naive comment that I still want to make. What speaks against using exactly 2 (or 3) workers? I suppose any action runner has at least 2 cores and it would technically still be a parallel simulation. This way we get a more deterministic behavior I suppose.

Baschdl · 2024-03-28T19:05:29Z

To be honest I just tried to replicate the current behavior as close as possible. Setting it to 2 should work equally well and I would be in favor of reproducible behavior across machines.

Baschdl added the bug Something isn't working label Mar 25, 2024

Baschdl self-assigned this Mar 25, 2024

Baschdl mentioned this issue Mar 28, 2024

Split the github workflow in CI and CD #1063

Merged

8 tasks

manuelgloeckler linked a pull request Apr 2, 2024 that will close this issue

multiprocessing only with num_cores #1117

Merged

manuelgloeckler mentioned this issue Apr 2, 2024

multiprocessing only with num_cores #1117

Merged

janfb closed this as completed in #1117 Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_benchmarking_parallel_simulation fails on CI for one configuration #1111

test_benchmarking_parallel_simulation fails on CI for one configuration #1111

Baschdl commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 25, 2024

janfb commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 28, 2024

Baschdl commented Mar 28, 2024

famura commented Mar 28, 2024

Baschdl commented Mar 28, 2024

test_benchmarking_parallel_simulation fails on CI for one configuration #1111

test_benchmarking_parallel_simulation fails on CI for one configuration #1111

Comments

Baschdl commented Mar 25, 2024 • edited Loading

Baschdl commented Mar 25, 2024 • edited Loading

Baschdl commented Mar 25, 2024 • edited Loading

Baschdl commented Mar 25, 2024

janfb commented Mar 25, 2024 • edited Loading

Baschdl commented Mar 28, 2024

Baschdl commented Mar 28, 2024

famura commented Mar 28, 2024

Baschdl commented Mar 28, 2024

Baschdl commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 25, 2024 •

edited

Loading

Baschdl commented Mar 25, 2024 •

edited

Loading

janfb commented Mar 25, 2024 •

edited

Loading