Refactor `test_random` to minimize collective calls #1677
Thank you for the PR!
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1677 +/- ##
=======================================
Coverage 92.13% 92.13%
=======================================
Files 83 83
Lines 12165 12173 +8
=======================================
+ Hits 11208 11216 +8
Misses 957 957
Flags with carried forward coverage won't be shown.
Thank you. You skipped the median tests. Is this intentional?
Yes, I skipped the |
* debugging
* fix misinterpretation of dtype
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* replace numpy() calls with alternative checks
* debugging
* debugging
* debugging randint
* debugging
* cast ints to float in statistical ops
* bypass numpy call l. 197
* bypass more numpy calls, skip median checks
* bypass more numpy calls, skip median checks
* bypass numpy calls wherever possible
* reinstate median checks
* skip ht.median if split>0
* skip all ht.median
* Revert "skip all ht.median" This reverts commit 1241454.
* Revert "skip ht.median if split>0" This reverts commit 4da8c93.
* Revert "reinstate median checks" This reverts commit bf50914.

(cherry picked from commit 4b3e570)
Successfully created backport PR for |
* debugging
* fix misinterpretation of dtype
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* debugging
* replace numpy() calls with alternative checks
* debugging
* debugging
* debugging randint
* debugging
* cast ints to float in statistical ops
* bypass numpy call l. 197
* bypass more numpy calls, skip median checks
* bypass more numpy calls, skip median checks
* bypass numpy calls wherever possible
* reinstate median checks
* skip ht.median if split>0
* skip all ht.median
* Revert "skip all ht.median" This reverts commit 1241454.
* Revert "skip ht.median if split>0" This reverts commit 4da8c93.
* Revert "reinstate median checks" This reverts commit bf50914.

(cherry picked from commit 4b3e570)

Co-authored-by: Claudia Comito <[email protected]>
Due Diligence
Description
`test_random` has caused problems before in connection with `.numpy()` calls (that is, Allgather/Allgatherv followed by a copy to CPU).

As far as I can tell, no particular instance of "allgathering" is at fault. On the AMD runner (2-process GPU tests), `test_random` has been failing consistently since this Monday, around the 10th `numpy()` call in the module.

I have refactored `test_random` to gather and copy only when absolutely necessary. It now gathers/copies to CPU only 8 times, as opposed to 47 in the legacy implementation.

Issue/s resolved: #1682
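To illustrate the idea behind the refactoring, here is a minimal toy sketch. It does not use Heat and does not reproduce the actual test code: `ShardedArray`, `gather`, `legacy_checks`, `refactored_checks` and `count_gathers` are hypothetical names invented for this example. `gather` stands in for a `.numpy()` call (Allgather plus copy to CPU), while `min`, `max` and `mean` stand in for checks that only exchange small per-process results.

```python
"""Toy model of a split distributed array. Hypothetical, not Heat's API."""
from statistics import fmean


class ShardedArray:
    """Stand-in for a distributed array: one chunk of values per process."""

    gather_calls = 0  # counts simulated Allgather + copy-to-CPU operations

    def __init__(self, chunks):
        self.chunks = [list(chunk) for chunk in chunks]

    def gather(self):
        """Simulate `.numpy()`: collect every chunk into one local list."""
        ShardedArray.gather_calls += 1
        return [x for chunk in self.chunks for x in chunk]

    def min(self):
        """Simulate a reduction: only one value per chunk is exchanged."""
        return min(min(chunk) for chunk in self.chunks if chunk)

    def max(self):
        return max(max(chunk) for chunk in self.chunks if chunk)

    def mean(self):
        total = sum(sum(chunk) for chunk in self.chunks)
        count = sum(len(chunk) for chunk in self.chunks)
        return total / count


def legacy_checks(a):
    """Every assertion gathers the full array first."""
    assert min(a.gather()) >= 0.0
    assert max(a.gather()) < 1.0
    assert abs(fmean(a.gather()) - 0.5) < 0.2


def refactored_checks(a):
    """Use reductions where possible; gather once, only when needed."""
    assert a.min() >= 0.0
    assert a.max() < 1.0
    assert abs(a.mean() - 0.5) < 0.2
    values = a.gather()  # the full data is needed for the uniqueness check
    assert len(set(values)) == len(values)


def count_gathers(check, a):
    """Run `check` on `a` and return how many gathers it triggered."""
    ShardedArray.gather_calls = 0
    check(a)
    return ShardedArray.gather_calls


if __name__ == "__main__":
    data = ShardedArray([[0.1, 0.4, 0.7], [0.2, 0.5, 0.9]])  # two "processes"
    print(count_gathers(legacy_checks, data))      # prints 3
    print(count_gathers(refactored_checks, data))  # prints 1
```

In this sketch the refactored checks cover the same properties, plus one that genuinely needs the full data, with a single gather instead of three.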
Changes proposed:
Type of change
Bug fix (non-breaking change which fixes an issue)
Memory requirements
NA
Performance
NA
Does this change modify the behaviour of other functions? If so, which?
no