chore: initial refactoring of incremental spmd algos #2248

ethanglaser · 2025-01-09T07:13:44Z

Description

Specifies the class name explicitly when invoking _get_backend and _get_policy, for cases where the requested functionality is not supported for the spmd class. This allows removal of duplicate code from the incremental spmd algos and lets them use the onedal4py PCA transform. It also adds a transform function to onedal4py PCA that redirects to predict, because scikit-learn PCA does not have a predict function.
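As a rough illustration of the dispatch idea in this description, here is a minimal sketch. The class names, the string return values, and the method bodies are hypothetical stand-ins and are not the onedal code; the sketch only shows how naming the base class explicitly bypasses a subclass override, so the spmd subclass can inherit _reset unchanged.

```python
class IncrementalEstimator:
    """Hypothetical stand-in for a non-spmd incremental estimator."""

    def _get_backend(self, module, submodule, method):
        # Pretend backend lookup: returns a label instead of a compiled function.
        return f"host:{module}.{method}"

    def _reset(self):
        # Name the class explicitly so that a subclass override of
        # _get_backend is bypassed for this one lookup.
        self._partial_result = IncrementalEstimator._get_backend(
            self, "basic_statistics", None, "partial_compute_result"
        )


class SpmdIncrementalEstimator(IncrementalEstimator):
    """Hypothetical spmd variant that inherits _reset unchanged."""

    def _get_backend(self, module, submodule, method):
        # Assumption for the sketch: this lookup is unsupported in spmd mode.
        if method == "partial_compute_result":
            raise NotImplementedError("not available in the spmd backend")
        return f"spmd:{module}.{method}"


est = SpmdIncrementalEstimator()
est._reset()  # works because _reset names IncrementalEstimator directly
print(est._partial_result)
print(est._get_backend("basic_statistics", None, "finalize_compute"))
```

Had _reset used self._get_backend(...), the inherited method would have hit the subclass override and raised, which is the situation that would otherwise force each spmd class to carry its own copy of _reset.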

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

ethanglaser · 2025-01-09T07:13:54Z

/intelci: run

ethanglaser · 2025-01-09T07:17:15Z

/intelci: run

codecov · 2025-01-09T07:51:10Z

Codecov Report

All modified and coverable lines are covered by tests ✅

| Flag | Coverage Δ | |
|---|---|---|
| azure | `?` | |
| github | `71.01% <100.00%> (-0.05%)` | ⬇️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...l/basic_statistics/incremental_basic_statistics.py | `100.00% <100.00%> (ø)` | |
| onedal/covariance/incremental_covariance.py | `92.68% <100.00%> (ø)` | |
| onedal/decomposition/incremental_pca.py | `96.77% <100.00%> (ø)` | |
| onedal/decomposition/pca.py | `93.39% <100.00%> (+0.06%)` | ⬆️ |
| onedal/linear_model/incremental_linear_model.py | `97.75% <100.00%> (ø)` | |

... and 70 files with indirect coverage changes

icfaust

I know this is still a WIP, but maybe we should have a chat about strategy?

onedal/spmd/linear_model/incremental_linear_model.py

ethanglaser · 2025-01-10T19:48:34Z

/intelci: run

ethanglaser · 2025-01-10T21:21:29Z

/intelci: run

onedal/basic_statistics/incremental_basic_statistics.py

ethanglaser · 2025-01-14T22:25:56Z

/intelci: run

ethanglaser · 2025-01-21T22:41:25Z

/intelci: run

icfaust · 2025-01-28T19:54:22Z

/intelci: run

icfaust

I'll be honest, the get_backend stuff just seems to be a headache through and through. Small stuff but overall really good work.

icfaust · 2025-01-29T13:35:03Z

onedal/basic_statistics/incremental_basic_statistics.py

@@ -71,8 +71,9 @@ def __init__(self, result_options="all"):

    def _reset(self):
        self._need_to_finalize = False
-        self._partial_result = self._get_backend(
-            "basic_statistics", None, "partial_compute_result"
+        # Not supported with spmd policy so IncrementalBasicStatistics must be specified


Love the comment

icfaust · 2025-01-29T13:39:45Z

onedal/basic_statistics/incremental_basic_statistics.py

-            "basic_statistics", None, "partial_compute_result"
+        # Not supported with spmd policy so IncrementalBasicStatistics must be specified
+        self._partial_result = IncrementalBasicStatistics._get_backend(
+            IncrementalBasicStatistics, "basic_statistics", None, "partial_compute_result"


@ahuber21 these changes are likely going to interact with #2168 (just as a heads up).

onedal/decomposition/pca.py

icfaust · 2025-01-29T13:45:33Z

onedal/decomposition/tests/test_incremental_pca.py

@@ -40,7 +40,7 @@ def test_on_gold_data(queue, is_deterministic, whiten, num_blocks, dtype):

    result = incpca.finalize_fit()

-    transformed_data = incpca.predict(X, queue=queue)
+    transformed_data = incpca.transform(X, queue=queue)


Just for conformance purposes to sklearn (though not strictly necessary in the onedal folder)?

Exactly. PCA predict does not exist in sklearn, so I would prefer not to use this convention if possible.
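To make the predict/transform relationship concrete, here is a minimal sketch of a transform method that simply forwards to an existing predict. TinyPCA and its attributes are hypothetical and written in plain Python; this is not the onedal4py PCA class, and the queue argument is accepted only to mirror the call shape in the test diff above.

```python
class TinyPCA:
    """Hypothetical projection model used only for illustration."""

    def __init__(self, mean, components):
        self.mean_ = mean              # per-feature mean, length n_features
        self.components_ = components  # list of component vectors

    def predict(self, X, queue=None):
        # Center each row, then take its dot product with every component.
        return [
            [
                sum((x - m) * c for x, m, c in zip(row, self.mean_, comp))
                for comp in self.components_
            ]
            for row in X
        ]

    def transform(self, X, queue=None):
        # sklearn-style name; forwards to the existing implementation.
        return self.predict(X, queue=queue)


pca = TinyPCA(mean=[1.0, 2.0], components=[[1.0, 0.0]])
X = [[3.0, 5.0], [0.0, 2.0]]
print(pca.transform(X))
```

Keeping predict in place and adding transform as a thin wrapper means existing callers keep working while new code can use the name that scikit-learn users expect.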

Co-authored-by: Ian Faust <[email protected]>

ethanglaser · 2025-01-29T19:32:58Z

/intelci: run

Refactor incremental spmd algos

6112457

icfaust reviewed Jan 9, 2025

onedal/spmd/linear_model/incremental_linear_model.py

ethanglaser added 2 commits January 10, 2025 11:42

Clear spmd impls, specify non-spmd get_policy in base cls

9c72d9c

black

e455c56

minor bs fix

572bae5

ethanglaser commented Jan 10, 2025

onedal/basic_statistics/incremental_basic_statistics.py

ethanglaser and others added 3 commits January 14, 2025 14:14

apply changes to PCA predict and add transform

da5c27c

add comments

6fdcbaa

Merge branch 'uxlfoundation:main' into dev/eglaser-online-spmd-refactor

4acc102

ethanglaser requested a review from icfaust January 21, 2025 19:21

merge main into branch

f9e6b36

ethanglaser added 3 commits January 21, 2025 15:38

tuple indices safeguarding

28c4eb5

incremental bs fit fixes

bc147eb

restore previous 2, added to raw inputs instead

8559c5f

ethanglaser changed the title ~~Refactor incremental spmd algos~~ chore: initial refactoring of incremental spmd algos Jan 28, 2025

icfaust reviewed Jan 29, 2025

Update onedal/decomposition/pca.py

1fdecd1

Co-authored-by: Ian Faust <[email protected]>

ethanglaser marked this pull request as ready for review January 29, 2025 19:32

ethanglaser requested review from samir-nasibli and Alexsandruss as code owners January 29, 2025 19:32
