
Multi-GPU support with dask #179

Merged
merged 111 commits into from
Dec 19, 2024
Conversation

@Intron7 (Member) commented Apr 25, 2024

This PR adds Dask support.

Functions to add:

  • calculate_qc_metrics
  • normalize_total
  • log1p
  • highly_variable_genes with seurat and cell_ranger
  • scale
  • PCA
  • neighbors
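These functions become chunk-wise operations on Dask-backed arrays. As a minimal sketch of the idea (using plain NumPy as a stand-in for CuPy, and a hypothetical `normalize_chunk` helper, not the PR's actual code): `normalize_total` maps independently over row chunks, since each chunk holds complete cells (rows) and row sums are chunk-local.

```python
import numpy as np

# Hypothetical sketch: each Dask chunk holds a subset of cells (rows),
# so per-cell normalization needs no cross-chunk communication.
def normalize_chunk(chunk, target_sum=1e4):
    counts = chunk.sum(axis=1, keepdims=True)
    counts[counts == 0] = 1.0  # avoid division by zero for empty cells
    return chunk / counts * target_sum

X = np.array([[1.0, 3.0], [2.0, 2.0]])
out = normalize_chunk(X)
assert np.allclose(out.sum(axis=1), 1e4)  # every cell now sums to target_sum
```

In the real PR this pattern would run per chunk via `map_blocks` on CuPy-backed Dask arrays.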

@Intron7 Intron7 marked this pull request as draft April 25, 2024 13:08
@Intron7 Intron7 changed the title add first functions Multi-GPU support with dask Apr 30, 2024
@Intron7 Intron7 added the run-gpu-ci runs GPU CI label May 3, 2024
@Intron7 Intron7 marked this pull request as ready for review May 13, 2024 14:27
@ilan-gold (Contributor) commented:
We should look into the cost of allocating ahead of time for all operations that are currently in-place.
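The trade-off being raised, sketched with NumPy as a stand-in for CuPy: an out-of-place call allocates a fresh result buffer, while the in-place variant reuses the input's memory.

```python
import numpy as np

X = np.linspace(0.0, 1.0, 4)

# Out-of-place: allocates a new array for the result.
Y = np.log1p(X)

# In-place: writes into X's existing buffer, no extra allocation.
np.log1p(X, out=X)
assert np.allclose(X, Y)
```

On the GPU, avoiding that extra allocation matters more, since device memory is scarcer and allocation can synchronize the stream.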

@Intron7 (Member, Author) commented Nov 21, 2024

Median out of core is a bad choice: it uses way more memory and is slower. Lose-lose.
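Why the median is a poor fit for out-of-core execution, as a small illustration: a mean can be streamed chunk by chunk with constant extra memory, while a median needs the full data materialized (or repeated passes), which is exactly what an out-of-core setting tries to avoid.

```python
import numpy as np

chunks = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]

# Streaming mean: one pass, O(1) state per chunk.
total, count = 0.0, 0
for c in chunks:
    total += c.sum()
    count += c.size
mean = total / count  # 3.0

# Median: requires all values at once (a global sort/selection),
# so out-of-core it must gather everything anyway.
median = float(np.median(np.concatenate(chunks)))  # 3.0
```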

@Intron7 Intron7 added the run-gpu-ci runs GPU CI label Dec 5, 2024
@github-actions github-actions bot removed the run-gpu-ci runs GPU CI label Dec 5, 2024
@Intron7 Intron7 added the run-gpu-ci runs GPU CI label Dec 16, 2024
@github-actions github-actions bot removed the run-gpu-ci runs GPU CI label Dec 16, 2024
@flying-sheep (Member) left a comment:

I don’t think I can really review this.

I checked out parts like the PCA code, but generally the whole codebase lacks abstractions that would reduce visual noise while exposing intent, especially for patterns like invoking a kernel with explicit block sizes (mentioned here: https://github.com/scverse/rapids_singlecell/pull/179/files#r1838497326)

I also still want to see docstrings for the kernels! (ideally coupled with an abstraction for the kernel pattern)

Let’s tackle both of these abstractions before the next feature PR. I’m happy to meet with you to design them.

If the first abstraction (calling kernels) and the big code moves (e.g. PCA) were done before this PR, I could actually see what changes happened and what’s new and therefore review this, but like this it’s just too much to wrap my head around.

PS: there are still a bunch of unaddressed comments above
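The kernel-launch abstraction the review asks for could look roughly like this hypothetical sketch: centralize the grid/block arithmetic in one helper instead of repeating it at every call site. Here `kernel` stands for any CuPy `RawKernel`-style callable taking `(grid, block, args)`; the helper name and signature are illustrative, not the PR's code.

```python
import math

def launch_1d(kernel, n_items, *args, threads_per_block=256):
    """Launch a 1D kernel with one thread per item (sketch).

    Rounds the grid up to whole blocks; the kernel itself is expected
    to guard against thread indices >= n_items.
    """
    blocks = math.ceil(n_items / threads_per_block)
    kernel((blocks,), (threads_per_block,), (*args, n_items))

# Stand-in "kernel" that just records its launch configuration.
calls = []
launch_1d(lambda grid, block, args: calls.append((grid, block, args)), 1000)
assert calls == [((4,), (256,), (1000,))]
```

A helper like this also gives a natural home for the requested per-kernel docstrings.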

@@ -260,6 +266,21 @@ def in_bounds(
)


def _hvg_expm1(X):
@flying-sheep (Member) commented:

what does that have to do with HVG? isn’t it just expm1?

also seems like it could be cleaner with singledispatch

@Intron7 (Member, Author) replied:

singledispatch doesn't work with CuPy

src/rapids_singlecell/preprocessing/_hvg.py (outdated, resolved)
int n_cells) {
int cell = blockDim.x * blockIdx.x + threadIdx.x;
if(cell >= n_cells){
return;
@flying-sheep (Member) commented:

same as in the other PR: no error handling? is this expected to be called with invalid inputs?

@Intron7 (Member, Author) replied:

This is how CUDA works: the launch grid is rounded up to whole blocks, so out-of-range threads simply return.
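To make the guard concrete, here is a small simulation of the index arithmetic (plain Python, not the kernel itself): with 1000 cells and 256 threads per block, the grid rounds up to 4 blocks, launching 1024 threads; the `if (cell >= n_cells) return;` guard is what skips the 24 excess threads.

```python
import math

n_cells = 1000
threads_per_block = 256
blocks = math.ceil(n_cells / threads_per_block)  # 4 blocks -> 1024 threads

# Simulate each thread's computed index; the kernel's bounds check
# keeps only indices below n_cells.
valid = [b * threads_per_block + t
         for b in range(blocks)
         for t in range(threads_per_block)
         if b * threads_per_block + t < n_cells]

assert blocks * threads_per_block - len(valid) == 24  # threads the guard skips
```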

Comment on lines 257 to 265

    if isinstance(X, cp.ndarray):
        X = cp.log1p(X)
    elif sparse.issparse(X):
        X = X.log1p()
    elif isinstance(X, DaskArray):
        if isinstance(X._meta, cp.ndarray):
            X = X.map_blocks(lambda x: cp.log1p(x), meta=_meta_dense(X.dtype))
        elif isinstance(X._meta, sparse.csr_matrix):
            X = X.map_blocks(lambda x: x.log1p(), meta=_meta_sparse(X.dtype))
@flying-sheep (Member) commented:

this should become a helper like the expm1 above

import dask
import dask.array as da

if isinstance(X._meta, sparse.csr_matrix):
@flying-sheep (Member) commented:

please figure out a way to reuse code here.

@Intron7 Intron7 added the run-gpu-ci runs GPU CI label Dec 17, 2024
@github-actions github-actions bot removed the run-gpu-ci runs GPU CI label Dec 17, 2024
@Intron7 Intron7 added the run-gpu-ci runs GPU CI label Dec 19, 2024
@github-actions github-actions bot removed the run-gpu-ci runs GPU CI label Dec 19, 2024
@Intron7 Intron7 merged commit 4629c05 into main Dec 19, 2024
9 checks passed
@Intron7 Intron7 deleted the dask_mg_support branch December 19, 2024 15:13