compiler: Unified Memory Allocator #2023
base: master
@@ -88,7 +88,7 @@ def alloc(self, shape, dtype):
             buf = ctypes.cast(c_pointer, ctypes.POINTER(ctype_1d)).contents
             pointer = np.frombuffer(buf, dtype=dtype)
         else:
-            pointer = np.empty(shape = (0), dtype=dtype)
+            pointer = np.empty(shape=(0), dtype=dtype)
         # pointer.reshape should not be used here because it may introduce a copy
         # From https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html:
         # It is not always possible to change the shape of an array without copying the
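The comment retained above is the heart of this change: NumPy's reshape can silently return a copy, which would detach the returned array from the CUDA managed buffer it is meant to wrap. A minimal sketch in plain NumPy (independent of Devito; variable names are ours) of the difference between reshape and in-place shape assignment:

    import numpy as np

    a = np.arange(6, dtype=np.float32).reshape(2, 3)
    t = a.T  # non-contiguous view of a's memory

    # reshape() quietly falls back to a copy here, so writes to `flat`
    # would no longer reach the memory backing `a`.
    flat = t.reshape(6)
    assert not np.shares_memory(flat, a)

    # Assigning to .shape never copies: it raises instead, making a silent
    # detachment from the underlying buffer impossible.
    try:
        t.shape = (6,)
    except AttributeError:
        print("in-place reshape refused rather than copying")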
@@ -343,12 +343,12 @@ def initialize(cls):
             try:
                 from mpi4py import MPI
                 cls.MPI = MPI
-                cls._set_device_for_mpi()
+                cls._set_device_for_mpi()
             except:
                 cls.MPI = None
         except:
             cls.lib = None

     @classmethod
     def _initialize_shared_memory(cls):
         cls._mempool = cls.lib.cuda.MemoryPool(cls.lib.cuda.malloc_managed)

Review comment (on `from mpi4py import MPI`): Why reimport and not import it from …

Review comment (on the bare `except:`): except …
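For context on the `MemoryPool(malloc_managed)` line above: the pool hands out CUDA unified (managed) memory, which the host can read and write directly. A standalone sketch of the same idea using only public CuPy APIs (requires a CUDA device; this is an illustration, not the PR's code):

    import ctypes

    import cupy as cp
    import numpy as np

    # A pool whose allocations come from cudaMallocManaged (unified memory).
    pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)

    n = 16
    mem = pool.malloc(n * np.dtype(np.float32).itemsize)  # MemoryPointer

    # Managed memory is host-visible, so it can be wrapped as a NumPy array.
    ctype_1d = ctypes.c_float * n
    buf = ctypes.cast(ctypes.c_void_p(mem.ptr), ctypes.POINTER(ctype_1d)).contents
    host_view = np.frombuffer(buf, dtype=np.float32)
    host_view[:] = 5.0

    # The runtime reports the same address for the host and device sides,
    # which is what test_uma_allocation below asserts.
    attrs = cp.cuda.runtime.pointerGetAttributes(mem.ptr)
    assert attrs.devicePointer == attrs.hostPointer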
@@ -358,8 +358,8 @@ def _initialize_shared_memory(cls):
     def _set_device_for_mpi(cls):
         if cls.MPI.Is_initialized():
             n_gpu = cls.lib.cuda.runtime.getDeviceCount()
-            rank_local = cls.MPI.COMM_WORLD.Split_type(cls.MPI.COMM_TYPE_SHARED).Get_rank()
-            cls.lib.cuda.runtime.setDevice(rank_local % n_gpu)
+            rank_l = cls.MPI.COMM_WORLD.Split_type(cls.MPI.COMM_TYPE_SHARED).Get_rank()
+            cls.lib.cuda.runtime.setDevice(rank_l % n_gpu)

     def _alloc_C_libcall(self, size, ctype):
         if not self.available():
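The `Split_type(COMM_TYPE_SHARED)` call above gives each rank its position within its own node, so multi-node runs spread ranks across the local GPUs instead of piling onto device 0. A standalone sketch of the same pattern with mpi4py and CuPy (launch under mpirun; names are ours):

    from mpi4py import MPI

    import cupy as cp

    # Rank within the node: split the world communicator by shared-memory domain.
    local_comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)
    local_rank = local_comm.Get_rank()

    # Round-robin node-local ranks over the GPUs visible on that node.
    n_gpu = cp.cuda.runtime.getDeviceCount()
    cp.cuda.runtime.setDevice(local_rank % n_gpu)

    print(f"world rank {MPI.COMM_WORLD.Get_rank()} -> "
          f"node-local rank {local_rank} -> GPU {local_rank % n_gpu}")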
@@ -3,7 +3,8 @@
 import cupy as cp

 from devito import (Grid, Function, TimeFunction, SparseTimeFunction, Dimension,  # noqa
-                    Eq, Operator, ALLOC_GUARD, ALLOC_FLAT, ALLOC_CUPY, configuration, switchconfig)
+                    Eq, Operator, ALLOC_GUARD, ALLOC_FLAT, ALLOC_CUPY,
+                    configuration, switchconfig)
 from devito.data import LEFT, RIGHT, Decomposition, loc_data_idx, convert_index
 from devito.tools import as_tuple
 from devito.types import Scalar

Review comment (on `import cupy as cp`): Needs to be added somehow to the test requirements, and this step should be decorated with a …
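On the reviewer's point about the unconditional `import cupy as cp`: a common pytest pattern (a sketch of one possible resolution, not necessarily what this PR will do) is to skip the module when CuPy is missing and the individual test when no device is present:

    import pytest

    # Skip every test in this module if CuPy cannot be imported.
    cp = pytest.importorskip("cupy")


    def _cuda_available():
        # getDeviceCount raises (rather than returning 0) when no driver is found.
        try:
            return cp.cuda.runtime.getDeviceCount() > 0
        except cp.cuda.runtime.CUDARuntimeError:
            return False


    @pytest.mark.skipif(not _cuda_available(), reason="no CUDA device available")
    def test_uma_allocation_guarded():
        ...  # body as in test_uma_allocation below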
@@ -1483,13 +1484,13 @@ def test_uma_allocation(self):
         nt = 5
         grid = Grid(shape=(4, 4, 4))

-        u = Function(name='u', grid=grid, allocator=ALLOC_CUPY )
+        u = Function(name='u', grid=grid, allocator=ALLOC_CUPY)
         u.data[:] = 5
         address = u.data.ctypes.data
         pointerAttr = cp.cuda.runtime.pointerGetAttributes(address)
         assert pointerAttr.devicePointer == pointerAttr.hostPointer

-        v = TimeFunction(name='v', grid=grid, save=nt, allocator=ALLOC_CUPY )
+        v = TimeFunction(name='v', grid=grid, save=nt, allocator=ALLOC_CUPY)
         v.data[:] = 5
         address = v.data.ctypes.data
         pointerAttr = cp.cuda.runtime.pointerGetAttributes(address)
@@ -1501,7 +1502,7 @@ def test_external_allocator(self):
         numpy_array = np.ones(shape, dtype=np.float32)
         g = Grid(shape)
         f = Function(name='f', space_order=space_order, grid=g,
-                     allocator=ExternalAllocator(numpy_array), initializer=lambda x: None)
+                     allocator=ExternalAllocator(numpy_array), initializer=lambda x: None)

         # Ensure the two arrays have the same value
         assert(np.array_equal(f.data, numpy_array))
Review comment: How can we end up here? The `c_pointer is None` case is already handled above.

Reply: During execution under MPI, domain splitting can produce a situation where the allocated data size is zero, as we have observed with SparseFunctions. When this occurs, CuPy returns a pointer with a value of zero. This conditional was added for that case.

Review comment: Could you add a comment noting this, until a better solution is around?

Reply: We will push it, George.
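To make the zero-size case above concrete: under MPI domain decomposition a rank may own no points of a sparse function, the requested size is then zero, and the CuPy pool returns a null pointer, so the allocator must fall back to an empty host array instead of wrapping that pointer. A sketch of that kind of guard, including the comment the reviewer asked for (our naming and simplifications, not the PR's exact code):

    import ctypes

    import cupy as cp
    import numpy as np


    def managed_alloc(shape, dtype):
        """Allocate shape/dtype from a CUDA managed-memory pool and return
        (numpy_view, memory_pointer)."""
        pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
        size = int(np.prod(shape))
        mem = pool.malloc(size * np.dtype(dtype).itemsize)

        # Under MPI, domain splitting can leave a rank with zero-sized data
        # (observed with sparse functions); CuPy then returns a null pointer,
        # so hand back an empty host array instead of dereferencing it.
        if size == 0 or mem.ptr == 0:
            return np.empty(shape=(0,), dtype=dtype), mem

        ctype_1d = np.ctypeslib.as_ctypes_type(np.dtype(dtype)) * size
        buf = ctypes.cast(ctypes.c_void_p(mem.ptr), ctypes.POINTER(ctype_1d)).contents
        view = np.frombuffer(buf, dtype=dtype)
        view.shape = shape  # in-place: raises rather than copies
        return view, mem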