[BUG] Potentially unsafe releasing of the GIL in DeviceBuffer
#944
Comments
+1 to option 2
On the C++ side, a device_buffer takes an explicit memory resource in any of its methods that allocates memory (e.g. the constructor), and it stores it so that it uses the same MR for deallocation. In C++ we don't call Is there a way you could make it explicit in Python as well, rather than using get_current_device_resource? If the user wants to use
I think it makes sense to have the option of specifying a resource to use when allocating a That said, there is still value in having a default case that falls back to
I agree, making the memory resource an optional constructor parameter would be useful. I don't think we should get rid of the default behavior though, at least not in the short term. Option 2 also seems good to me.
Can we make it look like the C++ API (the default value of the optional parameter is a call to rmm/include/rmm/device_buffer.hpp Lines 130 to 133 in 5584a0c
We should be able to put it in the same position as we have tried to follow that signature thus far. While I understand the suggestion to include That said, we can clarify this behavior further in the docstring.
This issue has been labeled
@shwina should we bump to next release?
Just did so - thanks!
This was closed by #1514
The problem
When allocating memory in the constructor of DeviceBuffer, we release the GIL. See here. After the allocation, we store a reference to the current memory resource associated with the allocation, so that the memory resource does not get deleted before the DeviceBuffer.
However, as @viclafargue pointed out in an offline discussion, when the GIL is released, another Python thread could change the current memory resource, via a call to set_current_device_resource for example. We can thus end up with a situation where the DeviceBuffer holds a reference to a different memory resource than was used for the allocation. The memory resource that was actually used for the allocation could then be deleted before the DeviceBuffer gets destructed, leading to a segfault during destruction.
Solutions
There are two potential solutions I can think of:
1. We don't release the GIL in the constructor of DeviceBuffer.
2. We still release the GIL, but we store a reference to the current memory resource first. Then, we explicitly pass that memory resource to the constructor of the underlying rmm::device_buffer. Something along these lines:
(2) seems preferred, unless I'm overlooking something subtle. Will open a PR to address.