[BUG] Potentially unsafe releasing of the GIL in `DeviceBuffer` #944

shwina · 2022-01-18T20:07:04Z

The problem

When allocating memory in the constructor of DeviceBuffer, we release the GIL. See here.

After the allocation, we store a reference to the current memory resource associated with the allocation, so that the memory resource does not get deleted before the DeviceBuffer.

However, as @viclafargue pointed out in an offline discussion, when the GIL is released, another Python thread could change the current memory resource, via a call to set_current_device_resource for example. We can thus end up with a situation where the DeviceBuffer holds a reference to a different memory resource than was used for the allocation. The memory resource that was actually used for the allocation could then be deleted before the DeviceBuffer gets destructed, leading to a segfault during destruction.
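
To make the race concrete, here is a minimal pure-Python sketch. It is not RMM code: `Resource`, `Buffer`, `get_current_resource`, `set_current_resource`, and the `while_gil_released` callback are all hypothetical stand-ins, and the callback plays the role of another thread running while the GIL is released. It only illustrates how the order of "look up the resource" and "store the reference" decides whether the two can diverge.

```python
class Resource:
    """Hypothetical stand-in for a memory resource."""

    def __init__(self, name):
        self.name = name


_current = Resource("A")


def get_current_resource():
    return _current


def set_current_resource(resource):
    global _current
    _current = resource


class Buffer:
    """Hypothetical stand-in for DeviceBuffer."""

    def __init__(self, capture_first, while_gil_released=lambda: None):
        if capture_first:
            # Capture the reference before the GIL would be released,
            # then allocate from exactly that resource.
            self.mr = get_current_resource()
            while_gil_released()
            self.allocated_from = self.mr
        else:
            # Allocate from whatever is current, then look the current
            # resource up again afterwards to store the reference.
            self.allocated_from = get_current_resource()
            while_gil_released()
            self.mr = get_current_resource()


def swap_to_b():
    # Simulates another thread changing the current resource.
    set_current_resource(Resource("B"))


unsafe = Buffer(capture_first=False, while_gil_released=swap_to_b)
unsafe_mismatch = unsafe.mr is not unsafe.allocated_from

set_current_resource(Resource("A"))
safe = Buffer(capture_first=True, while_gil_released=swap_to_b)
safe_mismatch = safe.mr is not safe.allocated_from

print(unsafe_mismatch, safe_mismatch)
```

In the first case the buffer keeps alive a resource it never allocated from, which is the situation described above; in the second the stored reference and the allocating resource are the same object by construction.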

Solutions

There are two potential solutions I can think of:

1. We don't release the GIL in the constructor of DeviceBuffer.

2. We still release the GIL, but we store a reference to the current memory resource first. Then, we explicitly pass that memory resource to the constructor of the underlying rmm::device_buffer. Something along these lines:

   # Save a reference to the MR and stream used for allocation
   self.mr = get_current_device_resource()
   self.stream = stream

   with nogil:
       c_ptr = <const void*>ptr

       if size == 0:
           self.c_obj.reset(new device_buffer(self.mr.c_obj.get()))
       elif c_ptr == NULL:
           self.c_obj.reset(new device_buffer(size, stream.view(), self.mr.c_obj.get()))
       else:
           self.c_obj.reset(new device_buffer(c_ptr, size, stream.view(), self.mr.c_obj.get()))

           if stream.c_is_default():
               stream.c_synchronize()

(2) seems preferred, unless I'm overlooking something subtle. Will open a PR to address.

jakirkham · 2022-01-18T20:22:25Z

+1 to option 2

harrism · 2022-01-18T22:19:49Z

On the C++ side, a device_buffer takes an explicit memory resource in any of its methods that allocates memory (e.g. the constructor), and it stores it so that it uses the same MR for deallocation. In C++ we don't call get_current_device_resource in device_buffer, for exactly this reason.

Is there a way you could make it explicit in Python as well, rather than using get_current_device_resource? If the user wants to use get_current_device_resource, they could call it themselves and pass the returned resource to the device buffer.

jakirkham · 2022-01-18T22:48:37Z

I think it makes sense to have the option of specifying a resource to use when allocating a DeviceBuffer (we don't have that currently). Possibly worth doing at the same time.

That said, there is still value in having a default case that falls back to get_current_device_resource. Given DeviceBuffer is used all over the place, changing how it gets called would be too disruptive.

vyasr · 2022-01-18T22:51:26Z

I agree, making the memory resource an optional constructor parameter would be useful. I don't think we should get rid of the default behavior though, at least not in the short term.

Option 2 also seems good to me.

harrism · 2022-01-19T01:50:14Z

Can we make it look like the C++ API (the default value of the optional parameter is a call to get_current_device_resource)? That way it's very clear from the interface what the default behavior is.

rmm/include/rmm/device_buffer.hpp

Lines 130 to 133 in 5584a0c

    
    device_buffer(void const* source_data,
                  std::size_t size,
                  cuda_stream_view stream,
                  mr::device_memory_resource* mr = mr::get_current_device_resource())

jakirkham · 2022-01-19T03:50:18Z

We should be able to put it in the same position, since we have tried to follow that signature thus far.

While I understand the suggestion to include get_current_device_resource in the function signature and agree it makes sense conceptually, in Python this unfortunately misbehaves: a default argument is evaluated only once, when the function is defined (in practice, on import), so any later call to set_current_device_resource would be ignored by the default. The typical way to handle this in Python is to default the argument to None, then check for None inside the function and call get_current_device_resource there.

That said, we can clarify this behavior further in the docstring.
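
As a small illustration of the default-argument pitfall described in this comment, here is a pure-Python sketch. The names `get_current_resource`, `set_current_resource`, `make_buffer_eager`, and `make_buffer_lazy` are hypothetical and stand in for the RMM functions; strings stand in for memory resources.

```python
_current = "resource-A"


def get_current_resource():
    return _current


def set_current_resource(resource):
    global _current
    _current = resource


def make_buffer_eager(mr=get_current_resource()):
    # The default was evaluated once, when this function was defined,
    # so it is frozen to whatever was current at that moment.
    return mr


def make_buffer_lazy(mr=None):
    # None is a sentinel: the current resource is looked up on every call.
    if mr is None:
        mr = get_current_resource()
    return mr


set_current_resource("resource-B")

eager_result = make_buffer_eager()
lazy_result = make_buffer_lazy()
explicit_result = make_buffer_lazy("resource-C")

print(eager_result, lazy_result, explicit_result)
```

The eager version still returns the resource that was current at definition time, while the None-sentinel version follows later changes and still lets the caller pass a resource explicitly.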

github-actions · 2022-02-18T05:00:53Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

harrism · 2022-03-21T20:24:47Z

@shwina should we bump to next release?

shwina · 2022-03-21T20:25:44Z

Just did so - thanks!

github-actions · 2022-04-20T21:01:02Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-07-19T22:01:00Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

wence- · 2024-04-04T08:16:29Z

This was closed by #1514

shwina added the bug and ? - Needs Triage labels Jan 18, 2022

shwina added the Python label and removed the ? - Needs Triage label Jan 18, 2022

shwina self-assigned this Jan 18, 2022

shwina mentioned this issue Jan 20, 2022

Enable constructing DeviceBuffers using a non-default memory resource and correctly manage memory resource lifetime in DeviceBuffer #953

Closed

github-actions bot added the inactive-30d label Feb 18, 2022

github-actions bot removed the inactive-30d label Mar 21, 2022

github-actions bot added the inactive-30d label Apr 20, 2022

github-actions bot added the inactive-90d label Jul 19, 2022

jarmak-nv added this to RMM Project Board Nov 15, 2022

harrism moved this to In Progress in RMM Project Board Jul 20, 2023

wence- closed this as completed Apr 4, 2024

github-project-automation bot moved this from In Progress to Done in RMM Project Board Apr 4, 2024
