Add API to create queue in device memory #284
base: amd-staging
Conversation
FYI @benvanik this is still WIP, but let me know if this works for you and if the cacheable flag works/has benefits.
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
Force-pushed from ea66d58 to 9971e7b
Force-pushed from 76a3db2 to fe3307c
@@ -1272,7 +1272,7 @@ void BlitKernel::PopulateQueue(uint64_t index, uint64_t code_handle, void* args,
   std::atomic_thread_fence(std::memory_order_acquire);
   queue_buffer[index & queue_bitmask_] = packet;
   std::atomic_thread_fence(std::memory_order_release);
-  if (core::Runtime::runtime_singleton_->flag().dev_mem_queue() && !queue_->needsPcieOrdering()) {
+  if (queue_->IsDeviceMem() && !queue_->needsPcieOrdering()) {
@saleelk I'm curious if the logic is correct for the original needsPcieOrdering() method you added. Shouldn't this really be:
queue_->needsPcieOrdering()
Meaning we need to change the logic of that call internally?
Force-pushed from fe3307c to 6bbccb0
Force-pushed from 6bbccb0 to 436b64a
Looks good to me. Thank you!
Force-pushed from 1825907 to 51dc622
This builds on a prior change that allowed a user-mode queue's packet buffer to be allocated in device memory: the queue struct is now allocated in device memory as well. This yields additional latency benefits, particularly when dispatches are performed from the GPU itself. An AMD extension API for queue creation is added so callers can specify whether the queue should be created in device memory and whether that memory should be cacheable. It is intended for use by higher-level libraries when creating queues, and has the added benefit of allowing device memory to be selected on a per-queue basis.
Force-pushed from 51dc622 to 4533c8d
 * The queue packet buffer and the queue struct should be allocated in
 * the agent's device memory.
 */
HSA_AMD_QUEUE_FLAG_DEVICE_MEM = (1 << 0),
We should explicitly mention whether it's cached or uncached, e.g. HSA_AMD_QUEUE_HOST = 0, HSA_AMD_QUEUE_DEV_UNCACHED = 1 << 0.
The change looks good except for the flags I mentioned and the question of whether we should expose them. Thanks for fixing the typo I had.
 * Used to indicate if the queue created in device memory should be
 * cacheable.
 */
HSA_AMD_QUEUE_FLAG_CACHEABLE = (1 << 1),
We should probably discuss whether this is even doable at the moment: flushing L2 is costly, and since we are writing one packet at a time (i.e. 64 bytes), we could also run into false-sharing issues.
Adds an API to create a user-mode queue in device memory.
Now the queue struct is also allocated in device memory.
A flag is added to the API to specify whether or not the memory should be cacheable.