Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to create queue in device memory #284

Open
wants to merge 2 commits into
base: amd-staging
Choose a base branch
from

Conversation

atgutier
Copy link
Contributor

Adds an API to create a user-mode queue in device memory.

Now the queue struct is also allocated in device memory.

A flag is added to the API to specify whether or not the memory should be cacheable.

@atgutier
Copy link
Contributor Author

FYI @benvanik this is still WIP, but let me know if this works for you and if the cacheable flag works/has benefits.

@amd-jmacaran
Copy link
Contributor

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -1272,7 +1272,7 @@ void BlitKernel::PopulateQueue(uint64_t index, uint64_t code_handle, void* args,
std::atomic_thread_fence(std::memory_order_acquire);
queue_buffer[index & queue_bitmask_] = packet;
std::atomic_thread_fence(std::memory_order_release);
if (core::Runtime::runtime_singleton_->flag().dev_mem_queue() && !queue_->needsPcieOrdering()) {
if (queue_->IsDeviceMem() && !queue_->needsPcieOrdering()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saleelk I'm curious if the logic is correct for the original needsPcieOrdering() method you added. Shouldn't this really be:

queue_->needsPcieOrdering()

Meaning we need to change the logic of that call internally?

@atgutier atgutier force-pushed the atgutier/queue-struct-dev-mem branch from fe3307c to 6bbccb0 Compare January 28, 2025 21:21
@atgutier atgutier requested a review from dayatsin-amd January 28, 2025 21:21
@atgutier atgutier force-pushed the atgutier/queue-struct-dev-mem branch from 6bbccb0 to 436b64a Compare January 28, 2025 21:23
Copy link
Collaborator

@dayatsin-amd dayatsin-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you!

@atgutier atgutier force-pushed the atgutier/queue-struct-dev-mem branch 8 times, most recently from 1825907 to 51dc622 Compare January 29, 2025 00:33
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself.

An AMD extension API for queue creation is added to specify
if the queue should be created in device memory and whether
or not it should be cacheable. This is for use by higher
level libraries when doing queue creation. This provides the
added benefit of allowing device memory to be used on a
per-queue basis.
@atgutier atgutier force-pushed the atgutier/queue-struct-dev-mem branch from 51dc622 to 4533c8d Compare January 29, 2025 20:25
@atgutier atgutier requested a review from dayatsin-amd January 29, 2025 20:28
* The queue packet buffer and the queue struct should be allocated in
* the agent's device memory.
*/
HSA_AMD_QUEUE_FLAG_DEVICE_MEM = (1 << 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should explicitly mention whether its cached on uncached. Like HSA_AMD_QUEUE_HOST = 0, HSA_AMD_QUEUE_DEV_UNCACHED = 1 << 0,

Copy link
Contributor

@saleelk saleelk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change loosk good except for the flags I mentioned and if we should expose them, and thanks for fixing the typo I had.

* Used to indicate if the queue created in device memory should be
* cacheable.
*/
HSA_AMD_QUEUE_FLAG_CACHEABLE = (1 << 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably discuss if this is even doable at the moment, to flush L2 is costly and if we are writing a packet (aka 64bytes), we can probably also have false sharing issues/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add API flag for whether an AQL queue should be allocated in device memory.
4 participants