Improve setting launching bounds on preset kernels #64
tests/gpu/kernel_preset_bounds.cpp
Outdated
#include "gpu_common.h"

__global__ __attribute__((annotate("jit")))
__launch_bounds__(128, 4) void kernel() {
I don't recall the semantics. Is it always mandatory to pass the second value? If it is not, you should add a test for that case, because I am guessing this:
assert(MetadataNode->getNumOperands() == 3);
will fail.
Further, I would consider setting the bounds to <1, 1> to match the call site and to indirectly test the case in which Proteus does not call the function correctly (for whatever reason).
No, it's not mandatory. The assert will not fail because this is the format of the metadata.
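For reference, a minimal sketch of the two forms discussed above: a kernel with only the first bound and one with both. The kernel names, bodies, the `JIT_ANNOTATION` macro, and the host fallbacks are hypothetical and not part of this PR. The comments assume each bound is recorded as its own (kernel, name, value) metadata node, which is my reading of the reply above and would need to be checked against the generated IR.

```cpp
// Hypothetical sketch, not code from this PR.
#if !defined(__CUDACC__) && !defined(__CUDA__) && !defined(__HIP__)
// Fallbacks so the sketch also builds as plain host C++ (illustration only;
// a real test would be compiled by a CUDA or HIP compiler).
#define __global__
#define __launch_bounds__(...)
#define JIT_ANNOTATION
#else
#define JIT_ANNOTATION __attribute__((annotate("jit")))
#endif

// One bound: only the maximum number of threads per block is given.
// Assumption: this yields a single three-operand metadata node.
__global__ JIT_ANNOTATION
__launch_bounds__(128) void kernel_one_bound(int *out) { *out = 1; }

// Two bounds: in CUDA, the second value is the minimum number of resident
// blocks per multiprocessor. Assumption: it is stored in a separate node.
__global__ JIT_ANNOTATION
__launch_bounds__(128, 4) void kernel_two_bounds(int *out) { *out = 2; }
```

If the assumption holds, the operand-count assert sees three operands per node in both cases, so the single-argument form would not trip it.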
I'm using launch bounds that differ from the actual launch dimensions so that I can visually check in the IR whether the optimization happened. Checking will be automated in the future.
Nice work. Nits
- Replace metadata if already set (CUDA)
- Add test kernel_preset_bounds
75ec319 to 9858438
Rebased ☝️
@koparasy @davidbeckingsale Given the discussion in Slack on the necessity to override launch bounds for HIP, can we approve this PR?
Closes #27