Fix out of bounds memory accesses in RAJAPerf suite #89
Comments
Could be related to BlockDim, GridDim specialization in the runtime and caching. Try with the latest |
This issue is not fixed by #87 |
I am suggesting the following steps:
If it doesn't fail, start adding optimizations one by one. We are somehow corrupting either the dynamic information or the module itself.
I was able to verify that enabling |
I think when this is fixed it could be nice to add an integration test CI pipeline that does the following
|
Can you please verify once more that you are using/linking with the correct proteus version. Can you double check that the cache hash includes block and grid dimensions: |
I'm using the latest commit (54dbb1f) as a submodule. I will check the cache hash, but this bug exists with and without cache enabled |
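For readers following the cache-hash question: a minimal sketch of why launch dimensions need to be part of the key when kernels are specialized on them. Everything here (`Dim3`, `mix`, `cacheKey`, and the key layout) is hypothetical and written for illustration; it is not the Proteus implementation, which may hash different inputs in a different way.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for the CUDA/HIP dim3 launch-dimension type.
struct Dim3 {
  std::uint32_t x, y, z;
};

// FNV-1a style mixing step applied to one word at a time.
inline std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
  return (h ^ v) * 1099511628211ULL; // 64-bit FNV prime
}

// Hypothetical cache key: kernel name plus grid and block dimensions.
// If a kernel is specialized on its launch dimensions but the key omits
// them, a later launch with different dimensions can hit a stale entry
// and run code specialized for the wrong bounds.
inline std::uint64_t cacheKey(const std::string &kernelName, Dim3 grid,
                              Dim3 block) {
  std::uint64_t h = 14695981039346656037ULL; // 64-bit FNV offset basis
  for (unsigned char c : kernelName)
    h = mix(h, c);
  for (std::uint32_t v : {grid.x, grid.y, grid.z, block.x, block.y, block.z})
    h = mix(h, v);
  return h;
}
```

With this layout, two launches of the same kernel that differ only in one dimension, for example a block size of 64 versus 128, map to different keys, so each specialization gets its own cache entry.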
Disabling |
I can replicate the issue(s). I think I have a fix in bugfix/specialize-dims, but I need a clear head to check how we name things. I will do that tomorrow, and I will add some tests as well.
I went through bugfix/specialize-dims. It looks like it's a simple misnaming bug. @johnbowen42 Does this branch fix your issues? |
This fixes a subset of these issues, but I am still seeing many benchmarks fail.
Hmm, that means there's another underlying issue. My advice is to run with |
|
Can you fill me in? Here or in Slack?
Update:
|
Multiple benchmarks, including DEL_DOT_VEC_2D, EDGE3D, VOL3D, and PRESSURE, are segfaulting with a variation of:
Callback: Queue 0x15553ca00000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
I'm still identifying all the failing benchmarks, triaging, and working on a fix