Describe the issue:
When Grid is built and run on a single GPU, running e.g. Benchmark_ITT fails with the error:
accelerator_barrier(): Cuda error invalid configuration argument
Digging into this, it is caused by line 137 of Grid/threads/Accelerator.h:
dim3 cu_blocks ((num1+nt-1)/nt,num2,1); \
For reasons I haven't dug deep enough to understand, when running with 1 GPU, (num1+nt-1)/nt evaluates to zero, which isn't a valid block count. In the specific case that fails, the call comes from WilsonKernelsImplementation.h and the expression is (sz+nt-1)/nt.
As a workaround, changing line 137 to
allows the code to run correctly.
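The failing expression is a ceiling division. For a non-negative size, a positive nt, and no overflow of num1+nt-1, it can only be zero when the size itself is zero. That would suggest the failing call (sz == 0 in the Wilson kernel case) has no work to launch, although the reason sz is zero on one GPU remains open. The following is a minimal host-only C++ sketch of that arithmetic, not Grid's code and not the workaround above. GridDims stands in for CUDA's dim3, and ceil_div, make_grid and grid_is_launchable are hypothetical helpers. CUDA rejects a kernel launch whose grid has a zero dimension with "invalid configuration argument", so a guard like this is one possible way to skip empty launches.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the sketch compiles as plain host C++.
struct GridDims {
    unsigned x, y, z;
};

// Ceiling division, same form as (num1+nt-1)/nt in the launch macro.
// For nt >= 1 and no overflow of num1 + nt - 1, the result is 0 exactly when num1 == 0.
inline unsigned ceil_div(unsigned num1, unsigned nt) {
    assert(nt > 0 && "threads per block must be positive");
    return (num1 + nt - 1) / nt;
}

// Hypothetical helper mirroring cu_blocks((num1+nt-1)/nt, num2, 1).
inline GridDims make_grid(unsigned num1, unsigned num2, unsigned nt) {
    return GridDims{ceil_div(num1, nt), num2, 1};
}

// A launch is only valid if every grid dimension is at least 1;
// a caller could check this and skip the kernel when there is no work.
inline bool grid_is_launchable(const GridDims& g) {
    return g.x > 0 && g.y > 0 && g.z > 0;
}
```

For example, with nt = 8, sizes 0, 1, 8 and 9 give 0, 1, 1 and 2 blocks respectively, and only the size-0 grid fails the launchability check.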
Code example:
Target platform:
Tested on Grace Hopper Arm+H100, Leicester Arm+A100, and AMD Rome + A100 in Swansea.
Configure options:
../configure --enable-comms=none --enable-simd=GPU --enable-accelerator=cuda CXX=nvcc --disable-zmobius --disable-gparity 'CXXFLAGS=-g -gencode arch=compute_90,code=sm_90 -std=c++17 -DEIGEN_DONT_VECTORIZE'