The current implementation of scatter has some limitations.
The GPU implementation hard-codes the iterator bindings, which might not work on certain devices. For example, with the OpenCL backend, a GPU may support only a one-dimensional global work size, while the bindings below require both blockIdx.x and blockIdx.y.
for j in T.thread_binding(0, 560, thread="blockIdx.x"):
    for k in T.thread_binding(0, 560, thread="blockIdx.y"):
        for i in T.thread_binding(0, 32, thread="threadIdx.x"):
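One possible way to avoid depending on a second grid dimension is to fuse the two block-level loops into a single index and recover j and k with integer division and modulo. The sketch below shows only that index arithmetic in plain Python, not TIR. The helper names `unflatten` and `fused_launch` are hypothetical and not part of TVM, and the extents are taken from the bindings above.

```python
def unflatten(flat, extent_k):
    """Recover (j, k) from a fused 1-D index, with j as the outer loop."""
    return flat // extent_k, flat % extent_k


def fused_launch(extent_j, extent_k):
    """Enumerate (j, k) pairs through one 1-D index, as a 1-D grid would."""
    return [unflatten(flat, extent_k) for flat in range(extent_j * extent_k)]


# Extents copied from the hard-coded bindings (assumed to be nested loops).
EXTENT_J = 560
EXTENT_K = 560

# The fused 1-D enumeration visits the same (j, k) pairs, in the same order,
# as the original two-dimensional blockIdx.x / blockIdx.y nest.
nested = [(j, k) for j in range(EXTENT_J) for k in range(EXTENT_K)]
assert fused_launch(EXTENT_J, EXTENT_K) == nested
```

Whether a fused 560 * 560 extent fits a given device's limits is a separate question that would need to be checked per target.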
There is no room for optimization because the bindings are hard-coded. Normally, we would create a schedule from the IRModule and define optimization strategies on it.
We need to create an optimization schedule and measure its performance.