This is an issue in Boost.Compute, but I do not see it being solved any time soon. In any case, a version of Boost.Compute with the fix would have to be widely adopted before we could rely on it.
The issue is that a kernel lookup is slow, ~1 ms, even if the program is already compiled and cached. Boost.Compute only caches programs, not kernels, so we have to cache the kernels ourselves. This is not a problem, as most programs have only one kernel: we just instantiate the kernel after the program is generated and cache both together.
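A minimal sketch of caching the program and its kernel together, assuming one kernel per cache entry. `Program`, `Kernel`, `build_program`, `create_kernel` and `KernelCache` are hypothetical stand-ins so the example compiles without an OpenCL runtime; they are not the Boost.Compute API, and the two counters only exist to show that the expensive steps run once.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for a compiled program and a kernel object.
struct Program { std::string source; };
struct Kernel  { std::string name; };

// Counters that make the expensive steps visible in this sketch.
int programs_built = 0;
int kernels_created = 0;

// Stand-in for compiling a program from source.
Program build_program(const std::string& source) {
    ++programs_built;
    return Program{source};
}

// Stand-in for the slow kernel lookup on an already built program.
Kernel create_kernel(const Program&, const std::string& name) {
    ++kernels_created;
    return Kernel{name};
}

// One cache entry holds the program and the kernel instantiated from it.
struct CachedKernel {
    Program program;
    Kernel kernel;
};

class KernelCache {
public:
    // Returns the cached pair, building program and kernel only on a miss.
    const CachedKernel& get(const std::string& source,
                            const std::string& kernel_name) {
        auto key = std::make_pair(source, kernel_name);
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            Program p = build_program(source);
            Kernel k = create_kernel(p, kernel_name);
            it = entries_.emplace(key, CachedKernel{std::move(p), std::move(k)}).first;
        }
        return it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, CachedKernel> entries_;
};
```

A second `get` with the same source and kernel name returns the stored entry and skips both the build and the kernel lookup. The sketch is not synchronized, so sharing one `KernelCache` across threads would need a lock or a per-thread cache.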
We could maybe get around all that caching by using
static thread_local kernel k = create_my_kernel(...);
This would create the kernel exactly once per thread. I am not sure whether creating a lot of kernels might cause problems in driver implementations. I have not found any information on that, but a hard limit seems unlikely.
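As a rough illustration of the `static thread_local` idea, the sketch below wraps the kernel in an accessor function. `Kernel`, `create_my_kernel` and `my_kernel` are hypothetical stand-ins, not Boost.Compute types, and the counter is only there to show how often the creation function runs.

```cpp
#include <atomic>

// Hypothetical stand-in for a kernel object.
struct Kernel { int id; };

// Counts how many kernels have been created across all threads.
std::atomic<int> kernels_created{0};

// Stand-in for the slow kernel creation.
Kernel create_my_kernel() {
    return Kernel{++kernels_created};
}

// The initializer runs the first time each thread calls this function;
// later calls on that thread reuse the same object.
Kernel& my_kernel() {
    static thread_local Kernel k = create_my_kernel();
    return k;
}
```

Repeated calls to `my_kernel()` on one thread return the same object and do not create another kernel. A different thread calling `my_kernel()` would construct its own instance, and each instance is destroyed when its thread exits.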