gpu: kernel lookup is slow #13

Ulfgard · 2018-05-28T10:03:15Z

This is an issue in boost.compute, but I do not see it being solved any time soon. Even if it were fixed, that version of boost.compute would have to be widely adopted before we could rely on it.

The issue is that a kernel lookup is slow, ~1ms, even if the program is already compiled and cached. boost.compute only caches programs, not kernels, so we have to cache the kernels ourselves. This is not a problem, as most programs only have one kernel: we just have to instantiate the kernel after the program is generated and cache both together.
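A minimal sketch of caching the program and its kernel together might look like the following. It is not boost.compute code: `Program`, `Kernel`, `build_program`, `create_kernel` and `lookup_kernel` are hypothetical stand-ins so the pattern can be compiled without an OpenCL device, and the counter only exists to make the number of kernel creations visible.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real program and kernel objects.
struct Program { std::string source; };
struct Kernel  { std::string name; };

// Counts how often the (expensive) kernel creation actually runs.
static int kernel_creations = 0;

// Stand-in for compiling a program from source.
Program build_program(const std::string& source) {
    return Program{source};
}

// Stand-in for the slow kernel lookup described above.
Kernel create_kernel(const Program& program, const std::string& name) {
    assert(!program.source.empty());
    ++kernel_creations;
    return Kernel{name};
}

// Program and kernel are stored side by side in one cache entry.
struct CachedKernel {
    Program program;
    Kernel kernel;
};

// Returns the cached entry, building program and kernel only on a miss.
CachedKernel& lookup_kernel(const std::string& source, const std::string& name) {
    static std::map<std::pair<std::string, std::string>, CachedKernel> cache;
    static std::mutex cache_mutex;

    std::lock_guard<std::mutex> lock(cache_mutex);
    const auto key = std::make_pair(source, name);
    auto it = cache.find(key);
    if (it == cache.end()) {
        Program program = build_program(source);
        Kernel kernel = create_kernel(program, name);
        it = cache.emplace(key, CachedKernel{std::move(program), std::move(kernel)}).first;
    }
    return it->second;  // std::map references stay valid across later inserts
}
```

The mutex only protects the cache itself. A real kernel object carries its argument bindings, so sharing one cached kernel between threads would need additional care.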

Ulfgard · 2018-05-29T09:48:52Z

We could maybe get around all that caching by using:

static thread_local kernel k = create_my_kernel(...);

This would create the kernel exactly once per thread. I am not sure whether creating a lot of kernels might cause problems in driver implementations; I have not found any information on that, but a hard limit seems unlikely.
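To illustrate the once-per-thread behaviour, here is a small sketch under stated assumptions: `DemoKernel` is a hypothetical stand-in for the real kernel type, `create_my_kernel` takes a made-up name argument, and the accessor `my_kernel` is only there so the `static thread_local` variable has a function scope to live in.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for the real kernel type.
struct DemoKernel { std::string name; };

// Counts kernel creations across all threads.
static std::atomic<int> kernels_created{0};

// Stand-in for the expensive kernel creation.
DemoKernel create_my_kernel(const std::string& name) {
    ++kernels_created;
    return DemoKernel{name};
}

// The initializer runs the first time each thread reaches this line;
// later calls from the same thread reuse that thread's kernel.
DemoKernel& my_kernel() {
    static thread_local DemoKernel k = create_my_kernel("my_kernel");
    return k;
}
```

Repeated calls from one thread return the same object without calling `create_my_kernel` again, while each additional thread that calls `my_kernel()` creates its own copy. The total number of kernels therefore grows with the number of call sites times the number of threads.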
