Version 0.3.2
benvanwerkhoven released this 04 Nov 19:56
This version adds several new and recent features. The most important is support for user-defined metrics, which Kernel Tuner computes along with the benchmarking results. User-defined metrics are composable, so you can define metrics that build upon other metrics. The documentation pages have also been updated to cover this feature and other recent changes.
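As a rough illustration of composable metrics, the sketch below defines two metrics in an ordered dictionary, where the second builds on the first. It does not call Kernel Tuner: `apply_metrics` is a hypothetical stand-in for what the tuner does with each benchmark result, and `total_flops`, the metric names, and the sample result are made-up values for the example.

```python
from collections import OrderedDict

# Assumption for illustration: floating-point operations per kernel launch
# (here, a 4096 x 4096 x 4096 matrix multiply).
total_flops = 2 * 4096**3

# Each metric is a function of one benchmark result: a dict holding the
# tunable parameters, the measured "time" in milliseconds, and any metrics
# computed earlier. The order matters, so later metrics can use earlier ones.
metrics = OrderedDict()
metrics["GFLOP/s"] = lambda p: (total_flops / 1e9) / (p["time"] / 1e3)
metrics["GFLOP/s per thread"] = lambda p: p["GFLOP/s"] / p["block_size_x"]


def apply_metrics(result, metrics):
    """Hypothetical stand-in for the tuner: evaluate metrics in order."""
    enriched = dict(result)  # copy, so the original result is left untouched
    for name, compute in metrics.items():
        enriched[name] = compute(enriched)
    return enriched


sample = {"block_size_x": 128, "time": 50.0}  # made-up benchmark result
print(apply_metrics(sample, metrics))
```

In actual use, a dictionary like `metrics` would be handed to the tuner through the `metrics` argument of `tune_kernel`; see the updated documentation pages for the exact usage.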
An important change that may affect the benchmark results reported by Kernel Tuner is that the runner now warms up the device by running the first kernel in the parameter space before benchmarking starts. This removes the startup (cold start) delays that were significantly slowing down the first benchmarked kernel on many devices.
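The effect of the warm-up can be sketched with a simulated device. This is not Kernel Tuner's runner code: `run_sequential`, `benchmark`, and `make_fake_kernel` are hypothetical names, and the fake kernel simply charges a one-off startup cost on its very first launch.

```python
def benchmark(run_kernel, config, iterations=3):
    """Run one configuration several times and return the mean time in ms."""
    times = [run_kernel(config) for _ in range(iterations)]
    return sum(times) / len(times)


def run_sequential(run_kernel, parameter_space, iterations=3):
    """Benchmark every configuration in order, after one warm-up launch."""
    configs = list(parameter_space)
    if not configs:
        return []
    # Warm-up: launch the first configuration once and discard its timing,
    # so the cold-start cost is not attributed to the first benchmark.
    run_kernel(configs[0])
    return [dict(c, time=benchmark(run_kernel, c, iterations)) for c in configs]


def make_fake_kernel(cold_start_ms=100.0, steady_ms=1.0):
    """Simulated device whose very first launch pays a startup cost."""
    state = {"launches": 0}

    def run_kernel(config):
        state["launches"] += 1
        extra = cold_start_ms if state["launches"] == 1 else 0.0
        return steady_ms + extra

    return run_kernel, state


space = [{"block_size_x": 64}, {"block_size_x": 128}]
run_kernel, state = make_fake_kernel()
results = run_sequential(run_kernel, space)
print(results)
```

Without the warm-up launch, the first configuration's average would include the simulated startup cost and look slower than it really is.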
From the changelog:
[0.3.2] - 2020-11-04
Added
- support loop unrolling using params that start with loop_unroll_factor
- always insert "#define kernel_tuner 1" to allow preprocessor ifdef kernel_tuner
- support for user-defined metrics
- support for choosing the optimization starting point x0 for most strategies
Changed
- more compact output is printed to the terminal
- sequential runner runs first kernel in the parameter space to warm up device
- updated tutorials to demonstrate use of user-defined metrics
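The `kernel_tuner` define listed under Added lets kernel code detect whether it is being compiled by the tuner. A minimal sketch of the idea follows, assuming a CUDA vector-add kernel held in a Python string. `prepend_defines` is a hypothetical helper written for this example, not Kernel Tuner's own function; it only imitates the way defines are placed in front of the kernel source.

```python
# Kernel source with fallback defaults that apply only outside the tuner.
kernel_source = """
#ifndef kernel_tuner
  // Defaults used when the file is compiled without Kernel Tuner
  #define block_size_x 256
#endif

__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""


def prepend_defines(source, params):
    """Hypothetical helper: put the tuner's defines in front of the source."""
    lines = ["#define kernel_tuner 1"]
    lines += [f"#define {name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n" + source


# With the define in place, the #ifndef block is skipped and the tuned
# value of block_size_x is the one the compiler sees.
print(prepend_defines(kernel_source, {"block_size_x": 128}))
```

This pattern keeps a kernel file compilable on its own while still letting the tuner override every tunable parameter.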