Releases: KernelTuner/kernel_tuner
Version 0.1.6
Version 0.1.6 brings a few bugfixes but mostly extends the existing functionality of the tuner. Three new search strategies have been added and are now ready to use: minimize, basinhopping, and diff_evo. For more information on what these strategies do, and which solvers and methods they support, please see the documentation pages.
From the CHANGELOG:
[0.1.6] - 2017-08-17
Changed
- actively freeing GPU memory after tuning
- bugfix for 3D grids when using OpenCL
Added
- support for dynamic parallelism when using PyCUDA
- option to use differential evolution optimization
- global optimization strategies basinhopping, minimize
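To give a feel for what a strategy such as diff_evo does, here is a minimal sketch of differential evolution over a discrete parameter space, written in plain Python. It is not the Kernel Tuner's implementation: the function name `diff_evo_search`, its arguments, and the made-up cost model `fake_time` are assumptions for illustration only. In real tuning, the objective would be the measured execution time of the kernel.

```python
import random


def diff_evo_search(tune_params, objective, pop_size=8, generations=20,
                    mutation=0.8, crossover=0.7, seed=0):
    """Toy differential evolution over a discrete parameter space.

    Individuals are lists of indices into the value lists of tune_params,
    so every candidate always decodes to a valid configuration.
    """
    rng = random.Random(seed)
    names = list(tune_params)
    sizes = [len(tune_params[name]) for name in names]

    def decode(indices):
        return {name: tune_params[name][i] for name, i in zip(names, indices)}

    cache = {}  # avoid benchmarking the same configuration twice

    def score(indices):
        key = tuple(indices)
        if key not in cache:
            cache[key] = objective(decode(indices))
        return cache[key]

    population = [[rng.randrange(size) for size in sizes] for _ in range(pop_size)]
    for _ in range(generations):
        for i, target in enumerate(population):
            others = [p for j, p in enumerate(population) if j != i]
            a, b, c = rng.sample(others, 3)
            trial = []
            for d, size in enumerate(sizes):
                if rng.random() < crossover:
                    # mutate: a + F * (b - c), rounded and clamped to a valid index
                    value = round(a[d] + mutation * (b[d] - c[d]))
                    value = min(max(value, 0), size - 1)
                else:
                    value = target[d]
                trial.append(value)
            # keep the trial only if it is at least as good as the target
            if score(trial) <= score(target):
                population[i] = trial
    best = min(population, key=score)
    return decode(best), score(best), len(cache)


def fake_time(params):
    """Made-up cost model standing in for a measured kernel time."""
    return abs(params["block_size_x"] - 128) / 128 + abs(params["tile_size_x"] - 2)


tune_params = {"block_size_x": [32, 64, 128, 256, 512, 1024],
               "tile_size_x": [1, 2, 4]}
best, best_time, evaluated = diff_evo_search(tune_params, fake_time)
print(best, best_time, evaluated)
```

The point of such a strategy is that `evaluated` can stay below the size of the full search space, at the cost of possibly missing the global optimum.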
Version 0.1.5
Version 0.1.5 brings more flexibility: you can now pass code-generating functions, supply your own functions for verifying kernel output correctness, and use your own names for the thread block dimensions.
Internally, quite a lot has changed in this version. What used to be the runners has been split into strategies and runners, and the way options are passed around within the Kernel Tuner has changed dramatically.
From the CHANGELOG:
[0.1.5] - 2017-07-21
Changed
- option to pass a fraction to the sample runner
- fixed a bug in memset for OpenCL backend
Added
- parallel tuning on single node using Noodles runner
- option to pass new defaults for block dimensions
- option to pass a Python function as code generator
- option to pass custom function for output verification
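As an illustration of the two callback-style options above, the sketch below shows what a code generator and a custom verification function might look like. The exact signatures the tuner expects are assumptions here (a generator that receives a dict of parameter values, and a verifier that receives the reference and computed outputs); please check the documentation before relying on them. The `unroll` parameter and the `vector_add` kernel are hypothetical.

```python
import math


def generate_kernel(params):
    """Return CUDA source specialised for one parameter configuration.

    Assumption: the tuner calls this with a dict of tunable parameter values.
    """
    unroll = params["unroll"]  # hypothetical tunable parameter
    body = "\n".join(
        f"    if (i + {k} * stride < n) "
        f"c[i + {k} * stride] = a[i + {k} * stride] + b[i + {k} * stride];"
        for k in range(unroll)
    )
    return (
        "__global__ void vector_add(float *c, const float *a, const float *b, int n) {\n"
        "    int i = blockIdx.x * blockDim.x * " + str(unroll) + " + threadIdx.x;\n"
        "    int stride = blockDim.x;\n"
        + body + "\n"
        "}\n"
    )


def verify(expected, actual, tol=1e-6):
    """Return True when every checked output matches its reference.

    Assumption: both arguments are lists with one entry per kernel argument,
    and None in `expected` marks an argument that should not be checked.
    """
    for ref, out in zip(expected, actual):
        if ref is None:
            continue
        if len(ref) != len(out):
            return False
        if not all(math.isclose(r, o, rel_tol=tol, abs_tol=tol)
                   for r, o in zip(ref, out)):
            return False
    return True


source = generate_kernel({"unroll": 4})
print(source)
```

A tolerance-based comparison like this is the usual reason to supply a custom verifier: floating-point results from different thread block configurations rarely match bit for bit.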
Version 0.1.4
As of this release, tune_kernel also returns a dictionary containing information about the environment in which the kernel was benchmarked. This is very useful for understanding how and under what circumstances certain measurement results were obtained.
In addition, there were some very minor changes in the way C functions are compiled and called.
Version 0.1.3
Bugfixes for handling scalar arguments, plus a documentation update.
Version 0.1.2
Better defaults for grid divisor lists, full support for 3D grids, and a simpler way to specify the problem size of 1D grids.
[0.1.2] - 2017-03-29
Changed
- allow non-tuple problem_size for 1D grids
- changed default for grid_div_y from None to block_size_y
- converted the tutorial to a Jupyter Notebook
- CUDA backend prints device in use, similar to OpenCL backend
- migrating from nosetests to pytest
- rewrote many of the examples to save results to json files
Added
- full support for 3D grids, including option for grid_div_z
- separable convolution example
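The grid divisor lists mentioned above determine how many thread blocks are launched in each dimension. The following is a rough sketch of that calculation under stated assumptions, not the tuner's actual code: the helper `grid_dimensions` is hypothetical, divisor lists are taken to be plain parameter names, and a missing parameter is assumed to count as 1.

```python
def grid_dimensions(problem_size, params, grid_div_x=("block_size_x",),
                    grid_div_y=("block_size_y",), grid_div_z=None):
    """Divide the problem size by the product of the listed parameters.

    Assumptions: an int problem_size means a 1D problem, and parameters
    that are not being tuned (e.g. block_size_y) default to 1.
    """
    if isinstance(problem_size, int):
        problem_size = (problem_size,)  # non-tuple problem_size for 1D grids
    problem_size = tuple(problem_size) + (1,) * (3 - len(problem_size))

    grid = []
    for size, divisors in zip(problem_size, (grid_div_x, grid_div_y, grid_div_z)):
        div = 1
        for name in divisors or ():
            div *= params.get(name, 1)
        grid.append((size + div - 1) // div)  # round up so the whole problem is covered
    return tuple(grid)


print(grid_dimensions(1000, {"block_size_x": 128}))
print(grid_dimensions((4096, 4096), {"block_size_x": 32, "block_size_y": 8}))
print(grid_dimensions((4096, 4096),
                      {"block_size_x": 32, "block_size_y": 8, "tile_size_x": 4},
                      grid_div_x=("block_size_x", "tile_size_x")))
```

With grid_div_y defaulting to block_size_y, a 2D kernel gets a sensible grid without any divisor lists being specified, which is the motivation for the changed default.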
Version 0.1.1
[0.1.1] - 2017-02-10
Changed
- changed the output format to list of dictionaries
Added
- option to set compiler options
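A list of dictionaries is easy to post-process with plain Python. The sketch below assumes each dictionary holds the parameter values of one configuration plus a `time` entry; the key name and all numbers are placeholders invented for illustration, not measurements.

```python
import json

# Placeholder results in the assumed format: one dict per benchmarked configuration.
results = [
    {"block_size_x": 64, "time": 0.42},
    {"block_size_x": 128, "time": 0.31},
    {"block_size_x": 256, "time": 0.37},
]

# Select the fastest configuration.
best = min(results, key=lambda r: r["time"])
print("best configuration:", best)

# The same structure serialises directly to JSON for later analysis.
serialized = json.dumps(results, indent=2)
print(serialized)
```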
Version 0.1.0
The Kernel Tuner should by now be ready for production use. Over the last few months we have used it in several projects, which revealed a number of issues that are fixed in this version. This release also marks the end of a period in which the internal structure of the Kernel Tuner has changed several times. We expect the current code structure to stay around for a while. With this version we also release the public roadmap for the project, to show which changes and additional features we have planned for the near and not so near future. We also feel that the software is now ready to be added to public software repositories, which we will do shortly.
first beta release
This is the first beta release of the Kernel Tuner. The beta testing period is meant to show what functionality is missing and what needs to be fixed before the code can be considered production ready.
A brief description of the Kernel Tuner's functionality in this version:
- Basic kernel tuning functionality for CUDA, OpenCL, and C functions
- Many examples and rather extensive documentation
- Search space restriction, using the 'restrictions' option
- Kernel output verification, using the 'answer' option
- Example showing how to tune both host code (number of streams) and GPU code
- Run a single kernel with a specific parameter set and get the output
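The idea behind search space restriction can be sketched in a few lines of Python: enumerate all combinations of parameter values and keep only those for which every restriction holds. This is an illustration under assumptions, not the tuner's code. The helper `restricted_space` is hypothetical, and restrictions are assumed to be strings containing Python expressions over the parameter names.

```python
import itertools


def restricted_space(tune_params, restrictions):
    """Yield every configuration that satisfies all restriction expressions."""
    names = list(tune_params)
    for values in itertools.product(*(tune_params[name] for name in names)):
        config = dict(zip(names, values))
        # Evaluate each expression with the parameter values as variables.
        # eval is acceptable here only because the strings come from the user's
        # own tuning script; never evaluate untrusted input this way.
        if all(eval(expr, {"__builtins__": {}}, config) for expr in restrictions):
            yield config


tune_params = {"block_size_x": [32, 64, 128], "block_size_y": [2, 4, 8]}
restrictions = ["block_size_x * block_size_y <= 256"]

configs = list(restricted_space(tune_params, restrictions))
for config in configs:
    print(config)
```

Here the restriction removes the configurations with more than 256 threads per block, so only part of the nine combinations would need to be compiled and benchmarked.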