Releases: clMathLibraries/clBLAS
v2.12 rollup bugfix release
- Fixes to AutoGemm with beta == 0 and removal of hard-coded trtri options (@pavanky, @shehzan10)
- test-functional passes all tests
- Fixes to compile with clang 3.7 (@iotamudelta)
- Fixes to dtrsm (@shehzan10)
- gcc compiler flags refactored for non-x86 machines (@psyhtest)
- Travis and Appveyor fixes (@haaHh)
- Added TLS memory and removed global cl_kernel objects to fix threading issues
- Improved the unit tests to compare floating-point values more reliably
- Added OpenCL device selection logic to the correctness test programs
- Removed the -pedantic flag from gcc builds to reduce the number of warnings
- Support for altivec on powerpc64 P8 systems (@tfauck)
- Fixes to syr2 (@mgates3)
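The floating-point comparison item above touches on a general testing problem: results computed on a GPU rarely match a CPU reference bit for bit, so tests usually compare within a tolerance. The Python sketch below illustrates that idea only. It is not the clBLAS test code, and the function names and tolerance values are assumptions chosen for illustration.

```python
import math


def nearly_equal(expected, actual, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two floats agree within a relative or absolute tolerance.

    The default tolerances are illustrative assumptions, not clBLAS settings.
    """
    return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)


def matrices_nearly_equal(expected, actual, rel_tol=1e-5, abs_tol=1e-8):
    """Compare two matrices (lists of rows) element by element."""
    if len(expected) != len(actual):
        return False
    for row_e, row_a in zip(expected, actual):
        if len(row_e) != len(row_a):
            return False
        if not all(nearly_equal(e, a, rel_tol, abs_tol)
                   for e, a in zip(row_e, row_a)):
            return False
    return True


if __name__ == "__main__":
    # Exact equality fails on ordinary rounding error; the tolerant check passes.
    print(0.1 + 0.2 == 0.3)              # False
    print(nearly_equal(0.3, 0.1 + 0.2))  # True
```

A relative tolerance scales with the magnitude of the values, while the absolute tolerance keeps comparisons near zero from failing spuriously.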
clBLAS-2.10.0 Release for ACL 1.0 GA
This clBLAS release is tagged as v2.10 and is part of AMD Compute Libraries (ACL) 1.0 GA. This release is based on a merge from the develop branch to the master branch.
- AutoGemm now contains optimized parameters for Fiji GPUs with HBM (High-Bandwidth Memory) as well as the optimized parameters for non-HBM devices, such as Hawaii, from release 2.8. The kernel selection logic can be chosen in CMake.
- Many bug fixes, including:
  - Restore ability to use multiple different devices (not concurrently) via different contexts.
  - AutoGemm works with Python 2 and 3.
  - Better memory cleanup during teardown.
Thank you to the following contributors for this release: @pavanky , @shehzan10 , @hughperkins , @ghisvail , @notorca
- The release binaries are online-compiled only and assume an OpenCL 2.0 compiler. The ASIC name (Hawaii or Fiji) in each binary's title indicates the kernel selection logic used to generate that binary; use the Fiji version only for Fiji (because of HBM) and the Hawaii version for all other (non-HBM) GPUs.
clBLAS-2.8.0 Release for ACL 1.0 Beta 2
This clBLAS release is tagged as v2.8 and is part of AMD Compute Libraries (ACL) 1.0 beta 2. This release is based on a merge from the develop branch to the master branch.
The highlights of the release:
- Introduced AutoGemm, the new high-performing GEneral Matrix-Matrix multiplication (GEMM) backend for clBLAS. AutoGemm is a suite of Python scripts which:
  - generates thousands of optimized GEMM OpenCL kernels
  - benchmarks these kernels for a particular GPU and different matrix sizes to determine which are the fastest
  - automatically chooses the optimal kernel within clBLAS for peak performance
  - allows applications with unique GEMM requirements (such as very small or very skinny matrices) to generate customized application-specific GEMM kernels for additional performance.
- Incorporated a new, faster DTRSM algorithm that:
  - enables the use of a more hardware-friendly algorithm for both online and offline compilation
  - leverages the DGEMM performance improvement from AutoGemm
- Miscellaneous:
  - fixes SGEMM performance drop at large multiples of 1024
  - fixes DGEMM performance drop at large sizes (ranging from 18000 by 18000 to 36000 by 36000)
  - supports Visual Studio 2015
  - adds CI support for Windows and Mac OS
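The AutoGemm workflow described above (generate candidate kernels, benchmark them per matrix size, pick the fastest) can be illustrated with a toy sketch in Python, the language the AutoGemm scripts are written in. This is not AutoGemm code: the tiled pure-Python multiplies merely stand in for generated OpenCL kernels, and every name here (make_tiled_matmul, pick_fastest, build_selection_table) is hypothetical.

```python
import time


def make_tiled_matmul(tile):
    """Return a matmul that walks the matrices in tile x tile blocks.

    Hypothetical stand-in for one generated kernel variant.
    """
    def matmul(a, b):
        m, k, n = len(a), len(b), len(b[0])
        c = [[0.0] * n for _ in range(m)]
        for i0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for j0 in range(0, n, tile):
                    for i in range(i0, min(i0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            a_ip = a[i][p]
                            for j in range(j0, min(j0 + tile, n)):
                                c[i][j] += a_ip * b[p][j]
        return c
    return matmul


def time_once(fn, a, b):
    """Wall-clock time of a single call to fn(a, b), in seconds."""
    start = time.perf_counter()
    fn(a, b)
    return time.perf_counter() - start


def pick_fastest(candidates, a, b, repeats=3):
    """Benchmark each candidate and return the name of the fastest one."""
    timings = {
        name: min(time_once(fn, a, b) for _ in range(repeats))
        for name, fn in candidates.items()
    }
    return min(timings, key=timings.get)


def build_selection_table(candidates, sizes, repeats=3):
    """Map each square matrix size to the name of its fastest candidate."""
    table = {}
    for n in sizes:
        a = [[float(i + j) for j in range(n)] for i in range(n)]
        b = [[float(i - j) for j in range(n)] for i in range(n)]
        table[n] = pick_fastest(candidates, a, b, repeats)
    return table


if __name__ == "__main__":
    candidates = {"tile%d" % t: make_tiled_matmul(t) for t in (2, 4, 8)}
    print(build_selection_table(candidates, sizes=[8, 16, 32]))
```

In this sketch the selection table maps each benchmarked size to the name of the fastest candidate, so its contents depend on the machine that runs it.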
clBLAS-2.6.0 Release for ACL 1.0 Beta 1
This clBLAS release is tagged as v2.6 and is part of AMD Compute Libraries (ACL) 1.0 beta 1. This release is based on a merge from the develop branch to the master branch.
The highlights of the release:
- Introduced offline kernel compilation
- Improved performance (with offline kernel compilation) of
  - sgemm small matrices NN, TN, NT
  - sgemm large matrices NN, TN, NT
  - zgemm large matrices NT for m, n, k multiples of 32, 64, 8 respectively
  - dtrsm large matrices for m, n multiples of 192
- Incorporated some CMake configuration changes
- Released binaries now include an offline-compiled library for certain devices and drivers.
  - "clBLAS-2.6.0-Windows-x64-Hawaii-14502.zip" is a binary built for the Hawaii device with the 14.502 driver on the Windows platform
  - A binary built for the Hawaii device with the 14.502 driver on the Linux platform will be released once the driver is released
  - (update 08/06/2015) "clBLAS-2.6.0-Linux-x64-Hawaii-14502.tar.gz" is a binary built for the Hawaii device with the 14.502 driver on the Linux platform
clBLAS-2.4.0 Release
Release based on a merge from the develop branch to the master branch.
The highlights of the merge:
- fix a correctness bug in c/zsyr2k; fix a correctness bug in ktest's reference code
- improve tuning tool coverage
- allow another parent CMake project to include clBLAS as a subdirectory (thanks to contributions from @robertsdionne)
- bug fix related to Intel CPU (thanks to contributions from @pavanky)
- bug fix related to Intel OpenCL driver (thanks to contributions from @pavanky)
- bug fix related to Intel SDKs on Windows, Apple SDKs on OSX (thanks to contributions from @pavanky)
- enable building a static library (thanks to contributions from @glehmann)
- some installation and prefix fixes (thanks to contributions from @glehmann)
- allow user to build gtest from source (thanks to contributions from @glehmann)
New release with binaries available
The /develop branch has seen improvements and bug fixes since the source was posted on GitHub, and it was time to merge that activity into /master. The highlights of the merge:
- The code can compile and run tests on the MacOSX platform (thanks to contributions from @gicmo & @abergeron)
- Hang fixed in hemm & symm when tuning
- Client program extended with an option to use *copyBufferRect() API to copy data
- Support for vs2013 added
- Proof-of-concept wrapper for clBLAS added with sgemm
- CMake improvements to detect and copy dependencies of targets
- A staging directory was added to the build process to ease debugging during development
and many other bug fixes. In addition, this release of clBLAS will provide binary packages for those who do not want to go through the steps of compiling the source on supported platforms. However, test dependencies are not packaged, and will need to be downloaded by the user. The clBLAS test programs depend on ACML.
The initial open source release of clBLAS
This release is the open-sourcing of the APPML clAmdBlas project. It provides the complete set of BLAS level 1, 2 and 3 routines, written with an API that exposes OpenCL objects to allow the library developer to maximize performance by controlling the OpenCL state.
The version number of the clBLAS project starts at v2.0, to distinguish it from the closed-source clAmdBlas project. All the APIs have been changed to provide a vendor-neutral naming scheme, and a new clBLAS.h header file has been introduced.
The original clAmdBlas.h header file has been heavily modified to provide backwards compatibility for clAmdBlas users transitioning to clBLAS. It is a 'wrapper' header around clBLAS.h, and users should convert to the new header file at their earliest convenience.