Merge pull request #119 from TimmyLiu/master
merge develop branch into master branch. bump the version number to 2.6
guacamoleo committed Jul 1, 2015
2 parents a6b3f9d + 3f032e7 commit 9731ea2
Showing 134 changed files with 48,875 additions and 1,264 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -17,3 +17,6 @@

# Generated kernel template files
*.clT

# flags.txt file
*flags.txt
22 changes: 20 additions & 2 deletions .travis.yml
@@ -5,16 +5,34 @@ compiler:

before_install:
- sudo apt-get update -qq
- sudo apt-get install -qq fglrx opencl-headers libboost-program-options-dev
- sudo apt-get install -qq fglrx libboost-program-options-dev
# Uncomment below to help verify the installs above work
# - ls -la /usr/lib/libboost*
# - ls -la /usr/include/boost

before_script:
- cd ${TRAVIS_BUILD_DIR}
# download OpenCL 1.2 header files since Travis CI only provides 1.1
- mkdir -p OpenCLInclude/CL
- cd OpenCLInclude/CL
#- wget -r --no-parent -nH --cut-dirs=4 --reject="index.html*" https://www.khronos.org/registry/cl/api/1.2/
- wget https://www.khronos.org/registry/cl/api/1.2/cl.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl.hpp
- wget https://www.khronos.org/registry/cl/api/1.2/cl_d3d10.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_d3d11.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_dx9_media_sharing.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_egl.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_ext.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_gl.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_gl_ext.h
- wget https://www.khronos.org/registry/cl/api/1.2/cl_platform.h
- wget https://www.khronos.org/registry/cl/api/1.2/opencl.h
- ls
- pwd
- cd ../..
- mkdir -p bin/clBLAS
- cd bin/clBLAS
- cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TEST=OFF -DBUILD_CLIENT=ON -DCMAKE_INSTALL_PREFIX:PATH=$PWD/package ../../src
- cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TEST=OFF -DBUILD_CLIENT=ON -DOPENCL_INCLUDE_DIRS:PATH=$PWD/../../OpenCLInclude -DCMAKE_INSTALL_PREFIX:PATH=$PWD/package ../../src

script:
- make install
25 changes: 0 additions & 25 deletions LICENSE
@@ -175,28 +175,3 @@
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
65 changes: 34 additions & 31 deletions README.md
@@ -20,6 +20,24 @@ library does generate and enqueue optimized OpenCL kernels, relieving
the user from the task of writing, optimizing and maintaining kernel
code themselves.

## clBLAS update notes 04/2015
- A subset of GEMM and TRSM kernels can be compiled off-line for Hawaii, Bonaire and Tahiti devices when the library is built. This feature
eliminates the overhead of calling clBuildProgram() at run-time.
- Off-line compilation can be done with an OpenCL 1.1, OpenCL 1.2 or OpenCL 2.0 runtime. However, OpenCL 2.0 is
recommended for better performance. Library users can select "OCL_VERSION" in CMake to build the library for a given
OpenCL version. It is the library user's responsibility to ensure compatible hardware and driver.
- Added a flags_public.txt file that contains the OpenCL compiler flags used by off-line compilation. The flags_public.txt
file is only loaded when OCL_VERSION is 2.0.
- Users can compile off-line for one or more supported devices by selecting
OCL_OFFLINE_BUILD_BONAIRE_KERNEL
OCL_OFFLINE_BUILD_HAWII_KERNEL
OCL_OFFLINE_BUILD_TAHITI_KERNEL.
However, compiling for more than one device at a time might exhaust heap memory, so compiling for
one device at a time is recommended.
- Users may also supply a specific OpenCL compiler path with OCL_COMPILER_DIR; otherwise the library loads the default OpenCL compiler.
- The minimum driver requirement for off-line compilation is 14.502.


## clBLAS library user documentation

[Library and API documentation][] for developers is available online as
Expand Down Expand Up @@ -48,15 +66,12 @@ how to contribute code to this open source project. The code in the
be made against the /develop branch.

## License

The source for clBLAS is licensed under the [Apache License, Version
2.0][]
The source for clBLAS is licensed under the [Apache License, Version 2.0]( http://www.apache.org/licenses/LICENSE-2.0 )

## Example
The simple example below shows how to use clBLAS to compute an OpenCL accelerated SGEMM

The simple example below shows how to use clBLAS to compute an OpenCL
accelerated SGEMM

```c
#include <sys/types.h>
#include <stdio.h>

Expand Down Expand Up @@ -170,42 +185,30 @@ accelerated SGEMM

return ret;
}
```
## Build dependencies

### Library for Windows

- Windows® 7/8

- Visual Studio 2010 SP1, 2012

- An OpenCL SDK, such as APP SDK 2.9

- Latest CMake
* Windows® 7/8
* Visual Studio 2010 SP1, 2012
* An OpenCL SDK, such as APP SDK 2.8
* Latest CMake
### Library for Linux

- GCC 4.6 and onwards

- An OpenCL SDK, such as APP SDK 2.9

- Latest CMake
* GCC 4.6 and onwards
* An OpenCL SDK, such as APP SDK 2.9
* Latest CMake
### Library for Mac OSX

- Recommended to generate Unix makefiles with cmake
* Recommended to generate Unix makefiles with cmake
### Test infrastructure

- Googletest v1.6

- ACML on windows/linux; Accelerate on Mac OSX

- Latest Boost
* Googletest v1.6
* ACML on windows/linux; Accelerate on Mac OSX
* Latest Boost
### Performance infrastructure

- Python
* Python
[Library and API documentation]: http://clmathlibraries.github.io/clBLAS/
[[email protected]]: https://groups.google.com/forum/#!forum/clmath
69 changes: 69 additions & 0 deletions doc/README-BinaryCacheOnDisk.txt
@@ -0,0 +1,69 @@
S. Chauveau
CAPS Entreprise
clBLAS Project
------------------------------
April 30, 2014


The implementation of a binary cache for CL programs can be found in
files src/include/binary_lookup.h and src/library/blas/generic/binary_lookup.cc

The cache is currently disabled by default. It can be enabled by
setting the environment variable 'CLBLAS_CACHE_PATH' to the directory
containing the cache entries.
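
As a minimal sketch of how such an opt-in switch can be read, the snippet
below checks the 'CLBLAS_CACHE_PATH' environment variable. The helper names
cacheDirectoryFromEnv() and cacheEnabled() are hypothetical and are not taken
from binary_lookup.cc; treating an empty value as "disabled" is also an
assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of clBLAS): returns the configured cache
// directory, or an empty string when the variable is not set.
std::string cacheDirectoryFromEnv()
{
    const char* path = std::getenv("CLBLAS_CACHE_PATH");
    return (path != nullptr) ? std::string(path) : std::string();
}

// Hypothetical helper: assumes the cache counts as enabled only when the
// variable holds a non-empty directory name.
bool cacheEnabled()
{
    return !cacheDirectoryFromEnv().empty();
}
```

With this convention, leaving the variable unset keeps the default behaviour
described above, where the cache is disabled.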

In the code itself, accesses to the cache are controlled by the
BinaryLookup class. A typical cache query looks as follows:

(1) Create a local instance of BinaryLookup

(2) Specify the additional characteristics (i.e. variants) of the
requested program. That information combined with the program name
and the OpenCL context and device shall form a unique signature
for the binary program.

(3) Perform the effective search by calling the 'found' method

(4a) If the search was successful then cl_program can be retrieved
by a call to the 'getProgram' method

(4b) If the search was not successful then a cl_program
must be created, passed to the lookup object by a call
to the 'setProgram' method, and then written to the cache
by a call to the 'populateCache' method.

(5) Destroy the BinaryLookup local instance.


So in practice a typical query looks as follows:

cl_program program ;

// The program name is part of the signature and shall be unique
const char * program_name = "... my unique program name ... " ;

BinaryLookup bl(context, device, program_name);

// Specify some additional information used to build a
// unique signature for that cache entry

bl.variantInt( vectorSize );
bl.variantInt( hasBorder );
...

// Perform the query
if ( bl.found() )
{
// Success! use the cl_program retrieved from the cache
program = bl.getProgram();
}
else
{
// Failure! we need to build the program
program = build_my_program(context,device,vectorSize,...) ;
// and inform the lookup object of the program
bl.setProgram(program);
// and finally populate the cache
bl.populateCache();
}

// The BinaryLookup shall now be destroyed
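
To illustrate why the variants matter, here is a small self-contained sketch
of a signature built from a device name, a program name and a list of integer
variants. SignatureBuilder is a hypothetical stand-in written for this
example; the real BinaryLookup class works on OpenCL handles and its
signature format may differ.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the signature kept by BinaryLookup.
// Device and program names are plain strings here instead of OpenCL handles.
class SignatureBuilder
{
public:
    SignatureBuilder(const std::string& deviceName, const std::string& programName)
    {
        m_stream << deviceName << '/' << programName;
    }

    // Append one integer variant, mirroring the role of variantInt().
    void variantInt(int value)
    {
        m_stream << ':' << value;
    }

    // The accumulated signature, usable as a cache key.
    std::string str() const
    {
        return m_stream.str();
    }

private:
    std::ostringstream m_stream;
};
```

Two queries that differ in any variant produce different signatures, so they
cannot pick up each other's cached binary.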
100 changes: 100 additions & 0 deletions doc/README-FunctorConcepts.txt
@@ -0,0 +1,100 @@
S. Chauveau
CAPS Entreprise
April 30, 2014

The Functor concept was introduced in clBLAS to simplify the creation
of specialized versions for dedicated architectures.

The original system, referred to as the 'Solver' system in this document,
is very centralized and not flexible enough to insert customized kernels.

The Functor
===========

A functor is simply a C++ object that provides an implementation of
a function. In the current case, that function is one of the BLAS calls
implemented in OpenCL.

The base class of all functors is clblasFunctor
- see src/library/blas/functor/include/functor.h
- see src/library/blas/functor/functor.cc

That class does not provide much by itself but it is supposed to be derived
once for each BLAS function to be implemented.

For instance the clblasSgemmFunctor class will be the base class of all
functors providing a generic or specific implementation of SGEMM.

A generic functor is one that is applicable to all possible arguments of the
function it implements. In most cases, there will be at least one generic
functor that will simply call the existing Solver-based implementation of the
function. For SGEMM, that is the class clblasSgemmFunctorFallback.

A specific functor is one that is applicable to only a subset of the possible
arguments of the function it implements. For instance, an SGEMM functor could
implement it only for matrices of a given block size, only for square
matrices, or only for a specific device architecture (e.g. AMD Hawaii), etc.
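
The relationship between the base class, a generic functor and a specific
functor can be sketched as follows. This is a simplified illustration, not the
clBLAS code: the class names are stand-ins, SgemmArgs is a hypothetical
argument struct, and execute() returns a label instead of enqueuing a kernel.

```cpp
#include <string>

// Hypothetical argument bundle; the real functors receive OpenCL buffers,
// queues and events as well.
struct SgemmArgs
{
    int M;
    int N;
    int K;
};

// Plays the role of clblasFunctor.
class Functor
{
public:
    virtual ~Functor() {}
};

// Plays the role of clblasSgemmFunctor: one base class per BLAS function.
class SgemmFunctor : public Functor
{
public:
    virtual std::string execute(const SgemmArgs& args) = 0;
};

// Generic functor: applicable to every argument combination.
class SgemmFunctorFallback : public SgemmFunctor
{
public:
    std::string execute(const SgemmArgs&) override { return "fallback"; }
};

// Specific functor: only applicable to square problems (M == N == K).
class SgemmFunctorSquare : public SgemmFunctor
{
public:
    static bool accepts(const SgemmArgs& a) { return a.M == a.N && a.N == a.K; }
    std::string execute(const SgemmArgs&) override { return "square"; }
};

// Use the specific functor when it applies, otherwise the generic one.
std::string runSgemm(const SgemmArgs& args)
{
    if (SgemmFunctorSquare::accepts(args)) {
        SgemmFunctorSquare functor;
        return functor.execute(args);
    }
    SgemmFunctorFallback functor;
    return functor.execute(args);
}
```

Calling runSgemm with a 4x4x4 problem takes the square path, while a 4x8x4
problem falls through to the generic functor.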

The Functor Selector
====================

Multiple generic and specific functors may be available to implement each
clBLAS call. The selection of the proper functor is delegated to the class
clblasFunctorSelector whose default implementation typically returns the
fallback functors.

- see src/library/blas/functor/include/functor_selector.h
- see src/library/blas/functor/functor_selector.cc

So clblasFunctorSelector provides a large set of virtual selection methods.
Typically, a method to select a specific functor will be provided for each
supported BLAS function. Another method may be provided to select a generic
functor but that is not mandatory.

In the default implementation of clblasFunctorSelector, each specific
selection method typically redirects to the generic one, which returns the
fallback functor (and so uses the existing Solver-based implementation).


The class clblasFunctorSelector is supposed to be derived once for each
supported architecture (e.g. Hawaii, Tahiti, ...) and a single global instance
of each of those derived classes shall be created. This is important because
those instances register themselves in a global data structure that is later
used to find the proper clblasFunctorSelector according to the architecture
(see clblasFunctorSelector::find() )
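
The self-registration pattern described above can be sketched like this. The
sketch makes several assumptions: Selector and HawaiiSelector are stand-in
names, architectures are identified by plain strings, and find() takes that
string directly, which is not necessarily the signature of the real
clblasFunctorSelector::find().

```cpp
#include <map>
#include <string>

// Plays the role of clblasFunctorSelector.
class Selector
{
public:
    // Default selector, used when no architecture-specific one is registered.
    Selector() {}
    virtual ~Selector() {}

    // Default behaviour: select the fallback (Solver-based) implementation.
    virtual std::string selectSgemm() const { return "fallback"; }

    // Find the selector registered for an architecture, or the default one.
    static const Selector& find(const std::string& arch)
    {
        const std::map<std::string, Selector*>& reg = registry();
        std::map<std::string, Selector*>::const_iterator it = reg.find(arch);
        if (it != reg.end()) {
            return *it->second;
        }
        static const Selector defaultSelector;
        return defaultSelector;
    }

protected:
    // Derived selectors register themselves when they are constructed.
    explicit Selector(const std::string& arch) { registry()[arch] = this; }

private:
    // A function-local static avoids initialization-order problems between
    // the registry and the global selector instances.
    static std::map<std::string, Selector*>& registry()
    {
        static std::map<std::string, Selector*> instance;
        return instance;
    }
};

// One derived class per supported architecture.
class HawaiiSelector : public Selector
{
public:
    HawaiiSelector() : Selector("Hawaii") {}
    std::string selectSgemm() const override { return "hawaii-sgemm"; }
};

// The single global instance; its constructor performs the registration.
static HawaiiSelector g_hawaiiSelector;
```

Because registration happens in the constructor of the global instance,
forgetting to create that instance silently leaves the architecture on the
default selector.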


Functor Management & Cache
==========================

Each functor contains a reference counter that, when it reaches zero, causes
the functor to be destroyed. See the members clblasFunctor::retain() and
clblasFunctor::release().
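
A minimal sketch of this reference-counting scheme is shown below. It is not
the clblasFunctor implementation: the class names are stand-ins, the initial
count of one and the boolean return value of release() are assumptions, and
the counter is not thread-safe.

```cpp
#include <cassert>

// Plays the role of clblasFunctor's reference counting.
class RefCountedFunctor
{
public:
    // Assumption: the creator holds the first reference.
    RefCountedFunctor() : m_refcount(1) {}

    void retain() { ++m_refcount; }

    // Drops one reference and destroys the functor when none remain.
    // Returns true if the functor was destroyed by this call.
    bool release()
    {
        assert(m_refcount > 0);
        if (--m_refcount == 0) {
            delete this;
            return true;
        }
        return false;
    }

    int refcount() const { return m_refcount; }

protected:
    // Protected so that destruction can only happen through release().
    virtual ~RefCountedFunctor() {}

private:
    int m_refcount;
};

// Example functor that reports its destruction through a caller-owned flag.
class TracedFunctor : public RefCountedFunctor
{
public:
    explicit TracedFunctor(bool* destroyed) : m_destroyed(destroyed) {}

protected:
    ~TracedFunctor() override { *m_destroyed = true; }

private:
    bool* m_destroyed;
};
```

Every retain() must be matched by exactly one release(); the pointer must not
be used after the release() call that returns true.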

Of course, to be efficient, functors must be reusable between BLAS calls so
some mechanisms must be implemented to manage the functors.

Some functors, such as the fallback functors, are independent of the
arguments and of the OpenCL context & device. Those can typically be
implemented using a single global instance that will never be destroyed.

Other functors, such as those that manage a cl_program internally, are
dependent on the OpenCL context & device and sometimes on some arguments.
They need to be stored in caches using some information as keys.

In the current implementation, we propose that each functor class shall
implement its own private cache. Such functors shall not be created directly
using their constructors but via a dedicated 'provide' function (the name
'provide' is not mandatory) that will take care of managing the internal cache.

The template class clblasFunctorCache<F> is provided as a simple
implementation of a cache of functors of type F. Use of that cache is not a
mandatory part of the functor design. Other strategies would be to keep a
single instance of the functor and implement a cache for the cl_program, or to
implement a global cache shared by multiple functor classes.
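
The 'provide' pattern with a per-class cache can be sketched as follows. This
is an illustration under stated assumptions rather than the
clblasFunctorCache<F> code: FunctorCache, CacheKey and SquareSgemmFunctor are
hypothetical names, and the OpenCL context and device are represented by
strings.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical cache key: (context id, device id) as stand-ins for the
// cl_context and cl_device_id handles.
typedef std::pair<std::string, std::string> CacheKey;

// Simple cache of functors of type F, in the spirit of clblasFunctorCache<F>.
template <class F>
class FunctorCache
{
public:
    // Returns the cached functor for the key, or a null pointer if absent.
    F* lookup(const CacheKey& key) const
    {
        typename std::map<CacheKey, F*>::const_iterator it = m_entries.find(key);
        return (it == m_entries.end()) ? nullptr : it->second;
    }

    void store(const CacheKey& key, F* functor) { m_entries[key] = functor; }

private:
    std::map<CacheKey, F*> m_entries;
};

class SquareSgemmFunctor
{
public:
    // Returns the functor for this context and device, creating it on the
    // first request and reusing it afterwards.
    static SquareSgemmFunctor* provide(const std::string& context,
                                       const std::string& device)
    {
        static FunctorCache<SquareSgemmFunctor> cache;  // private to this class
        const CacheKey key(context, device);
        SquareSgemmFunctor* functor = cache.lookup(key);
        if (functor == nullptr) {
            // A real functor would build its cl_program here.
            functor = new SquareSgemmFunctor();
            cache.store(key, functor);
        }
        return functor;
    }

private:
    // Private: instances are only created through provide().
    SquareSgemmFunctor() {}
};
```

In this sketch the cached functors live until the process exits; a fuller
version would combine the cache with the retain/release counting described
earlier in this section.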





