Benchmarks
Clava comes with an extensible set of pre-packaged benchmark suites, and lets you integrate your own. Using a benchmark suite has two main advantages: you can apply code transformations programmatically, one benchmark at a time, and you can automatically compile and execute each benchmark at will. By contrast, the usual Clava workflow applies code transformations indiscriminately to every input file and leaves the burden of compilation and execution to the developer.
There are two ways of including a benchmark suite, depending on its origin:

- To use the built-in benchmarks: to use one of the built-in benchmark suites (available in this repository), all you need to do is include its URL in the "External dependencies" prompt in the Clava options tab. For instance, to include the NAS suite, add `https://github.com/specs-feup/clava-benchmarks.git?folder=NAS` to your external dependencies.
- To use your own benchmarks: to use your own benchmark suites, create a folder structure similar to that of the built-in benchmarks, and write your own JavaScript files with the required logic (e.g., to manage input files/options). You can then add the suite to Clava by including your local benchmark folder in the "Includes Folder" prompt in the Clava options tab.
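
The dependency URLs for the built-in suites all follow the same pattern: the repository URL followed by a `folder` query parameter naming the suite. As a minimal sketch, the pattern can be written as a small plain-JavaScript helper. Note that `suiteDependencyUrl` is a hypothetical name used only for illustration; it is not part of Clava, and the resulting string still has to be pasted into the "External dependencies" prompt by hand.

```javascript
// Repository that hosts the built-in benchmark suites (taken from the text above).
const CLAVA_BENCHMARKS_REPO = "https://github.com/specs-feup/clava-benchmarks.git";

// Hypothetical helper (not a Clava API): builds the external dependency URL
// for a built-in suite, given the name of its folder in the repository.
function suiteDependencyUrl(folder) {
  if (typeof folder !== "string" || folder.length === 0) {
    throw new Error("folder must be a non-empty string");
  }
  return `${CLAVA_BENCHMARKS_REPO}?folder=${encodeURIComponent(folder)}`;
}

// Prints the same URL used for the NAS suite in the text above
console.log(suiteDependencyUrl("NAS"));
```

The folder names are case-sensitive and should be copied exactly as they appear in the "Include with" URLs given for each suite.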
After you include your benchmark suite, you can use it in the following way. This is just an example of basic usage; check the documentation of the `BenchmarkSet` and `BenchmarkInstance` classes for more information.
Compilation requires you to have CMake installed on your system, as well as a C/C++ compiler.

    // include the benchmark suite(s) you want to use
    laraImport("lara.benchmark.RosettaBenchmarkSet");
    laraImport("weaver.Query");

    function main() {
        // create a BenchmarkSet object
        const benches = new RosettaBenchmarkSet();

        // choose the individual benchmarks within the set
        benches.setBenchmarks(["3d-rendering", "digit-recognition", "face-detection"]);

        // choose the input size(s) to be used during execution
        benches.setInputSizes(["N"]);

        // go through each benchmark (objects of type BenchmarkInstance)
        for (const bench of benches) {
            // load the benchmark into Clava's AST
            bench.load();

            // now everything is ready for you to do your analysis and transformations
            // in this example, we're just printing the name of every function
            const funNames = [];
            for (const elem of Query.search("function")) {
                if (elem.isImplementation) {
                    funNames.push(elem.name);
                }
            }
            println(funNames.join(","));

            // now, we prepare for compilation
            // if you are on Windows, you may wish to choose MinGW instead of the default (MSVC), but
            // this is entirely dependent on your system. Check Clava's CMake docs for more info
            bench.getCMaker().setGenerator("MinGW Makefiles");

            // now we compile the benchmark
            bench.compile();

            // and finally, we execute it
            bench.execute();
        }
    }

    main();
In this section, we describe each of our built-in benchmark suites in terms of their possible input sizes, programming language and purpose.
CHStone is a set of kernels used to evaluate High-level Synthesis applications, as well as to generate some IP components, such as arithmetic operators. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=CHStone`.
Benchmark | Language | Input options | Description |
---|---|---|---|
adpcm | C | N | An adaptive differential pulse-code modulation decoder/encoder |
aes | C | N | An Advanced Encryption Standard (AES) decoder/encoder |
blowfish | C | N | A Blowfish encoder/decoder (Blowfish is a symmetric-key block cipher) |
dfadd | C | N | A double-precision adder implementation for generating a hardware IP |
dfdiv | C | N | A double-precision divider implementation for generating a hardware IP |
dfmul | C | N | A double-precision multiplier implementation for generating a hardware IP |
dfsin | C | N | A double-precision sine function implementation for generating a hardware IP |
gsm | C | N | A linear predictive coding analyser of a global system for mobile communication |
jpeg | C | N | A JPEG image decompressor |
mips | C | N | A simplified simulator of a MIPS CPU |
motion | C | N | A motion vector decoder for the MPEG-2 video format |
sha | C | N | An implementation of SHA (Secure Hash Algorithm) that produces a hash code of an input |
HiFlipVX is an object detection library aimed at FPGAs. Since it is a library, we focus only on the example application it provides, which uses most of the library's functionality to manipulate an input image. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=HiFlipVX`.
Benchmark | Language | Input options | Description |
---|---|---|---|
v2 | C++ | N | Object detection library for FPGAs |
LSU is a set of large real-world applications, each distributed as a single source file. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=LSU`.
Benchmark | Language | Input options | Description |
---|---|---|---|
bzip2 | C | SMALL, LARGE | Lossless compression tool |
gzip | C | SMALL, LARGE | Lossless compression tool |
oggenc | C | SMALL | Encoding tool for Ogg Vorbis, a lossy audio compression format |
gcc | C | SMALL, LARGE | GNU C compiler |
MachSuite is a set of 19 benchmarks designed to mimic low-level kernels suitable for hardware acceleration. The information below is directly adapted from the official docs. The names of some benchmarks were modified so that they can all exist at the same directory level (e.g., the `sort` folder has two subfolders with the benchmarks `merge` and `radix`; these are flattened into `sort-merge` and `sort-radix`). Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=MachSuite`.
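
The renaming described above can be sketched as a one-line path transformation. This is only an illustration of the naming convention for multi-variant benchmarks such as `sort`; `flattenBenchmarkName` is a hypothetical helper, not part of Clava, and benchmarks with a single variant (such as `aes` or `kmp`) simply keep their name in the table below.

```javascript
// Hypothetical helper (not a Clava API): turns a nested benchmark path such as
// "sort/merge" into the flattened, single-level name "sort-merge".
function flattenBenchmarkName(path) {
  return path
    .split("/")                          // separate folder and subfolder
    .filter((part) => part.length > 0)   // ignore leading/trailing slashes
    .join("-");                          // join the parts with a hyphen
}

console.log(flattenBenchmarkName("sort/merge")); // sort-merge
console.log(flattenBenchmarkName("sort/radix")); // sort-radix
```

The flattened names are the ones you would pass to `setBenchmarks` when selecting MachSuite benchmarks.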
Benchmark | Language | Input options | Description |
---|---|---|---|
aes | C | D | The Advanced Encryption Standard, a common block cipher. |
backprop | C | D | A simple method for training neural networks. |
bfs-bulk | C | D | Data-oriented version of breadth-first search. |
bfs-queue | C | D | The “expanding-horizon” version of breadth-first search. |
fft-strided | C | D | Recursive formulation of the Fast Fourier Transform. |
fft-transpose | C | D | A two-level FFT optimized for a small, fixed-size butterfly. |
gemm-blocked | C | D | A blocked version of matrix multiplication, with better locality. |
gemm-ncubed | C | D | Naive, O(n^3) algorithm for dense matrix multiplication. |
kmp | C | D | The Knuth-Morris-Pratt string matching algorithm. |
md-grid | C | D | n-body molecular dynamics, using spatial decomposition to compute only local forces. |
md-knn | C | D | n-body molecular dynamics, using k-nearest neighbors to compute only local forces. |
nw | C | D | A dynamic programming algorithm for optimal sequence alignment. |
sort-merge | C | D | The mergesort algorithm, on an integer array. |
sort-radix | C | D | Sorts an integer array by comparing 4-bit blocks at a time. |
spmv-crs | C | D | Sparse matrix-vector multiplication, using variable-length neighbor lists. |
spmv-ellpack | C | D | Sparse matrix-vector multiplication, using fixed-size neighbor lists. |
stencil-2d | C | D | A two-dimensional stencil computation, using a 9-point square stencil. |
stencil-3d | C | D | A three-dimensional stencil computation, using a 7-point von Neumann stencil. |
viterbi | C | D | A dynamic programming method for computing probabilities on a Hidden Markov model. |
NAS is a set of benchmarks used to evaluate the parallel performance of supercomputers. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=NAS`.
Benchmark | Language | Input options | Description |
---|---|---|---|
BT | C | S, W, A, B, C, D, E | Block Tri-diagonal solver (application) |
CG | C | S, W, A, B, C | Conjugate Gradient, irregular memory access and communication (kernel) |
EP | C | S, W, A, B, C, D, E | Embarrassingly Parallel (kernel) |
FT | C | S, W, A, B, C, D, E | Discrete 3D fast Fourier Transform, all-to-all communication (kernel) |
IS | C | S, W, A, B, C, D | Integer Sort, random memory access (kernel) |
LU | C | S, W, A, B, C, D, E | Lower-Upper Gauss-Seidel solver (application) |
MG | C | S, W, A, B, C, D, E | Multi-Grid on a sequence of meshes, long- and short-distance communication, memory intensive (kernel) |
SP | C | S, W, A, B, C, D, E | Scalar Penta-diagonal solver (application) |
UA | C | S, W, A, B, C, D | Unstructured Adaptive mesh, dynamic and irregular memory access (application) |
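
Not every NAS benchmark supports every input size (for instance, CG stops at C, while IS and UA stop at D). As a minimal sketch, the table above can be encoded as a lookup and used to check a selection before passing it to `setBenchmarks` and `setInputSizes`. The names `NAS_INPUT_SIZES` and `unsupportedCombinations` are hypothetical and not part of Clava; the data is transcribed from the table.

```javascript
// Input sizes supported by each NAS benchmark, transcribed from the table above.
const ALL_SIZES = ["S", "W", "A", "B", "C", "D", "E"];
const NAS_INPUT_SIZES = new Map([
  ["BT", ALL_SIZES],
  ["CG", ["S", "W", "A", "B", "C"]],
  ["EP", ALL_SIZES],
  ["FT", ALL_SIZES],
  ["IS", ["S", "W", "A", "B", "C", "D"]],
  ["LU", ALL_SIZES],
  ["MG", ALL_SIZES],
  ["SP", ALL_SIZES],
  ["UA", ["S", "W", "A", "B", "C", "D"]],
]);

// Hypothetical helper (not a Clava API): lists every benchmark/input size
// pair in the selection that the table does not support.
function unsupportedCombinations(benchmarks, sizes) {
  const problems = [];
  for (const name of benchmarks) {
    const valid = NAS_INPUT_SIZES.get(name);
    if (valid === undefined) {
      problems.push(`${name}: unknown benchmark`);
      continue;
    }
    for (const size of sizes) {
      if (!valid.includes(size)) {
        problems.push(`${name}: unsupported input size ${size}`);
      }
    }
  }
  return problems;
}

// CG has no D input, so this reports one problem:
// [ 'CG: unsupported input size D' ]
console.log(unsupportedCombinations(["CG", "EP"], ["A", "D"]));
```

An empty result means every selected benchmark supports every selected input size according to the table.
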
Parboil is a suite of complex applications from several fields, such as bioinformatics, physics and mathematics. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=Parboil`.
Benchmark | Language | Input options | Description |
---|---|---|---|
bfs | C++ | 1M, NY, SF, UT | A breadth-first search algorithm operating over a graph |
cutcp | C++ | large, small | Computes the short-range component of Coulombic potential at each grid point over a 3D grid |
histo | C++ | large, default | Calculates a histogram of 255 bins using data with a Gaussian distribution |
lbm | C++ | long, short | A fluid dynamics simulation using the Lattice-Boltzmann Method |
mri-gridding | C++ | small | Converts points from an MR scan into a grid through interpolation, and then applies a Fast Fourier Transform over the grid |
mri-q | C++ | large, small | MRI calibration matrix using image reconstruction algorithms in non-Cartesian space |
sad | C++ | large, default | An implementation of a sum of absolute differences kernel, used in MPEG and H.264 decoders |
sgemm | C++ | medium, small | General purpose dense matrix-matrix multiplication |
spmv | C++ | large, medium, small | Calculates the product of a sparse matrix and a dense vector |
stencil | C++ | default, small | An iterative Jacobi stencil operation over a 3D grid |
tpacf | C++ | large, medium, small | Implementation of a Two Point Angular Correlation Function, used for the statistical analysis of the spatial distribution of astronomical bodies |
PolyBench is a set of 30 kernels with static control flow (i.e., control flow that does not depend on the input data). Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=Polybench`.
Benchmark | Language | Input options | Description |
---|---|---|---|
2mm | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 2 Matrix Multiplications (alpha * A * B * C + beta * D) |
3mm | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 3 Matrix Multiplications ((A * B) * (C * D)) |
adi | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Alternating Direction Implicit solver |
atax | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Matrix Transpose and Vector Multiplication |
bicg | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | BiCG Sub Kernel of BiCGStab Linear Solver |
cholesky | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Cholesky Decomposition |
correlation | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Correlation Computation |
covariance | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Covariance Computation |
deriche | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Edge detection filter |
doitgen | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Multi-resolution analysis kernel (MADNESS) |
durbin | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Toeplitz system solver |
fdtd-2d | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 2-D Finite-Difference Time-Domain Kernel |
gemm | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Matrix-multiply C=alpha.A.B+beta.C |
gemver | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Vector Multiplication and Matrix Addition |
gesummv | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Scalar, Vector and Matrix Multiplication |
gramschmidt | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Gram-Schmidt decomposition |
heat-3d | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Heat equation over 3D data domain |
jacobi-1D | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 1-D Jacobi stencil computation |
jacobi-2D | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 2-D Jacobi stencil computation |
lu | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | LU decomposition |
ludcmp | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | LU decomposition followed by Forward Substitution |
mvt | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Matrix Vector Product and Transpose |
nussinov | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Dynamic programming algorithm for sequence alignment |
seidel | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | 2-D Seidel stencil computation |
symm | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Symmetric matrix-multiply |
syr2k | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Symmetric rank-2k update |
syrk | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Symmetric rank-k update |
trisolv | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Triangular solver |
trmm | C | MINI, SMALL, MEDIUM, LARGE, EXTRALARGE | Triangular matrix-multiply |
Rosetta is a set of complex image processing and machine learning applications used to evaluate FPGA optimizations. We present a CPU-friendly version in our distribution. Include with `https://github.com/specs-feup/clava-benchmarks.git?folder=Rosetta`.
Benchmark | Language | Input options | Description |
---|---|---|---|
3d-rendering | C++ | N | A 3D software renderer |
digit-recognition | C++ | N | A digit recognition application based on a K-nearest neighbours classifier |
face-detection | C++ | N | A face detection application based on the Viola-Jones algorithm |
optical-flow | C++ | current, sintel | An application that calculates the optical flow (i.e., motion vectors) between image frames |
spam-filter | C++ | N | A Logistic Regression model trained with Stochastic Gradient Descent (SGD) |