TC is an automatic source-to-source optimizing compiler for affine loop nests. It generates sequential or parallel tiled code by applying the transitive closure of a loop nest dependence graph, combining the Polyhedral Model and Iteration Space Slicing frameworks. TC builds on a state-of-the-art polyhedral compilation toolchain:
- Polyhedral Extraction Tool [3] for extracting polyhedral representations of original loop nests,
- Integer Set Library [1] for performing dependence analysis, manipulating sets and relations as well as generating output code,
- Barvinok library [2] for calculating set cardinality and processing its representation.
To optimize a loop nest, surround it with the #pragma scop and #pragma endscop directives:
int main()
{
    int N;
    int A[N+2][N+2];
#pragma scop
    for (int i = 1; i <= N; ++i) {
        for (int j = 1; j <= N; ++j) {
S1:         A[i][j] = A[i][j+1] + A[i+1][j] + A[i+1][j-1];
        }
    }
#pragma endscop
}
Note: The source file containing the loop nest should be valid C code, and simplified as much as possible. Array accesses must not exceed array bounds. Since version 0.3.0, iterators of a for loop must be declared inside that for loop itself, otherwise they will create a dependency for outer loops.
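The iterator rule from the note above can be illustrated with a minimal sketch. The function name row_sums and its arrays are hypothetical and are not taken from the TC examples. Ordinary C compilers ignore the scop pragmas, so the function also compiles and runs unchanged.

```c
/* Hypothetical kernel: sums each row of an n x n matrix into out[]. */
void row_sums(int n, int a[n][n], int out[n])
{
#pragma scop
    /* Preferred since 0.3.0: every iterator is declared in its own for loop. */
    for (int i = 0; i < n; ++i) {
        out[i] = 0;
        for (int j = 0; j < n; ++j) {
            out[i] += a[i][j];  /* all accesses stay within the array bounds */
        }
    }
#pragma endscop
}

/* Discouraged form: declaring `int i, j;` before the scop and reusing them
 * in both loops. According to the note above, iterators declared outside
 * their for loop create a dependency for the outer loops. */
```

Declaring the iterators in the loop headers keeps them local to the loop nest, which is the form the note asks for.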
TC implements a number of tiling transformation algorithms as well as schedulers and code generators (including parallel generators utilizing OpenMP), all available to choose from through command line options (full description below). One is encouraged to experiment with various combinations of algorithms, schedulers and code generators, as well as tile sizes and transitive closure algorithms.
Note: TC is primarily used for studying algorithms utilizing transitive closure. Despite being able to generate efficient tiled code, some features are still in development.
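To illustrate what a parallel schedule has to guarantee, the following hand-written sketch (not TC output) executes the example loop nest in wavefronts. It assumes the dependence distance vectors (0,1), (1,0) and (1,-1), which follow from the array subscripts of the example. For each of them the value t = 2*i + j strictly increases, so iterations sharing the same t are independent of each other. The function names are hypothetical, and the OpenMP pragma is simply ignored when compiling without OpenMP support.

```c
/* Sequential reference: the example loop nest. */
void sweep_serial(int N, int A[N + 2][N + 2])
{
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j)
            A[i][j] = A[i][j + 1] + A[i + 1][j] + A[i + 1][j - 1];
}

/* Hand-written wavefront version (an illustration, not generated by TC).
 * All iterations with the same t = 2*i + j run in parallel; wavefronts
 * themselves run one after another. */
void sweep_wavefront(int N, int A[N + 2][N + 2])
{
    for (int t = 3; t <= 3 * N; ++t) {
        /* j = t - 2*i must satisfy 1 <= j <= N. */
        int lo = (t - N + 1) / 2;   /* smallest i with t - 2*i <= N */
        if (lo < 1) lo = 1;
        int hi = (t - 1) / 2;       /* largest i with t - 2*i >= 1 */
        if (hi > N) hi = N;
#pragma omp parallel for
        for (int i = lo; i <= hi; ++i) {
            int j = t - 2 * i;
            A[i][j] = A[i][j + 1] + A[i + 1][j] + A[i + 1][j - 1];
        }
    }
}
```

Both functions should leave identical contents in A when started from the same input, since every element is read before the iteration that overwrites it in either order.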
For the example loop nest, the correction technique can be applied with a tile size of 32x32:
/* TC Optimizing Compiler 0.4.0 */
/* ./tc ../examples/other/correction.scop.c --correction-tiling --lex-scheduling --serial-codegen -b 32 */
#define min(x,y) ((x) < (y) ? (x) : (y))
#define max(x,y) ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))
#pragma scop
for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
  for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
    for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
      for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
        A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
      }
      if (32 * ii1 + 32 >= N) {
        for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      } else {
        for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      }
    }
  }
}
#pragma endscop
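One way to gain confidence in generated code is to run it next to the original loop nest and compare the results. The sketch below does this for the tiled code above. The harness itself (the function names, the fill pattern and the comparison) is hypothetical and not part of TC; only the two loop nests and the three macros are copied from this document.

```c
#include <stdlib.h>
#include <string.h>

#define min(x,y) ((x) < (y) ? (x) : (y))
#define max(x,y) ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))

/* Original loop nest from the example. */
static void run_original(int N, int A[N + 2][N + 2])
{
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j)
            A[i][j] = A[i][j + 1] + A[i + 1][j] + A[i + 1][j - 1];
}

/* Tiled loop nest, copied from the TC output shown above. */
static void run_tiled(int N, int A[N + 2][N + 2])
{
    for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
      for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
        for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
          for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
            A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
          }
          if (32 * ii1 + 32 >= N) {
            for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
              A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
            }
          } else {
            for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
              A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
            }
          }
        }
      }
    }
}

/* Hypothetical check: returns 1 when both versions yield identical arrays. */
int tiled_matches_original(int N)
{
    size_t bytes = (size_t)(N + 2) * (size_t)(N + 2) * sizeof(int);
    int (*ref)[N + 2] = malloc(bytes);
    int (*til)[N + 2] = malloc(bytes);
    if (!ref || !til) { free(ref); free(til); return 0; }

    /* Deterministic fill with small values, so the sums cannot overflow. */
    for (int i = 0; i < N + 2; ++i)
        for (int j = 0; j < N + 2; ++j)
            ref[i][j] = (i * 31 + j * 17) % 101;
    memcpy(til, ref, bytes);

    run_original(N, ref);
    run_tiled(N, til);

    int same = memcmp(ref, til, bytes) == 0;
    free(ref);
    free(til);
    return same;
}
```

Calling tiled_matches_original with sizes around the tile boundary, for example 32 and 33, exercises both the full tiles and the partial tiles at the edge of the iteration space.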
The code generated by TC for the studied kernels can be found in the results directory of the compiler’s repository.
See CHANGELOG to follow the latest changes.
automake autoconf libtool pkg-config libgmp3-dev libclang-dev
llvm libntl-dev g++ make git clang zlib1g-dev libglpk-dev
git clone https://github.com/piotr-skotnicki/tc-optimizer.git tc
cd tc
git submodule update --init --recursive
./autogen.sh
./configure
make
tc <input.c> <algorithm> <scheduling> <codegen> [<closure>] [<options>...]
Hint: Use source scripts/tc-completion.bash to enable bash completions.
--stencil-tiling            Concurrent start tiling for stencils
--regular-tiling            Tiling with regular tile shapes
--correction-tiling         Tiling with LT tiles correction
--correction-inv-tiling     Tiling with GT tiles correction
--merge-tiling              Tiling with tiles merging
--split-tiling              Tiling with tiles splitting
--mod-correction-tiling     Tiling with LT cyclic tiles modified correction

--lex-scheduling            Lexicographic order execution
--isl-scheduling            Integer set library scheduler
--isl-wave-scheduling       Integer set library scheduler with wavefronting
--feautrier-scheduling      Integer set library scheduler (Feautrier scheduling)
--sfs-single-scheduling     Tiling of synchronization-free slices with single sources
--sfs-multiple-scheduling   Tiling of synchronization-free slices with multiple sources
--sfs-tile-scheduling       Tile-wise synchronization-free slices
--free-scheduling           Free scheduling based on R^+
--free-rk-scheduling        Free scheduling based on R^k
--free-finite-scheduling    Exact free scheduling for finite graphs
--dynamic-free-scheduling   Dynamic free scheduling

--serial-codegen            Serial code generator
--omp-for-codegen           OpenMP parallel for generator
--omp-task-codegen          OpenMP parallel task generator
--omp-gpu-codegen           OpenMP offloading to GPU target

--isl-map-tc                ISL normalized map transitive closure (default)
--isl-union-map-tc          ISL union map transitive closure
--floyd-warshall-tc         Floyd-Warshall algorithm
--iterative-tc              Iterative algorithm
--omega-map-tc              Omega normalized map transitive closure
--omega-union-map-tc        Omega union map transitive closure
--tarjan-tc                 Tarjan algorithm for finite graphs

-b <value>                  Tile size, e.g. -b 256 -b S1:128,128 (default: 32)
--debug | -d                Verbose mode
--report                    Generate tile statistics report (use -R for each parameter)
--inline                    Always inline loop bounds expressions
-D <name>=<value>           Define parameter value, e.g. -D M=2000 -D N=2600
-R <name>=<value>           Set parameter value for report generation, e.g. --report -R M=2000 -R N=2600
--cache <value>             Cache line length in bytes (default: 64)
--use-macros                Use macro definitions in place of statements
--yes | -y                  Non-interactive mode
--version | -v              Print compiler info
--help | -h                 Print help
./src/tc ./examples/stencils/heat-1d.scop.c --stencil-tiling --omp-for-codegen -b 150,25000 --debug
./src/tc ./examples/polybench/bicg.scop.c --correction-tiling --sfs-single-scheduling --omp-for-codegen -b 8
./src/tc ./examples/polybench/trisolv.scop.c --merge-tiling --free-scheduling --omp-task-codegen -b S1:16 -b S2:16,8 -b S3:16
In case of questions/problems/bugs, please contact:
Piotr Skotnicki <[email protected]>
West Pomeranian University of Technology
Faculty of Computer Science and Information Technology
ul. Zolnierska 49, 71-210 Szczecin, Poland
[1] Verdoolaege S (2010) ISL: an integer set library for the polyhedral model. In: Mathematical software--ICMS 2010, Lecture notes in computer science. vol 6327. Springer, Berlin, pp 299--302
[2] Verdoolaege S, Seghir R, Beyls K et al (2007) Counting integer points in parametric polytopes using Barvinok's rational functions. Algorithmica 48(1):37--66. https://doi.org/10.1007/s00453-006-1231-0
[3] Verdoolaege S, Grosser T (2012) Polyhedral extraction tool. In: Proceedings of the 2nd international workshop on polyhedral compilation techniques. Paris, France