
Simplifying Reductions


This page needs more info. Please add it!

This document describes the "Simplifying Reductions" transformation and related transformations. The seminal work for this is the 2006 "Simplifying Reductions" paper by Gautam and Sanjay (referred to as "GR06" and linked below).

https://dl.acm.org/doi/abs/10.1145/1111037.1111041

Mapping to Source Code

The table below maps concepts from the GR06 paper to source code.

| Concept | Paper Section | Source Code |
| --- | --- | --- |
| Context Domain | 3.2.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/AlphaInternalStateConstructor.java |
| Extended Thick Face Lattice (ETFL) | 3.1.2, 4.3, 4.4 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/util/FaceLattice.xtend |
| Face (Part of the ETFL) | 3.1.2, 4.3, 4.4 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/util/Face.xtend |
| Share Space and Reuse Space | 5.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/analysis/reduction/ShareSpaceAnalysis.xtend |
| Preprocessing / Normalization (1/3) | 5.1, 6.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/Normalize.xtend |
| Preprocessing / Normalization (2/3) | 5.1, 6.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/NormalizeReduction.xtend |
| Preprocessing / Normalization (3/3) | 5.1, 6.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/NormalizeReductionDeep.xtend |
| Simplification Transformation | 5.3, 5.4 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/SimplifyingReductions.xtend |
| Recursive Simplification | 5.3.2, 7 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/automation/OptimalSimplifyingReductions.xtend |
| Idempotence | 5.5.1 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/Idempotence.xtend |
| Higher Order Operator | 5.5.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/HigherOrderOperator.xtend |
| Same Operator Simplification | 6.1.1 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/SameOperatorSimplification.xtend |
| Distributivity (1/2) | 6.1.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/Distributivity.xtend |
| Distributivity (2/2) | 6.1.2 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/FactorOutFromReduction.xtend |
| Reduction Decomposition | 6.3 | https://github.com/CSU-CS-Melange/alpha-language/blob/main/bundles/alpha.model/src/alpha/model/transformation/reduction/ReductionDecomposition.xtend |

Background

This section covers some background information and definitions which are useful for understanding simplification.

Reductions

A reduction is an operation which takes some set of values as inputs and produces a set of values as outputs. Each output is computed by applying an associative operator to a subset of the inputs. The operator is often commutative as well (and this is helpful for many optimizations), but that is not strictly necessary. Operations like summation, product, or minimum are reductions. The standard mathematical format for a reduction is as follows:

$$ Y[f_p(z)] = \bigoplus_{z \in D} X[f_r(z)] $$

The reduction operator ($\oplus$) is the associative operator being applied to the inputs ($X$). This is usually a standard operator, such as addition or maximum, but could theoretically be any associative operator.

An important concept in AlphaZ is the reduction body ($D$). You can think of this as the set of loop iterations required to evaluate all of the outputs ($Y$). Since AlphaZ works within the polyhedral model, we can view the reduction body as being a polyhedral set of integer points.

At each point in the reduction body ($z$), we take a single input value and accumulate it into a single answer value. The reduction's read function ($f_r$) is an affine map from a point in the reduction body to the point in the input set that we are accumulating. Similarly, the projection function ($f_p$) is an affine map from a point in the reduction body to the point in the output set that we are accumulating into.
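
To make these definitions concrete, here is a minimal Python sketch (not from the AlphaZ code base) that evaluates a reduction directly from its body $D$, projection function $f_p$, and read function $f_r$. The toy instance is the prefix-sum-style reduction used in the example later on this page; all names are illustrative.

```python
# Naive evaluator for a reduction Y[f_p(z)] = (+) over z in D of X[f_r(z)].
# Everything here (the body, the affine maps, the input) is a toy example.

def evaluate_reduction(body, f_p, f_r, x, combine=lambda a, b: a + b):
    """Accumulate X[f_r(z)] into Y[f_p(z)] for every point z in the body."""
    y = {}
    for z in body:
        out = f_p(z)   # which answer this iteration contributes to
        inp = f_r(z)   # which input value this iteration reads
        y[out] = combine(y[out], x[inp]) if out in y else x[inp]
    return y

# Toy instance: Y_i = sum_{j=0}^{i} X_{i-j} for 0 <= i < N (the example used later).
N = 5
body = [(i, j) for i in range(N) for j in range(i + 1)]   # D = {[i,j] : 0 <= j <= i < N}
f_p = lambda z: z[0]          # projection: [i,j] -> i
f_r = lambda z: z[0] - z[1]   # read:       [i,j] -> i - j
x = {i: i + 1 for i in range(N)}

print(evaluate_reduction(body, f_p, f_r, x))  # Y_i = X_0 + ... + X_i
```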

A reduction's accumulation space describes the set of points in the reduction body which are used to compute a single answer. This space is characterized by the kernel of the projection function ($\ker(f_p)$). Intuitively, this gives us the set of points in the reduction body that make up the answer at the origin of the answer space. Through translation, we can use this to find the set of points used to compute any other answer as well. Thus, we simply use the kernel of the projection function, as it is easy to compute.

A reduction's reuse space is similar to the accumulation space, but for inputs instead of outputs. It is characterized by the kernel of the read function ($\ker(f_r)$). By the same intuition, this gives us the set of points in the reduction body that all read the input at the origin of the input space, and we can use translation to find the set of points that read a different input.
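
Since $f_p$ and $f_r$ are affine maps, their kernels can be computed with standard linear algebra. Below is a small illustrative sketch using sympy for the maps that appear in the example later on this page; the AlphaZ code base represents these spaces as polyhedral objects rather than matrices.

```python
# ker(f_p) and ker(f_r) for f_p([i, j]) = i (projection) and f_r([i, j]) = i - j (read).
from sympy import Matrix

F_p = Matrix([[1, 0]])    # linear part of f_p: [i, j] -> i
F_r = Matrix([[1, -1]])   # linear part of f_r: [i, j] -> i - j

print(F_p.nullspace())    # basis vector (0, 1): the accumulation space
print(F_r.nullspace())    # basis vector (1, 1): the reuse space
```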

Face Lattice

A reduction body will always be a single polyhedral set of integer points. The reduction body exists within some $n$-dimensional space. There is also a finite set of hyperplanes used to define which points are in the set, each of which corresponds to an inequality or equality constraint of the set.

We can think of polyhedra as having faces. A face is the intersection of the polyhedron with a (potentially empty) subset of its hyperplanes. For example, consider a cube. The cube itself can be thought of as one of its faces, although we typically ignore this as it's not particularly useful. The cube has six sides, each of which is a face (and which lines up with the common definition of a face). The cube also has twelve edges and eight vertices, each of which we also consider faces of the cube (perhaps somewhat unintuitively). Finally, some definitions also consider the empty set to be a face of the cube. However, this is also not particularly helpful, so we typically ignore it.

If our polyhedron is $n$-dimensional, the $(n-1)$-dimensional faces are of particular interest. We call these the facets of the polyhedron. In the example of our cube, since it is a 3D object, the 2D faces (i.e., the 6 sides) are the facets of the cube. None of the other faces (the cube itself, the edges, vertices, and the empty set) are considered facets (only faces).

It is useful to organize all the faces of a polyhedron into a data structure called the face lattice. This is a graph structure where each node is a face of the polyhedron. An edge is drawn between the nodes for two faces if one face is a facet of the other. The nodes are arranged in layers based on their dimensionality, with the original polyhedron at the top and the vertices (or the empty set, if included) at the bottom.

In our cube example, the top 3D layer of the face lattice contains only a single node for the cube itself. The 2D layer is below that, and has one node for each of the six sides. Since each side is a facet of the cube, there is an edge from the cube's node to the node for each side. The 1D layer has twelve nodes, one for each edge of the cube. There is an edge in the graph between a side of the cube and an edge of the cube (sorry for the overloaded term) if that cube edge lies on the cube side. Thus, each side of the cube has four graph edges, one to each of the four cube edges on that side. Finally, the 0D layer contains eight nodes for the vertices, and there is a graph edge between each cube edge and the two vertices at its ends.
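
The cube's face lattice is small enough to enumerate directly. The sketch below (illustrative only, and specific to axis-aligned boxes rather than general polyhedra) lists the faces of the unit cube by dimension and counts the facet relationships that form the lattice edges.

```python
# Enumerate the faces of the unit cube [0,1]^3 and organize them by dimension.
# For a box, every face is obtained by fixing some coordinates to 0 or 1 and
# leaving the rest free, so no general polyhedral machinery is needed here.
from itertools import product

faces = list(product(("0", "1", "free"), repeat=3))   # 27 faces (empty face omitted)
dim = lambda f: sum(1 for c in f if c == "free")

by_dim = {}
for f in faces:
    by_dim.setdefault(dim(f), []).append(f)

for d in sorted(by_dim, reverse=True):
    print(f"{d}-dimensional faces: {len(by_dim[d])}")
# 3-dimensional faces: 1   (the cube itself)
# 2-dimensional faces: 6   (the sides, i.e., the facets)
# 1-dimensional faces: 12  (the edges)
# 0-dimensional faces: 8   (the vertices)

# Lattice edges: g is a facet of f when g fixes exactly one more coordinate than f
# and agrees with f on all coordinates that f fixes.
def is_facet_of(g, f):
    return dim(g) == dim(f) - 1 and all(cf == "free" or cf == cg for cf, cg in zip(f, g))

lattice_edges = [(f, g) for f in faces for g in faces if is_facet_of(g, f)]
print(len(lattice_edges))  # 54 = 6 + 24 + 24 edges in the face lattice
```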

Algorithm Intuition

We can apply the Simplifying Reductions transformation, or "simplification", to a reduction which has "reuse". Intuitively, reuse means that points of the input space are read multiple times. We can detect reuse by checking that the reuse space (as defined above) is non-empty. At a high level, our goal is to eliminate this reuse to lower the time complexity for computing the entire reduction.

We can use geometry to intuitively understand how simplification works. First, we compute the reuse space of our reduction, then select a single vector from this space. This vector is called a reuse vector ($\rho$). Then, we take our reduction body and translate it by the reuse vector. The result of this is that we now have two overlapping copies of our reduction body, and any overlapping points read the same input. The transformation rewrites the reduction to replace all points in the overlapping region by reusing a previously computed answer (or partial answer), thus eliminating those loop iterations.

Example

Consider the reduction per the equation below. Notice that the answer $Y_i$ requires $i$ additions to be computed.

$$ Y_i = \sum_{j=0}^{i} X_{i-j} $$

The reduction body and reuse space are as follows:

$$ D = \{ [i,j] \mid 0 \leq j \leq i < N \} $$

$$ \ker(f_r) = \{ [i,j] \mid i = j \} $$

Below is a graphical representation of the process. Our original reduction body is the red triangle. We have selected a reuse vector of $(1,1)$, which is shown as the green arrow. If we translate the reduction body by this vector, we get the blue triangle. The loop iterations in the overlap (the purple shaded region) are redundant, so we can eliminate them. When we do so, we compute one of our answers (on the solid red line) using a previous answer and a constant number of input values. We can ignore the points on the solid blue line, as those don't relate to any of our original loop iterations.

Applying simplification this way allows us to rewrite our original equation as shown below. Notice that we now can compute each $Y_i$ with a single addition (or none, in the case of the first answer).

$$ Y_i = \begin{cases} X_0 & \text{if } i=0 \\ Y_{i-1} + X_i & \text{if } i > 0 \end{cases} $$
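
As a quick sanity check, the sketch below (illustrative only, not part of AlphaZ) evaluates the example both ways and confirms the simplified recurrence produces the same answers while using one addition per output instead of roughly $N^2/2$ additions in total.

```python
# Compare the original reduction Y_i = sum_{j=0}^{i} X_{i-j} against the
# simplified recurrence Y_0 = X_0, Y_i = Y_{i-1} + X_i.
N = 8
X = [i * i + 1 for i in range(N)]   # arbitrary integer inputs

# Original form: about N^2 / 2 additions in total.
Y_naive = [sum(X[i - j] for j in range(i + 1)) for i in range(N)]

# Simplified form: one addition per answer after the first.
Y_simplified = [X[0]]
for i in range(1, N):
    Y_simplified.append(Y_simplified[i - 1] + X[i])

assert Y_naive == Y_simplified
print(Y_simplified)
```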

How to Apply Simplification

GR06 defines how to apply simplification in section 5.3. Here, we'll cover it at a somewhat higher level, giving the intuition. See the paper itself for more of the exact details.

Per GR06, let $D_E$ be our original reduction body. We are given some reuse vector $\rho_E$. The translation of our reduction body by this reuse vector is $D_{E'}$. Let $f_p$ be our reduction's projection function.

First, we compute three domains: the add domain $D_{add}$, the intersect domain $D_{int}$, and the subtract domain $D_{sub}$. These are calculated as shown below. In short: these are (potentially overlapping) subsets of our answers with specific properties. Answers in the add domain require that we accumulate a non-empty subset of inputs to compute that answer. Answers in the intersect domain require that we accumulate a previously computed answer. Answers in the subtract domain require that we take away a non-empty subset of inputs when computing that answer. This requires an inverse operator to the reduction operator (e.g., subtraction if the reduction is a summation).

$$ \begin{aligned} D_{add} &= f_p (D_E - D_{E'}) \\ D_{int} &= f_p (D_E \cap D_{E'}) \\ D_{sub} &= f_p (D_{E'} - D_E) \end{aligned} $$
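
As a worked computation (ours, not from the paper): for the running example, the projection function is $f_p([i,j]) = i$, and taking $\rho = (1,1)$ again gives $D_{E'} = \{ [i,j] \mid 1 \leq j \leq i \leq N \}$. The three domains then work out to:

$$ \begin{aligned} D_{add} &= f_p(\{ [i,0] \mid 0 \leq i < N \}) = \{ i \mid 0 \leq i < N \} \\ D_{int} &= f_p(\{ [i,j] \mid 1 \leq j \leq i < N \}) = \{ i \mid 1 \leq i < N \} \\ D_{sub} &= f_p(\{ [N,j] \mid 1 \leq j \leq N \}) = \{ N \} \end{aligned} $$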

GR06 then splits up the answers into five partitions (some of which may be empty). These partitions are based on differences and intersections of the above three domains. Note: any time a "previously computed answer" is needed, the specific answer being referred to is determined by applying the projection function to the reuse vector. That is, for some answer $Y[z]$, its previous answer is $Y[z - f_p(\rho)]$. The running example is worked through again after the list below.

  1. $D_{add} - D_{int}$:
    • These are the answers computed exclusively from inputs.
    • They are computed the same way they originally were.
  2. $D_{int} - (D_{add} \cup D_{sub})$:
    • These are the answers where a previous answer was computed from the exact same set of inputs (i.e., the two answers are identical computations).
    • They are computed by simply copying that previously computed answer.
  3. $D_{add} \cap (D_{int} - D_{sub})$:
    • These are the answers where a previous answer is computed from a strict subset of the inputs needed for this answer.
    • These answers are computed by copying the previous answer, then reducing the remaining inputs into that answer.
  4. $D_{sub} \cap (D_{int} - D_{add})$:
    • These are the answers where a previous answer is computed from a strict superset of the inputs needed for this answer.
    • These answers are computed by copying the previous answer, then taking away the extra inputs.
    • This requires an inverse to the reduction operator. If no such inverse exists (e.g., with $\min$ or $\max$) and this set of answers is non-empty, we cannot apply simplification with this reuse vector.
  5. $D_{add} \cap D_{int} \cap D_{sub}$:
    • These are the answers where we take a previous answer, add some additional inputs, and remove some unnecessary inputs.
    • Effectively, this case merges cases 3 and 4.
    • These answers are computed by copying the previous answer, reducing the remaining inputs into that answer, then taking away the unnecessary inputs.
    • As with case 4, an inverse operator is required, so if none exists, we cannot use the given reuse vector.
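
Continuing the worked example from above (again, our own computation): restricting to the answers that are actually computed ($0 \leq i < N$) makes $D_{sub}$ irrelevant, so only cases 1 and 3 are non-empty:

$$ \begin{aligned} D_{add} - D_{int} &= \{ 0 \}, & Y_0 &= X_0 \\ D_{add} \cap (D_{int} - D_{sub}) &= \{ i \mid 1 \leq i < N \}, & Y_i &= Y_{i-1} + X_i \end{aligned} $$

This matches the simplified equation given in the example above.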

For the actual rewrite rules, see the GR06 paper.

Strong and Weak Boundaries

TODO: write this section.

Notes:

  • A facet is a boundary if all points on the facet contribute to the same answer.
  • A boundary is strong if no additional inputs are needed for that answer.
  • Otherwise, if additional inputs are needed, the boundary is weak.
    • Only occurs when the accumulation space is at least 2D.
  • Using reduction decomposition, we can turn weak boundaries into strong boundaries (see the sketch after this list).
    • Start with the accumulation space (i.e., points that contribute to a single answer).
    • Split it up into subsets, where each subset is a partial answer.
    • Subsets are parallel to one of the facets of the reduction body.
    • Ignore facets which are parallel to the accumulation space, as that wouldn't split it into subsets.
    • The facet chosen can be any facet that results in the weak boundary we want to remove becoming a strong boundary.
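
As a rough sketch of the decomposition itself (GR06, section 6.3): if the projection function factors as $f_p = f_{p2} \circ f_{p1}$, the reduction can be rewritten as two nested reductions, where the first computes partial answers over each subset of the accumulation space and the second combines them:

$$ \begin{aligned} Z[f_{p1}(z)] &= \bigoplus_{z \in D} X[f_r(z)] \\ Y[f_{p2}(z')] &= \bigoplus_{z' \in f_{p1}(D)} Z[z'] \end{aligned} $$

Choosing $f_{p1}$ so that its accumulation subsets are parallel to the selected facet is what turns the weak boundary into a strong one; see the paper for the exact conditions on the factorization.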