diff --git a/LICENSE b/LICENSE
index 9d43b8f..03b5452 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,7 @@
 BSD 3-Clause License
 
 Copyright (c) 2019, Technical University of Munich
+Copyright (c) 2023, Intel Corporation
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
diff --git a/docs/tensor-ir.rst b/docs/tensor-ir.rst
new file mode 100644
index 0000000..0276b8a
--- /dev/null
+++ b/docs/tensor-ir.rst
@@ -0,0 +1,488 @@
+.. Copyright (C) 2023 Intel Corporation
  SPDX-License-Identifier: BSD-3-Clause

.. _descriptor:

======================================
YATeTo intermediate language reference
======================================

This document is a draft of an intermediate tensor language that sits between the high-level
Einstein notation and the low-level, back-end-specific code.

The grammar is given in `ABNF syntax `_.

Core rules
==========

White space separates tokens, where a token is an identifier, a literal, a keyword,
or a character such as punctuation or a delimiter.
Otherwise, white space has no meaning.

Comments start with ``;`` and extend to the end of the line (``\n``).

Identifier
==========

Identifiers are either named or unnamed.
Named identifiers consist of a letter followed by letters, digits, or underscores.
Unnamed identifiers are simply sequences of digits.
As in LLVM, local identifiers are prefixed with ``%``, whereas global identifiers
are prefixed with ``@``.

.. code:: abnf

    identifier = 1*DIGIT / (ALPHA *(ALPHA / DIGIT / "_"))
    local-identifier = "%" identifier
    global-identifier = "@" identifier

Index notation
==============

.. 
code:: abnf

    index = ALPHA
    indices = 1*index / "_"
    map-arity-1 = "{" indices "to" indices "}"
    map-arity-2 = "{" indices "," indices "to" indices "}"
    loop-index = "[" index "]"
    fused-indices = "(" 1*index ")"
    indices-with-mods = 1*(index / loop-index / fused-indices ) / "_"
    map-arity-2-with-mods = "{" indices-with-mods "," indices-with-mods "to" indices-with-mods "}"

Tensor operations are specified using index notation.
The index names can be chosen arbitrarily but must be used consistently across operands.

For example, a copy instruction could have the *map-arity-1* "{ji to ij}", which corresponds
to the copy B[i,j] = A[j,i].
Here, a transpose is fused into the copy, because the order of the indices i,j is swapped on the right-hand side.

The index notation can be augmented with modifiers.
For example, the tensor contraction

.. math::
    C_{ijnk} = \sum_m A_{ikm} B_{mjn}

can be described as a loop-over-GEMM with "{i[k]m, m(jn) to i(jn)[k]}".
Here, "[.]" means that an index is looped over (it is not part of the GEMM) and
"(.)" means that two or more indices are fused and treated as a single index.

The "_"-symbol is used to omit indices, that is, for tensors of order 0.

Constants
=========

.. code:: abnf

    sign = "-" / "+"
    integer-constant = [sign] 1*DIGIT
    hexdigit = HEXDIG
    mantissa-dec = *DIGIT "." 1*DIGIT / 1*DIGIT "."
    mantissa-hex = *hexdigit "." 1*hexdigit / 1*hexdigit "."
    exponent = [sign] 1*DIGIT
    floating-constant-dec = [sign] (mantissa-dec ["e" exponent] / 1*DIGIT "e" exponent)
    floating-constant-hex = [sign] "0x" (mantissa-hex ["p" exponent] / 1*hexdigit "p" exponent)
    floating-constant = floating-constant-dec / floating-constant-hex

Integer constants must lie in the range :math:`-2^{63}+1,\dots,2^{63}-1`.

Floating-point constants are given in C syntax and are expected to lie in the range of double-precision numbers.

The hexadecimal floating-point syntax is supported as well.
`strtod `_ can be used for parsing floating-point
numbers.

Functions
=========

.. code:: abnf

    function-definition = "define" global-identifier "(" [argument-list] ")" region
    argument-list = argument *("," argument)
    argument = local-identifier ":" type

Regions
=======

.. code:: abnf

    region = "{" *instruction "}"

A region is an ordered list of instructions.
An instruction may contain a region.
A region has access to values defined in its enclosing region, but the enclosing region
has no access to values defined inside the region.

Types
=====

.. code:: abnf

    type = void-type / scalar-type / memref-type / group-type
    void-type = "void"

Scalar types
------------

.. code:: abnf

    scalar-type = integer-type / floating-type
    integer-type = ("i" / "u") ("8" / "16" / "32" / "64")
    floating-type = "f" ("32" / "64")

Scalar types are either signed integers ("i"), unsigned integers ("u"),
or floating-point numbers ("f").
The number following the type prefix denotes the number of bits,
e.g., "f64" denotes double-precision floating-point numbers.

Memref type
-----------

.. code:: abnf

    memref-type = "memref<" tensor-shape ["," memory-layout] ">"
    tensor-shape = scalar-type *("x" integer-constant)

A memref points to a region of memory that stores a tensor.
The underlying scalar type and the tensor shape are given by the ``tensor-shape`` rule.

The tensor can have order 0. For example, ``memref`` can be thought of as a pointer to a single-precision float.
A vector is a tensor of order 1, e.g., ``memref``.
A matrix is a tensor of order 2, e.g., ``memref``.
A tensor of order n is given by ``memref``.

The default memory layout is the packed dense layout.
For example, the memory layout of ``memref`` is ``strided<1,5,30>``.
We note that ``memref`` and ``memref>``
are the same type.

.. admonition:: Discussion

    - Do we need a tensor value type?
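
The packed dense layout can be made concrete with a small computation. The following Python sketch is a minimal illustration and not part of YATeTo. It computes the packed column-major strides of a shape, the memory offset of a multi-index under those strides, and the spelling of a memref type according to the ``tensor-shape`` rule. The helper names ``packed_strides``, ``linear_offset`` and ``memref_type`` are hypothetical. The sketch also assumes that indices are zero-based.

.. code:: python

    from typing import List, Sequence


    def packed_strides(shape: Sequence[int]) -> List[int]:
        """Packed dense (column-major) strides: S_1 = 1, S_i = S_{i-1} * s_{i-1}."""
        strides: List[int] = []
        stride = 1
        for extent in shape:
            strides.append(stride)
            stride *= extent
        return strides


    def linear_offset(index: Sequence[int], strides: Sequence[int]) -> int:
        """Map a zero-based multi-index to a memory offset: sum over k of i_k * S_k."""
        if len(index) != len(strides):
            raise ValueError("the number of indices must equal the tensor order")
        return sum(i * s for i, s in zip(index, strides))


    def memref_type(scalar: str, shape: Sequence[int]) -> str:
        """Spell a memref type following the tensor-shape rule, e.g. memref<f32x5x6x4>."""
        return "memref<" + "x".join([scalar] + [str(e) for e in shape]) + ">"


    if __name__ == "__main__":
        shape = [5, 6, 4]  # hypothetical example shape
        strides = packed_strides(shape)
        print(memref_type("f32", shape), "has layout",
              "strided<" + ",".join(str(s) for s in strides) + ">")

For any shape of the form 5 x 6 x s_3, the sketch yields ``strided<1,5,30>``, which matches the example above. This holds because the last extent never enters the packed strides. An order-0 memref receives an empty stride list.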

Memory layout
.............

.. code:: abnf

    memory-layout = strided-layout

Strided layout
~~~~~~~~~~~~~~

.. code:: abnf

    strided-layout = "strided<" [integer-list] ">"
    integer-list = integer-constant *("," integer-constant)

The strided layout is a sequence of integers :math:`S_1,S_2,...,S_n`, where *n* must be equal
to the order of the tensor.
The strided layout is defined as the map

.. math::

    (i_1,i_2,...,i_n) \mapsto i_1 S_1 + i_2 S_2 + ... + i_n S_n

We further impose the following restrictions for a tensor with shape :math:`s_1\times s_2 \times ... \times s_n`:

* :math:`1 \leq S_1`
* :math:`\forall i \in [2,n]: S_{i-1}s_{i-1} \leq S_i`

Therefore, the layout is a (possibly padded) "column-major" layout.
The default packed dense layout is given by

* :math:`1 = S_1`
* :math:`\forall i \in [2,n]: S_{i-1}s_{i-1} = S_i`

Group type
----------

.. code:: abnf

    group-type = "group<" memref-type "," group-layout ">"
    group-layout = distance-layout / pointers-layout
    distance-layout = "distance<" integer-constant ">"
    pointers-layout = "pointers"

The group type describes a group of memrefs.
Either the group is stored in a single memory region with a fixed
distance between items (distance layout), or a pointer to each item is given (pointers layout).

.. admonition:: Discussion

    - Instead of ``group<..., distance<...>>`` one could use tensors with dynamic size.
      For example, instead of ``group, distance<4>>`` one might use
      ``memref``. That would be nice from a conceptual point of view, but then
      we would need to deal with tensors of potentially unknown size in every instruction.

Instructions
============

.. 
code:: abnf

    instruction = value-instruction
                / axpby-instruction
                / barrier-instruction
                / lifetime-stop-instruction
                / log-instruction
                / for-instruction
                / product-instruction
                / sum-instruction
    value-instruction = local-identifier "=" (alloca-instruction / get-work-item-instruction / subview-instruction)

Alloca
------

.. code:: abnf

    alloca-instruction = "alloca" ":" memref-type

Overview
........

The alloca instruction allocates temporary memory that is freed automatically at the end of the region that contains the alloca.

Arguments
.........

The only argument is the memref type of the returned value.

Get work item
-------------

.. code:: abnf

    get-work-item-instruction = "get_work_item" local-identifier ["," integer-type local-identifier] ":" group-type

Overview
........

Get work item fetches an item from a group.

Arguments
.........

The first operand must have group type.
The optional second operand must have an integer scalar type and specifies
an offset.

Subview
-------

.. code:: abnf

    subview-instruction = "subview" local-identifier "[" [index-or-slice-list] "]" ":" memref-type "to" memref-type
    index-or-slice-list = index-or-slice *("," index-or-slice)
    index-or-slice = integer-type local-identifier / integer-constant / slice
    slice = [integer-constant] ":" [integer-constant]

Overview
........

The subview instruction returns a view on a tensor.

Arguments
.........

The local identifier must have the left-hand memref type, and the instruction returns the right-hand memref type.
A slice ``from:to`` denotes the half-open range [from, to), i.e., ``from`` is included and ``to`` is excluded.


Axpby
-----

.. code:: abnf

    axpby-instruction = "axpby" map-arity-1 "," floating-constant "," local-identifier "," local-identifier ":" memref-type "to" memref-type

Overview
........

Axpby implements

.. math::

    B[\pi_B(I)] := \alpha A[\pi_A(I)] + \beta B[\pi_B(I)]

Arguments
.........

The first argument gives the index map that defines the indices I
as well as the permutations :math:`\pi_A, \pi_B`.
Note that the input and output indices in the index map must be equal
up to permutation.
The second argument gives :math:`\alpha`.
The third and the fourth argument must have memref type and give A and B, respectively.
The number of indices must be equal to the order of A and B.

Loop-over-GEMM
--------------

.. code:: abnf

    log-instruction = "log" map-arity-2-with-mods "," floating-constant "," local-identifier "," local-identifier "," floating-constant "," local-identifier ":" memref-type "," memref-type "to" memref-type

Overview
........

Loop-over-GEMM implements the BLAS level-3 GEMM operation (general matrix-matrix multiplication)
wrapped in loops.

Arguments
.........

The loop-over-GEMM operation implements

.. math::

    C[\pi_C(I_m\cup I_n)] := \alpha \sum_{I_k}
    A[\pi_A(I_m \cup I_k)] B[\pi_B(I_k \cup I_n)]
    + \beta C[\pi_C(I_m\cup I_n)]

The permutations and index sets are given by the index map (first argument).
The index map defines the three sets :math:`I_A, I_B, I_C` and we have

.. math::

    I_{common} = I_A \cap I_B \cap I_C

    I_m = I_A \cap I_C \setminus I_{common}

    I_n = I_B \cap I_C \setminus I_{common}

    I_k = I_A \cap I_B \setminus I_{common}

.. admonition:: Todo

    Specify modifiers.

The second argument gives :math:`\alpha` and the fifth argument gives :math:`\beta`.
The third, the fourth, and the sixth argument must have memref type and give
A, B, and C, respectively.

For
---

.. code:: abnf

    for-instruction = "for" integer-type local-identifier "=" integer-constant "to" integer-constant region

Overview
........

The for instruction executes its region once for each value of the loop variable.

The loop's range [from, to) is given by the first and the second integer constant.
The local identifier holds the current value of the loop variable.

Product
-------

.. 
code:: abnf

    product-instruction = "product" map-arity-2 "," floating-constant "," local-identifier "," local-identifier "," floating-constant "," local-identifier ":" memref-type "," memref-type "to" memref-type

Overview
........

Product multiplies two tensors without reduction, i.e., no index is summed over.

Arguments
.........

The product operation implements

.. math::

    C[\pi_C(I_C)] := \alpha
    A[\pi_A(I_A)] B[\pi_B(I_B)]
    + \beta C[\pi_C(I_C)]

The permutations and index sets are given by the index map (first argument).
The index map defines the three sets :math:`I_A, I_B, I_C` and it
is required that

.. math::

    I_C = I_A \cup I_B

The second argument gives :math:`\alpha` and the fifth argument gives :math:`\beta`.
The third, the fourth, and the sixth argument must have memref type and give
A, B, and C, respectively.

Sum
---

.. code:: abnf

    sum-instruction = "sum" map-arity-1 "," floating-constant "," local-identifier "," floating-constant "," local-identifier ":" memref-type "to" memref-type

Overview
........

Sum reduces a tensor over the indices that do not appear in the output.

Arguments
.........

The sum operation implements

.. math::

    B[\pi_B(I_B)] := \alpha \sum_{I_s}
    A[\pi_A(I_A)]
    + \beta B[\pi_B(I_B)]

The permutations and index sets are given by the index map (first argument).
The index map defines the two sets :math:`I_A, I_B` and we require

.. math::

    I_B \subset I_A

    I_{s} = I_A \setminus I_B

The second argument gives :math:`\alpha` and the fourth argument gives :math:`\beta`.
The third and the fifth argument must have memref type and give
A and B, respectively.


Additional instructions
-----------------------

.. code:: abnf

    barrier-instruction = "barrier"
    lifetime-stop-instruction = "lifetime_stop" local-identifier

Sample code
===========

The following sample implements the kernel

.. 
math::

    D := 5 A B C + D \text{ with }
    A \in \mathbb{R}^{16\times 8},
    B \in \mathbb{R}^{8\times 8},
    C \in \mathbb{R}^{8\times 16},
    D \in \mathbb{R}^{16\times 16}

where B and C are constant matrices and A and D are matrix batches.

.. code::

    define @fused_kernel(%A: group,pointers>,
                         %B: memref,
                         %C: memref,
                         %D: group,distance<256>>) {
      %0 = get_work_item %A : group,pointers>
      %1 = get_work_item %D : group,distance<256>>
      %tmp0 = alloca : memref
      log {ik, kj to ij}, 1.0, %0, %B, 0.0, %tmp0
          : memref, memref to memref
      log {ik, kj to ij}, 5.0, %tmp0, %C, 1.0, %1
          : memref, memref to memref
    }
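
The sample can be checked against a plain reference computation. The following Python sketch is a hypothetical, unoptimized model of what the two ``log`` instructions compute for each work item. It is not generated by YATeTo.

- The first instruction computes tmp0 := 1.0 A B + 0.0 tmp0.
- The second instruction computes D := 5.0 tmp0 C + 1.0 D.

Matrices are nested Python lists indexed as ``m[row][col]``, so the column-major storage of the memrefs is not modeled. The names ``gemm`` and ``fused_kernel_reference`` are assumptions made for illustration. So is zero-initializing the stand-in for ``%tmp0``, because the text does not say what memory returned by ``alloca`` contains.

.. code:: python

    from typing import List

    Matrix = List[List[float]]


    def zeros(rows: int, cols: int) -> Matrix:
        """Return a rows x cols matrix filled with 0.0."""
        return [[0.0] * cols for _ in range(rows)]


    def ones(rows: int, cols: int) -> Matrix:
        """Return a rows x cols matrix filled with 1.0."""
        return [[1.0] * cols for _ in range(rows)]


    def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> None:
        """In place: C[i][j] := alpha * sum_k A[i][k] * B[k][j] + beta * C[i][j]."""
        inner = len(b)
        for i in range(len(c)):
            for j in range(len(c[0])):
                acc = sum(a[i][k] * b[k][j] for k in range(inner))
                c[i][j] = alpha * acc + beta * c[i][j]


    def fused_kernel_reference(a_batch: List[Matrix], b: Matrix, c: Matrix,
                               d_batch: List[Matrix]) -> None:
        """D := 5 A B C + D for every work item; B and C are shared by all items."""
        for a, d in zip(a_batch, d_batch):  # one work item per batch entry
            # Stands in for %tmp0 = alloca; zero-initialized here (an assumption).
            tmp0 = zeros(len(a), len(b[0]))
            gemm(1.0, a, b, 0.0, tmp0)  # first log: tmp0 := 1.0 * A B + 0.0 * tmp0
            gemm(5.0, tmp0, c, 1.0, d)  # second log: D := 5.0 * tmp0 C + 1.0 * D


    if __name__ == "__main__":
        identity8 = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
        a_batch = [ones(16, 8) for _ in range(2)]
        d_batch = [zeros(16, 16) for _ in range(2)]
        fused_kernel_reference(a_batch, identity8, ones(8, 16), d_batch)
        print(d_batch[0][0][0])  # 5 * 8 = 40.0

In the usage example, B is the identity, so tmp0 equals A. Every entry of D therefore becomes 5 times the inner dimension 8 plus its old value. Splitting the kernel into two GEMMs through a temporary mirrors the structure of the sample; a generated kernel may of course organize the computation differently.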