Riallto uses a mix of compilers to build applications for NPU devices. For the C++ VLIW vector-processor code, it uses a compiler called Chess, and for configuring data movement, it uses MLIR and an MLIR dialect called MLIR-AIE.
This blog explores how to inspect the MLIR-AIE code generated when constructing an NPU application with Riallto, and explains some of the elements of that code. It aims to provide a slightly deeper dive into the Riallto compilation toolchain than the notebooks provide. For a higher-level overview, please refer to this notebook in the Riallto repository.
Riallto MLIR-AIE codegen
Let's say we have the following callgraph that defines a pipeline where a single passthrough kernel is passed each row of a 720p RGBA image:
In the callgraph method, each row of the input image, img_in, is passed into the passthrough kernel, and the output of passthrough is written into the corresponding row of the output image, img_out. The passthrough kernel itself is just a simple memcpy, copying the input data to the output.

The callgraph above provides a high-level description of our pipeline. However, this description needs to be mapped spatially and temporally onto the NPU array to construct an application. The spatial mapping determines which kernels communicate with each other and the connections between them; in this case, we have directly connected the interface tile to the passthrough kernel. The temporal mapping determines how data is partitioned and flows through the pipeline; here, we split the image up row by row and pass each row into the pipeline. To perform this mapping, Riallto uses a tracer: a behavioural execution of the pipeline is performed and traced to determine which kernels are communicating and the sequence in which data flows through the pipeline.
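To make the row-by-row behaviour concrete, here is a minimal pure-numpy model of the pipeline. It requires no Riallto or NPU hardware, and the function and array names are illustrative, not part of the Riallto API:

```python
import numpy as np

def passthrough(row: np.ndarray) -> np.ndarray:
    """Behavioural model of the kernel: a simple memcpy of one row."""
    return row.copy()

# A 720p RGBA image: 720 rows, each row 1280 pixels * 4 channels.
rng = np.random.default_rng(0)
img_in = rng.integers(0, 256, size=(720, 1280 * 4), dtype=np.uint8)
img_out = np.zeros_like(img_in)

# The callgraph iterates over the rows, passing each through the kernel.
for row_idx in range(img_in.shape[0]):
    img_out[row_idx] = passthrough(img_in[row_idx])

assert np.array_equal(img_in, img_out)
```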
Our AppBuilder class is callable, and when we provide it with some initial numpy arrays for inputs and outputs, it calls the application tracer. We can execute the tracer with the following:
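Conceptually, the tracer records which kernels are invoked, and with what data, during a behavioural run of the callgraph. A toy sketch of that idea follows; it is entirely hypothetical and is not the Riallto implementation:

```python
from typing import Callable

class TracingKernel:
    """Wraps a behavioural kernel and records every invocation."""
    def __init__(self, name: str, fn: Callable, trace: list):
        self.name, self.fn, self.trace = name, fn, trace

    def __call__(self, data):
        result = self.fn(data)
        # Record the kernel name and the transfer size for this call.
        self.trace.append((self.name, len(data)))
        return result

trace: list = []
passthrough = TracingKernel("passthrough", lambda row: list(row), trace)

rows = [[1, 2, 3, 4], [5, 6, 7, 8]]       # two tiny "image rows"
out = [passthrough(row) for row in rows]  # behavioural execution

# The trace now captures the communication pattern and sequence.
print(trace)  # → [('passthrough', 4), ('passthrough', 4)]
```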
Tracing the application produces metadata detailing the communication patterns of the kernels and how data moves through them. It's possible to view this metadata in a Jupyter Notebook with the following:
app.metadata
Riallto uses the metadata to codegen MLIR-AIE. The MLIR-AIE code describes more concretely the mapping of the application pipeline onto the NPU array and how data flows through the pipeline.
To view the MLIR-AIE code generated from the above example, run the following command in a Jupyter Notebook after the tracer has completed:
app_builder.displaymlir()
The above command should show the MLIR-AIE code below. Remember, this is machine-generated, low-level code; it might look overwhelming. However, in the following section, we will discuss portions to explain what is happening and give you a feel for how this describes the application. For a more detailed description of the MLIR-AIE operations in use here, you should examine the MLIR-AIE documentation.
Highlighting some aspects of the generated MLIR-AIE code
We will now highlight some aspects of the MLIR code that Riallto generated for the above example.
At the top of the generated MLIR-AIE code, we can see the following:
%tile00 = AIE.tile(0, 0)
%tile02 = AIE.tile(0, 2)
These declare the tiles our application uses, identified by (column, row) coordinates within the NPU array. Our application uses the interface tile %tile00 at location (0, 0) and a single compute tile %tile02 at location (0, 2) for our passthrough kernel.
Communication between the tiles and kernels is handled via objectFifos (documentation), for example:
The above specifies that we want to connect the interface tile %tile00 to %tile02, the tile where our passthrough kernel executes. The type and size of the transfers along this connection are specified with the memref type.
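An objectFifo behaves like a small pool of buffers shared between a producer and a consumer with acquire/release semantics. A simplified, single-threaded Python model of a depth-2 (double-buffered) FIFO is sketched below; the class and method names are illustrative, not the MLIR-AIE API:

```python
from collections import deque

class ObjectFifo:
    """Toy model of a depth-N objectFifo: the producer acquires empty
    buffers and the consumer acquires full ones; release returns them."""
    def __init__(self, depth: int, elem_size: int):
        self.empty = deque(bytearray(elem_size) for _ in range(depth))
        self.full = deque()

    def acquire_produce(self):
        return self.empty.popleft()   # on real hardware, gated by locks

    def release_produce(self, buf):
        self.full.append(buf)

    def acquire_consume(self):
        return self.full.popleft()

    def release_consume(self, buf):
        self.empty.append(buf)

fifo = ObjectFifo(depth=2, elem_size=4)

buf = fifo.acquire_produce()
buf[:] = b"\x01\x02\x03\x04"          # producer fills one element
fifo.release_produce(buf)

out = fifo.acquire_consume()          # consumer sees the same data
assert bytes(out) == b"\x01\x02\x03\x04"
fifo.release_consume(out)             # buffer returns to the empty pool
```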
Our passthrough kernel is not compiled with MLIR but with the Chess VLIW vector compiler. However, we use MLIR-AIE to generate wrapper code around the kernel to control data moving into and out of it. We call our passthrough kernel within the MLIR code and link it with the object file the Chess compiler generates. The function prototype for our passthrough kernel is specified as follows:
The wrapper code and the call to our passthrough kernel executing on the compute tile are specified within an AIE.core block. In our simple pipeline we only have a single kernel, so we only have a single AIE.core block; at the end of the block, we specify that the compiled code for this tile should be linked with the object file that Chess generated for our passthrough kernel. Inside the AIE.core block, the passthrough kernel is called in a loop. On each iteration of the loop, we access the next elements of the input and output objectFifos, %elem1 and %elem2. We can also see how the objectFifos use the hardware locking architectural features to ensure that data is safely consumed and produced.
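Putting these pieces together, the per-tile loop follows a simple acquire-compute-release pattern. A standalone Python model of that loop structure (the FIFOs here are plain queues, and all names are illustrative):

```python
from collections import deque

def passthrough(src: bytes) -> bytes:
    """Behavioural stand-in for the Chess-compiled kernel: a memcpy."""
    return bytes(src)

# The input and output objectFifos, modelled as queues of row buffers.
fifo_in = deque([b"row0", b"row1", b"row2"])
fifo_out = deque()

# Model of the AIE.core loop: acquire, compute, release, repeat.
for _ in range(3):
    elem_in = fifo_in.popleft()       # acquire the input element
    elem_out = passthrough(elem_in)   # call the kernel on it
    fifo_out.append(elem_out)         # release the output element

assert list(fifo_out) == [b"row0", b"row1", b"row2"]
```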
Finally, the @sequence function at the end of the MLIR-AIE code is a special function that specifies how the interface tiles communicate with host memory.
We can see two buffer references, %itbuffer_0 and %itbuffer_1, in the sequence function arguments. These references are our input and output images on the host side.
Within the body of the sequence function, we can see two calls to an AIE.ipu.dma_memcpy_nd function; these are multi-dimensional memcpy commands that read and write the host buffers and either push or pull data through the objectFifos connected to the compute or memory tiles. With this command, it is possible to iterate over the input and output buffers in up to 4 dimensions. Please refer to the MLIR-AIE documentation for more information on this operation.
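The multi-dimensional access pattern can be understood as nested loops over (size, stride) pairs. Below is a minimal Python model of a rank-4 strided read from a flat host buffer; the function and parameter names are illustrative, not the MLIR-AIE operation's signature:

```python
import itertools

def dma_read_nd(buffer, sizes, strides, offset=0):
    """Model of a 4-D DMA access pattern: for every index tuple
    (i3, i2, i1, i0), read buffer[offset + sum(i * stride)]."""
    out = []
    for idx in itertools.product(*(range(s) for s in sizes)):
        addr = offset + sum(i * st for i, st in zip(idx, strides))
        out.append(buffer[addr])
    return out

# Read a 4x4 "image" row by row from a flat 16-element buffer:
flat = list(range(16))
rows = dma_read_nd(flat, sizes=(1, 1, 4, 4), strides=(0, 0, 4, 1))
assert rows == flat

# Swapping the two inner strides walks the same buffer column by column:
cols = dma_read_nd(flat, sizes=(1, 1, 4, 4), strides=(0, 0, 1, 4))
assert cols == [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```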
The final command in the sequence is a sync operation; this tells the system to wait until it has seen the last data transfer on a particular DMA channel and ensures that the system is synchronised.
Summary
In this blog, we have dived a bit deeper into the codegen side of Riallto, exploring how it takes a high-level description of the system, traces it, and uses the tracing information to generate MLIR-AIE. We then explored some of the generated MLIR-AIE code to understand how it describes the final placed pipeline on the NPU array. For more information on MLIR-AIE, please refer to the MLIR-AIE documentation and MLIR-AIE tutorials.