- `-T` when running `bin/test`. This flag prints logs for each pass, which makes it easier to tell which pass an error is coming from. You can also track the status and error message of each pass with `bin/log <gendir>`. More detail in Quick Start.
- `--debug` PIR flag. Turns on logging in PIR. Recommended to turn off in batch jobs.
- `--dot` and `--vdot` PIR flags. Generate dot graphs for IR visualization. `--dot` enables CU-level visualization, and `--vdot` enables a more detailed dot graph, which can be very slow for large graphs. Both are recommended to be turned off in batch jobs with close to 100% CU utilization.
- `--fast` quick flag that chooses a combination of configurations that speed up compilation.

All PIR flags can be specified when running `bin/test` or within `$HOME/.pirconf`.
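For example, a debugging run might combine these flags. This is a sketch: the app name `DotProduct` comes from the example path in the next paragraph, and the exact way `bin/test` selects an app in your setup may differ.

```sh
# Run one app with per-pass log printing (-T) and PIR debug logging.
bin/test -T --debug DotProduct

# Summarize the status and error message of each pass afterwards.
bin/log spatial/gen/Tst_pirTest/DotProduct_0
```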
When using `bin/test` to run a Spatial app, there is a summary log for each of the passes described in Quick Start under `<gendir>/log`, e.g. `spatial/gen/Tst_pirTest/DotProduct_0/log/genpir.log`. `bin/log` parses these logs to produce a summary of useful information.
Spatial produces one log file per Spatial pass, named `00XX-PassName.log`, in the same directory, e.g. `0000_Staging.log`.
PIR also produces one log file per PIR pass when logging is enabled with `--debug`; these logs are in `<gendir>/pir/logs/`.
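When it is unclear which pass failed, a quick grep over these per-pass logs can narrow it down. This is a generic sketch; adjust the search pattern to the actual error text:

```sh
# List Spatial pass logs that mention an error.
grep -ril "error" <gendir>/log/ | sort

# Do the same for the PIR pass logs (requires --debug).
grep -ril "error" <gendir>/pir/logs/ | sort
```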
Both Spatial and PIR can visualize their IRs. The Spatial dot graph is enabled by default when running with `bin/test -b Tst`. Otherwise it can be turned on with `--dot` when running Spatial with `bin/spatial`.
PIR's `--dot` enables the CU-level dot graph, and `--vdot` enables the context- and IR-level dot graphs, which can be very slow for large apps.
After Spatial compilation (the genpir pass), the Spatial IR is shown as a hierarchical dot graph at each block level in `<gendir>/info/`. `Top.html` is the top-level IR under the main function scope in Spatial. The red ovals are Block nodes; clicking on one navigates to the scope within that block. Generally you want to look for the `Accel` block within `Top.html` to show the IR within the Accel block.
Spatial also produces an expandable controller tree diagram in the same directory, `controller_tree.html`, which shows a more concise view of the controller hierarchy.
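Both pages are plain HTML files, so you can open them directly in a browser, e.g.:

```sh
# xdg-open on Linux; use open on macOS.
xdg-open <gendir>/info/Top.html
xdg-open <gendir>/info/controller_tree.html
```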
The most useful visualization is the virtual unit dataflow graph produced after PIR compilation, in `<gendir>/pir/out/dot/global.html`.
Each node in the graph corresponds to a virtual unit (VU) that will be placed and routed onto a physical tile on the chip.
The color of a VU denotes its type:
- blue and yellow: virtual compute unit. The unit is yellow if the compute unit does not have any operation stage.
- cyan: memory controller interface
- green: virtual memory unit
- white: host node

Edge styles:
- bold: vector link
- regular: scalar link
The variable names of memories declared in the Spatial app show up in the VUs that contain them. DRAM variable names show up in the memory controller interface units that load from or store to them. More useful information, including source code line numbers, appears when hovering over the node or edge labels of the dot graph.
The numbers next to the edges are output edge IDs, which can be used to associate edges with the simulation instrumentation.
During simulation, you can turn on instrumentation logs for individual modules for debugging. Depending on how many modules have logging turned on, logging can substantially slow down simulation.
To turn on logging for a specific module, add `logon <module name>` right before `log2files` in `<gendir>/tungsten/script`. If `log2files` is removed, all logs print to STDOUT.
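After the edit, the relevant portion of `<gendir>/tungsten/script` would contain the two lines below (shown here with a wildcard module name as an example); everything else in the generated script is left untouched:

```
logon go*
log2files
```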
Each module is logged to an individual file at `<gendir>/tungsten/logs/<module name>.log`. `<module name>` supports wildcard matching; e.g. `logon go*` turns on logging for all modules whose names start with `go`, which are the virtual unit outputs shown in `global.html`.
For example, the edge with ID `7109` in the `global.html` graph corresponds to `go7109.log` in the generated logs.
If no module name is provided, all modules are logged. However, there is a limit on how many file descriptors can be open at the same time, so logging too many modules might result in some modules not showing up in their log files. `logon` without a module name is not recommended.
After simulation, another file, `<gendir>/tungsten/logs/state.json`, stores information about the simulation as well as the end state of each module.
Here are some of the performance counters recorded for some of the modules (a query sketch follows the list):
- active: number of cycles the module was active
- inactive: number of cycles the module was inactive
- valid: number of cycles the buffer contained an element
- ready: number of cycles the buffer was not full
- stall: number of cycles the buffer was full
- starve: number of cycles the buffer was empty
- nelem: number of elements in the buffer after simulation
- cont_inactive: number of cycles the module was continuously inactive right before the end of the simulation; used to detect deadlock
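As a quick way to scan these counters without opening the file by hand, you can query `state.json` with `jq` from `<gendir>/tungsten`. The sketch below assumes the file maps module names to objects carrying the counters above; the actual schema may nest differently:

```sh
# Print modules that spent more than 1000 cycles stalled
# (assumed schema: { "<module>": { "stall": <cycles>, ... }, ... }).
jq -r 'to_entries[]
       | select((.value.stall? // 0) > 1000)
       | "\(.key): stall=\(.value.stall)"' logs/state.json
```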
You can run `python3 bin/annotate.py` in `<gendir>/tungsten` to back-annotate some of these module states onto the virtual unit dataflow graph in `<gendir>/pir/out/dot/gsim.html`.
`--format=cycle` marks each edge with a number in `[#]`, where the number indicates the number of elements sent on the network link. `--format=pct` annotates the active, starve, and stall rates of the link as percentages of the runtime. The default format is `cycle` if the program deadlocked and `pct` if it succeeded.
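For example, to force the percentage view regardless of the simulation outcome:

```sh
cd <gendir>/tungsten
python3 bin/annotate.py --format=pct   # or --format=cycle for element counts
```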
If your app finished without deadlock, the graph will look like the following.
Each edge is annotated with:
- a: percentage of the runtime the link was active
- !v: percentage of the runtime the link was not valid (starving)
- !r: percentage of the runtime the link was not ready (stalling)

If most of the links are running at a low active rate, the app is DRAM-bound (as in this example), imbalanced, or lacks retiming.
- You can tell whether the app is DRAM-bound by looking at the achieved DRAM bandwidth printed during simulation, or in `<gendir>/log/runp2p.log`; `bin/log` also parses this number. Compare it with the theoretical maximum of the memory technology, specified with `--mem-tech` (see the sketch after this list).
- If your app has a link with a large `!r` value, that link is stalling a lot, which bottlenecks performance. Add retiming along that path to improve runtime.
- If your app is imbalanced, find the VU with the highest output link activation rate and parallelize around it. Hover over the VU to find the corresponding controller source code line number.
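For the DRAM-bound check, here is a quick way to pull the achieved bandwidth out of the simulation log (a sketch; the exact wording in the log may differ):

```sh
# Look for the bandwidth figure reported during simulation.
grep -i "bandwidth" <gendir>/log/runp2p.log

# bin/log also reports this number as part of its summary.
bin/log <gendir>
```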
If your app finished with a deadlock, the graph will look like the following.
Each edge is marked with its edge ID, and the number in brackets under the edge ID is the number of cycles the edge was active. The color of the label means:
- green: the edge has sent the number of elements expected by the compiler's static analysis
- yellow: the edge is stalling
- red: the edge is starving
The expected number of elements sent on a link, as analyzed by the compiler, is shown as `count=<#>` when hovering over the link label.
If no count is printed, the expected number of elements could not be statically analyzed, either because the producer is under a controller with data-dependent iteration counts or under a streaming stateless controller. In that case the edge is shown in red even if all expected elements were sent over the link.
To debug a deadlock, start from the draining points of the app, which are the edges that go to the host node. Traverse the graph backward and find the first node that has both red and yellow output links; the yellow output links are likely the issue.