Running ANACIN-X


ANACIN-X must be run alongside another application. There are two ways to do this, and we will describe each.

  • Firstly, for demonstration purposes, ANACIN-X comes packaged with 3 non-deterministic benchmark applications to analyze. These benchmark applications are accompanied by automation scripts that simplify the process of running ANACIN-X with them. See the section 'Running ANACIN-X with a Benchmark Application' below for instructions and options.
  • Secondly, ANACIN-X can be run with an external application (e.g., a user-defined communication pattern such as MiniAMR or MCB Grid) by running each stage of the ANACIN-X framework by hand. See the section 'Running ANACIN-X with an External Application' below for instructions.

Running ANACIN-X with a Benchmark Application

Use the 'comm_pattern_analysis.sh' script to generate traces of a selected benchmark communication pattern and perform analysis on the event graphs.

Important: Make sure that the system you're running on supports the inputs you provide from the options below. If you request that the system use more processes or nodes than are available, or if you select a different scheduler from what is available, the program will fail.

Note: If you come across any errors while running your code, make sure that your version of the code is up to date using git commands like 'git pull'.

The following command line switches can be used to define parameters for your job submission:

  • -p : Defines the size of the MPI communicator (number of MPI processes) used when generating communication patterns. (Default 4 MPI processes)
  • -i : Defines the number of times a given communication pattern appears in a single execution of ANACIN-X. If running the message race communication pattern, it's recommended to set this to at least 10. (Default 1 iteration)
  • -s : The size in bytes of the messages passed when generating communication patterns. (Default 512 bytes)
  • -n : The number of compute nodes requested for running the ANACIN-X workflow. If you're running on an unscheduled system, this value should be set to 1. (Default 1 node)
  • -r : The number of runs to make of the ANACIN-X workflow. Be sure that this is set to more than 1. Otherwise, analysis will not work. (Default 2 executions)
  • -cp : Used to define the communication pattern benchmark for testing. Must be one of the 3 provided benchmarks in the following format: message_race, amg2013, or unstructured_mesh.
  • -sc : Used to define which scheduler system is currently in use. Must be one of the following options: lsf, slurm, or unscheduled.
  • -ct : Used to define which backtracing tool should be used during callstack tracing. Must be one of the following options: glibc or libunwind. (Defaults to glibc) Note that if this is changed from the default, then CSMPI will need to be built with the corresponding library. Instructions are at the CSMPI GitHub repository.
  • -o : If used, allows the user to define their own path to store output from the project. Be sure to define an absolute path that can exist on your machine. Use a different path when running multiple times on the same settings to avoid overwriting. (Defaults to the directory '$HOME/comm_pattern_output')
  • -nd : Takes 3 arguments in decimal format (start percent, step size, end percent) to define the message non-determinism percentages present in the final data. Start percent and end percent are the lowest and highest percentages used, respectively, and the step size defines the percentages in between. For example, the default values correspond to '-nd 0.0 0.1 1.0', which produces the percentages 0, 10, 20, 30, ..., 100; this is the recommended setting. All 3 values must fall between 0 and 1, inclusive, and the difference between the end percent and the start percent must be a whole-number multiple of the step size (a short worked example follows this list). All 3 values must also contain no more than 2 digits past the decimal, i.e., they must correspond to integer percentages. (Defaults to a starting percent of 0%, a step size of 10%, and an ending percent of 100%)
  • -nt : When running the unstructured mesh communication pattern, takes the percentage of topological non-determinism in decimal format. For example, default values correspond to '-nt 0.5'. Value must fall in the range of 0 to 1, inclusive. (Defaults to 50% topological non-determinism percentage)
  • -c : When running the unstructured mesh communication pattern, use this with 3 arguments (integers greater than 1) to define the grid coordinates. The three values must be set so that their product equals the number of processes used. (Ex. -c 2 3 4)
  • -v : If used, displays the execution settings prior to running.
  • -h : Used to display the list of switch options.
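
As a short worked example of the '-nd' constraint: the setting '-nd 0.0 0.25 1.0' is valid because 1.0 - 0.0 is exactly 4 steps of 0.25, and it produces the non-determinism percentages 0%, 25%, 50%, 75%, and 100%. A setting such as '-nd 0.0 0.3 1.0' violates the constraint, since 1.0 cannot be reached from 0.0 in whole steps of 0.3.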

If you're running on a system that uses the Slurm scheduler, then the following switches can be used to define settings for job submission:

  • -q : Defines the queue to submit scheduled jobs to. (Defaults to the "normal" queue)
  • -t : A maximum time limit in minutes on the time provided to jobs submitted. (Default 10 minutes)

If the project is run with settings that are small, then the communication pattern generated may end up not being non-deterministic. A few things can be done to increase the odds of inducing non-determinism in a simulation:

  • It is good to run on a large number of processes (at least 10) and a large number of runs (at least 50) to increase the odds of non-determinism arising.
  • By running a communication pattern with multiple iterations (using the -i flag), the user can cause more non-determinism. This is particularly important when running the message race communication pattern.
  • Running with a small message size (using the -s flag) can increase the likelihood of non-determinism.
  • Running the program across multiple compute nodes (using the -n flag) can help to cause more non-determinism.

Below is an example run of the script as one might submit it to run message_race on an unscheduled system.

. ./comm_pattern_analysis.sh -p 20 -i 10 -v -r 100 -o $HOME/message_race_sim_1

Below is another example run of the script as one might submit it on the Stampede2 cluster computer:

. ./comm_pattern_analysis.sh -p 48 -n 2 -v -r 50 -q "skx-normal" -o $WORK2/anacinx_output_1
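
Below is a third, purely illustrative example showing how the unstructured mesh benchmark might be submitted on a Slurm-scheduled system. The queue name, time limit, and output path are placeholders; substitute values appropriate for your own machine. Note that the grid given to '-c' (2 x 3 x 4 = 24) matches the 24 processes requested with '-p':

. ./comm_pattern_analysis.sh -cp unstructured_mesh -sc slurm -p 24 -c 2 3 4 -i 5 -r 50 -n 2 -nt 0.5 -q "normal" -t 30 -v -o $HOME/unstructured_mesh_sim_1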

If a communication pattern or a scheduler type has not been provided to the script using one of the command line switches above, follow the prompts at the beginning of the script to select them. You will need to input which communication pattern to generate and which scheduler your computing system employs. You can choose any of the communication patterns listed in the supported settings section below, using the corresponding formats: message_race, amg2013, unstructured_mesh. And you can choose one of the following scheduler systems: lsf, slurm, unscheduled.

Be aware that if you run the project on some machines and some job queues, there will be a limit to the number of jobs that can be submitted. In such cases, you may lose some jobs if you try to run the program with settings that produce more jobs than are allowed in the queue being used.

Running ANACIN-X with an External Application

As stated in the Software Overview page of this wiki, there are 3 major stages of the ANACIN-X framework prior to visualization. We will describe how to use each of these. As a reminder, the three stages are:

  1. Execution Trace Collection
  2. Event Graph Construction
  3. Event Graph Kernel Analysis

Be sure to install all dependencies listed in the Dependencies page prior to running the stages.

Execution Trace Collection

In this stage of the ANACIN-X framework, you will trace an input application using a 'stack' of MPI profiling interface (PMPI) tools. Specifically, you will trace the application using the following tools:

  • sst-dumpi
  • Pluto
  • CSMPI (CSMPI is optional for the purpose of visualizing kernel distance data, but is required to visualize callstack data.)

To 'stack' the above software tools, use PnMPI, an open-source MPI tool infrastructure that builds on top of the standardized PMPI interface.

Be sure to configure all of the PMPI tools and link them together using PnMPI prior to tracing. See their respective GitHub pages linked above for more details. PnMPI will need to be configured to use the linked modules; see the PnMPI GitHub page for more information on linking software and configuring PnMPI.
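
As a rough sketch of what the stacking looks like, a PnMPI configuration file lists one 'module' line per tool, in the order the layers should be invoked. The module names below are placeholders; use the names of the patched libraries produced by your own builds of sst-dumpi, Pluto, and CSMPI:

module libdumpi
module libpluto
module libcsmpi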

Your command to trace your application will likely take the following form:

LD_PRELOAD=<libpnmpi.so path> PNMPI_LIB_PATH=<pnmpi linking directory> PNMPI_CONF=<pnmpi configuration file> mpirun -np P E A

Positional Arguments:
P    Number of MPI processes requested
E    Application executable to be traced
A    Arguments to traced application

You will need to run the above command many times to produce a sample of traces. The traces across runs will be compared in subsequent stages.

Be sure to configure each of the PMPI tools to store their output for a single run in the same directory. We will call it a 'run directory'. Suppose you make 100 runs of your application. There should then be 100 'run directories' adjacent to each other, each one storing all the trace files from a run.
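
For example, a simple shell loop can drive the repeated runs and keep each run's traces separated. This is only a sketch: replace the placeholder values (the run count, P, E, A, and the PnMPI paths) with your own, and make sure each PMPI tool is configured to write its output into the current working directory so the traces land in the run directory:

for i in $(seq 1 100); do
    run_dir=$(printf "run_%03d" "$i")
    mkdir -p "$run_dir"
    (
        cd "$run_dir"
        # The tracing command from above, executed inside the run directory:
        LD_PRELOAD=<libpnmpi.so path> PNMPI_LIB_PATH=<pnmpi linking directory> PNMPI_CONF=<pnmpi configuration file> mpirun -np P E A
    )
done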

Event Graph Construction

Once you have generated a set of traces for a given application, the traces must be used to generate 'event graphs' for the purpose of analysis. We use the dumpi_to_graph software tool to convert traces into event graphs.

Dumpi_to_graph must be configured prior to use. Please see the dumpi_to_graph GitHub page to create a configuration file for your needs, using the examples that dumpi_to_graph provides as a reference. For ease of use, store any configuration files you create within dumpi_to_graph's config directory.

Use a command of the following form to construct an event graph from the traces within one run directory. Note that dumpi_to_graph is designed to be parallelized using MPI.

mpirun -np P dE dC R

Positional Arguments:
P    Number of MPI processes requested
dE   dumpi_to_graph executable file (should be found in dumpi_to_graph's build directory)
dC   dumpi_to_graph configuration file (likely found in dumpi_to_graph's config directory)
R    Path to a directory storing traces (i.e., a run directory)

The above command must be run for each run directory. An event graph will be stored in each run directory that the above command is used on. Then the event graphs can be compared in the next stage of ANACIN-X.
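
For example, if all of your run directories sit under one parent directory, a short shell loop can build the event graph for each of them. The 'run_*' naming and parent path below are assumptions; substitute your own layout and your values for P, dE, and dC:

for run_dir in /path/to/run_directories/run_*; do
    mpirun -np P dE dC "$run_dir"
done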

Event Graph Kernel Analysis

Event graph kernel analysis is composed of two parts:

  1. Event graph slice extraction
  2. Event graph kernel calculations

The first of these, slice extraction, requires the use of a policy file to define where to start and stop slices of a graph. Policy files are provided and can be found within the 'anacin-x/event_graph_analysis/slicing_policies' directory. Each one will break the graph up into components based on different functions or data and may be relevant to different applications. If your application uses barriers, we recommend using one of the slicing policy files with 'barrier_delimited_' in the title.

Event graph slice extraction must take place for each event graph. Once you have decided on a slicing policy to use, run the following command for each event graph using the extract slices script 'anacin-x/event_graph_analysis/extract_slice.py'. Note that this script is parallelized using mpi4py.

mpirun -np P anacin-x/event_graph_analysis/extract_slice.py EG SP -o "slices"

Positional Arguments:
P    Number of MPI processes requested
EG   An event graph file (graphml) from one run directory
SP   Slicing policy file found in the 'anacin-x/event_graph_analysis/slicing_policies' directory
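
Since the command must be repeated once per event graph, a shell loop over the run directories is a convenient way to drive it. The sketch below assumes the run-directory layout from the tracing stage and uses a placeholder for the event graph file name; use whatever graphml file name your dumpi_to_graph configuration actually produces, along with your own values for P and SP:

for run_dir in /path/to/run_directories/run_*; do
    mpirun -np P anacin-x/event_graph_analysis/extract_slice.py "$run_dir"/<event graph graphml file> SP -o "slices"
done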

After slice extraction is complete for all event graphs, the final step to creating kernel distance data is kernel calculations on the event graph slices.

To perform kernel calculations, use the 'compute_kernel_distance_time_series.py' script within the 'anacin-x/event_graph_analysis' directory. Note that this script is parallelized using mpi4py.

To run this script, you will need to select a graph kernel policy file. We suggest using the file 'anacin-x/event_graph_analysis/graph_kernel_policies/wlst_5iters_logical_timestamp_label.json' because it is the most thoroughly tested with the workflow. This file corresponds to running the Weisfeiler-Lehman subtree (WLST) graph kernel with 5 iterations and the logical timestamp as the vertex label data. More information about this kernel can be found in the paper "Weisfeiler-Lehman Graph Kernels". See the file 'wlst_sweep_vertex_labels.json' within the same directory as the recommended kernel file for examples of other vertex labels to use with the WLST kernel. You can change the graph kernel file to fit what is best for your project.

Finally, run the following command to generate kernel distance data.

mpirun -np P anacin-x/event_graph_analysis/compute_kernel_distance_time_series.py T KP --slicing_policy SP -o "kdts.pkl" --slice_dir_name "slices" [-c]

Positional Arguments:
P    Number of MPI processes requested
T    Directory storing all run directories
KP   Graph kernel policy file found in the 'anacin-x/event_graph_analysis/graph_kernel_policies' directory
SP   Slicing policy file found in the 'anacin-x/event_graph_analysis/slicing_policies' directory

Options:
-c   Include this if you traced your application with CSMPI to collect callstack data

Unlike the previous stages of the workflow, the kernel calculation command above only takes place once for all runs of the traced application.
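
For reference, a complete invocation might look like the following sketch. The parent trace directory, process count, and slicing policy file name are placeholders, and the graph kernel policy file is the recommended WLST policy described above. Append '-c' if you traced your application with CSMPI:

mpirun -np 8 anacin-x/event_graph_analysis/compute_kernel_distance_time_series.py /path/to/run_directories anacin-x/event_graph_analysis/graph_kernel_policies/wlst_5iters_logical_timestamp_label.json --slicing_policy anacin-x/event_graph_analysis/slicing_policies/<slicing policy file> -o "kdts.pkl" --slice_dir_name "slices"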