PIMProf

This project is compatible with LLVM 10 and clang 10

Structure of repository

  • Configs/: The configuration files of PIMProf. The default is defaultconfig_32.ini.
  • LLVMAnalysis/: The tool for instrumenting the program. It is implemented as an LLVM pass and invoked by clang. This directory also contains hooks that can be used for annotating regions of interest.
  • PIMProfSolver/: The Pin tool for analyzing the instrumented program.
  • test/: The unit test.

Prerequisite

Install llvm-10 and clang-10.

$ apt install clang-10 llvm-10

Compilation

Set LLVM_HOME in CMakeLists.txt to the directory of your llvm-10 installation:

set(LLVM_HOME "/usr/lib/llvm-10")

Then compile:

$ make -j

Integration of PIMProf into Sniper

The PIMProf solver now depends entirely on the runtime performance reported by a simulator. As a proof of concept, we integrated our tool into Sniper in a separate repository:

https://github.com/Systems-ShiftLab/sniper_PIMProf

Clone the repository and check out the dev branch to see all changes made by PIMProf.

$ git checkout dev

To compile Sniper, check the Sniper website (https://snipersim.org) and follow its instructions. You need to install a few prerequisite libraries and download a recent version of the Intel Pin tool before compiling Sniper. Note that the current version of Sniper only works with Pin <= 3.20.

We made minimal modifications to integrate PIMProf into Sniper. All changes to the Sniper code base can be found by grepping for "Yizhou" in the repository, and the same idea can be applied when integrating PIMProf into other simulators. We found it easiest to directly modify the include path in common/system/simulator.h so that it points to Stats.h in your PIMProf checkout, for example:

#include "/home/warsier/Downloads/PIMProf/PIMProfSolver/Stats.h"
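The search for the marker mentioned above can be wrapped in a small shell function. This is only a sketch: `find_pimprof_changes` is a hypothetical helper name, not something shipped with either repository, and it assumes the path you pass is the root (or a subdirectory) of your sniper_PIMProf checkout.

```shell
# Hypothetical helper: list every line tagged "Yizhou" under a source tree.
# Usage: find_pimprof_changes [directory]   (defaults to the current directory)
find_pimprof_changes() {
    # -r: recurse into subdirectories, -n: print line numbers
    grep -rn -- "Yizhou" "${1:-.}"
}
```

Each output line has the form `path:line:content`, which makes it easy to see which Sniper files would need the equivalent change in another simulator.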

Testing

The sniper_PIMProf repository also comes with two test suites: a unit test and the GAP graph workload suite. They can be found in the folder sniper_PIMProf/PIMProf.

Unit test

The unit test will provide a basic idea of how to use PIMProf to generate offloading decisions. The steps are listed as follows:

1. Compilation

Let's take a look at Makefile in the unit test:

To create test.inj, an annotated version of the test in which the beginning and end of each basic block are marked, we invoke the LLVM pass libAnnotationInjection.so. The expanded command looks like this:

export PIMPROFINJECTMODE=SNIPER && clang++-10 $(CXXFLAGS) $(SNIPER_CFLAGS) -Xclang -load -Xclang $(PIMPROF_ROOT)/build/LLVMAnalysis/libAnnotationInjection.so -o test.inj test.cpp -pthread

At compile time, we set the environment variable PIMPROFINJECTMODE and compile the program with the LLVM pass libAnnotationInjection.so. Two values of PIMPROFINJECTMODE are available: SNIPER, which inserts annotations at the basic-block level, and SNIPER2, which inserts annotations at the function level.

Note that PIMProf no longer requires any modification to the source code, so any annotations in the source code of the unit test or the GAP workloads are deprecated.
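To switch between the two injection modes without editing the Makefile, the expanded command can be generated by a small function. This is a sketch under assumptions: `pimprof_compile_cmd` is a hypothetical helper, and PIMPROF_ROOT, CXXFLAGS and SNIPER_CFLAGS are expected to be set in the shell the same way the unit-test Makefile sets them.

```shell
# Hypothetical helper: print the compile command for one injection mode.
# Expects PIMPROF_ROOT, CXXFLAGS and SNIPER_CFLAGS in the environment.
pimprof_compile_cmd() {
    mode="${1:-}"
    case "$mode" in
        SNIPER|SNIPER2) ;;   # basic-block level | function level
        *) echo "unknown PIMPROFINJECTMODE: $mode" >&2; return 1 ;;
    esac
    echo "export PIMPROFINJECTMODE=$mode && clang++-10 ${CXXFLAGS:-} ${SNIPER_CFLAGS:-}" \
         "-Xclang -load -Xclang ${PIMPROF_ROOT:-}/build/LLVMAnalysis/libAnnotationInjection.so" \
         "-o test.inj test.cpp -pthread"
}
```

The function only prints the command. To run it, pass the output to `eval`, for example `eval "$(pimprof_compile_cmd SNIPER2)"`.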

2. Simulation

Now take a look at run_inj.sh in the unit test. This script can be used directly to generate offloading decisions for the unit test, provided that SOLVER points to the PIMProf solver located at build/PIMProfSolver/Solver.exe.

We need two Sniper runs to generate the CPU performance and the PIM performance separately. With the following commands, the corresponding PIMProf results are generated in the folders inj_cpu and inj_pim:

export OMP_NUM_THREADS=1 && run-sniper --roi -n 1 -c pimprof_cpu -d inj_cpu -- ./test.inj
export OMP_NUM_THREADS=4 && run-sniper --roi -n 4 -c pimprof_pim -d inj_pim -- ./test.inj
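The two runs differ only in the thread count and in the configuration and output names, so they can be generated from one template. The sketch below assumes the thread counts (1 for CPU, 4 for PIM) and the `pimprof_cpu` / `pimprof_pim` configuration names used in the commands above; `sniper_cmd` is a hypothetical helper name.

```shell
# Hypothetical helper: print the run-sniper command for the CPU or PIM run.
# Usage: sniper_cmd cpu|pim [binary]   (binary defaults to ./test.inj)
sniper_cmd() {
    target="${1:-}"
    binary="${2:-./test.inj}"
    case "$target" in
        cpu) threads=1 ;;   # single-threaded CPU baseline
        pim) threads=4 ;;   # four threads for the PIM configuration
        *) echo "target must be cpu or pim" >&2; return 1 ;;
    esac
    echo "export OMP_NUM_THREADS=$threads && run-sniper --roi -n $threads" \
         "-c pimprof_$target -d inj_$target -- $binary"
}
```

As with any command generator, the output can be inspected first and then executed with `eval "$(sniper_cmd cpu)"`.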

3. Decision Solving

As the last step, we feed the runtime profile to PIMProf solver Solver.exe to generate offloading decisions. The usage of Solver is shown below:

Solver.exe <mode> -c <cpu_stats_file> -p <pim_stats_file> -r <reuse_file> -o <output_file>

Select mode from: mpki, para, reuse.

In each of the result folders inj_cpu and inj_pim, two files are of interest: pimprofstats.out contains the runtime statistics of that run, and pimprofreuse.out contains the data reuse information.

The example to generate the reuse decision in run_inj.sh looks like this:

Solver.exe reuse -c inj_cpu/pimprofstats.out -p inj_pim/pimprofstats.out -r inj_cpu/pimprofreuse.out -o reusedecision.out

Here pimprofstats.out from inj_cpu and from inj_pim provide the CPU and PIM stats, respectively, while pimprofreuse.out is taken from the CPU run only, because the reuse information from the PIM run would be the same as that from the CPU run.

The generated decision is stored in reusedecision.out.
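To compare the three solver modes on the same pair of runs, the invocation can be generated per mode. This is a sketch with stated assumptions: `solver_cmd` is a hypothetical helper, the output names `mpkidecision.out` and `paradecision.out` simply follow the `reusedecision.out` pattern, and all four flags are passed in every mode because the usage line lists them without marking any as optional.

```shell
# Hypothetical helper: print the Solver.exe invocation for one mode.
# SOLVER defaults to the build location mentioned above; override as needed.
solver_cmd() {
    mode="${1:-}"
    case "$mode" in
        mpki|para|reuse) ;;
        *) echo "mode must be one of: mpki, para, reuse" >&2; return 1 ;;
    esac
    echo "${SOLVER:-build/PIMProfSolver/Solver.exe} $mode" \
         "-c inj_cpu/pimprofstats.out -p inj_pim/pimprofstats.out" \
         "-r inj_cpu/pimprofreuse.out -o ${mode}decision.out"
}

# Print one command per mode; pipe the output to sh to actually run them.
for m in mpki para reuse; do
    solver_cmd "$m"
done
```

Printing the commands first keeps the sketch safe to run in a directory where the Sniper results have not been generated yet.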

GAP graph workloads (https://github.com/sbeamer/gapbs)

We have modified the Makefile and provide a simple run_inj.sh to demonstrate how to generate offloading decisions for GAP.

Note that the repository does not come with any pre-generated graphs. To generate graphs for testing, please refer to the GAP README.

Before compilation, set CXX, PIMPROF_ROOT, and PIMPROF_MODE to the correct values. Then run:

$ make inj

to generate the corresponding binary.

Notes

https://stackoverflow.com/questions/8486314/setting-processor-affinity-with-c-that-will-run-on-linux