The two-dimensional heat equation solver used here is memory bound. On Mahti, the memory bandwidth per core, and sometimes also the overall performance, improves when the node is undersubscribed, i.e. when not all of the cores are used.
Build the code with the provided Makefile:

```bash
make
```
If you want to try different compilers, load the corresponding module and pass the COMP variable to the Makefile:

```bash
module load clang
make COMP=clang
```
(Note that for this exercise the compiler should make no difference).
With the default input data, the single-node performance on Mahti is best (although by a small margin) with 8 MPI tasks and 4 OpenMP threads per task, i.e. using only 32 cores. Thread and process affinity can, however, have a large impact on performance. First, try to run the code with the following settings:
```bash
...
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=4

export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
export OMP_DISPLAY_AFFINITY=true
export OMP_NUM_THREADS=4

srun ./heat_hybrid
```
What does the affinity look like?
Next, try to place the MPI tasks more evenly over the node by choosing the settings so that `ntasks-per-node` x `cpus-per-task` = 128; one possible combination is sketched below.
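For example, one can keep 8 tasks and 4 threads per task but reserve 16 cores per task, so that the tasks are spread evenly across the node. This is only a sketch, assuming the rest of the batch script stays as above; any combination with `ntasks-per-node` x `cpus-per-task` = 128 can be tried.

```bash
...
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16

export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
export OMP_DISPLAY_AFFINITY=true
# Still run only 4 threads per task; the extra reserved cores just spread
# the tasks more evenly over the node
export OMP_NUM_THREADS=4

srun ./heat_hybrid
```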
Can you explain the difference in performance?
Then, try to pin the threads with `OMP_PLACES=cores` and investigate the effect of different `OMP_PROC_BIND` settings (see the sketch below). Pay attention to how the affinity affects the performance.
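For instance, a minimal sketch (assuming the 8 tasks x 4 threads setup from above; `close` and `spread` are the most interesting values to compare, and `false` disables the binding):

```bash
export OMP_PLACES=cores
# Compare e.g. OMP_PROC_BIND=close, OMP_PROC_BIND=spread and OMP_PROC_BIND=false
export OMP_PROC_BIND=close

srun ./heat_hybrid
```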
In CSC's Slurm configuration it is hard to oversubscribe the cores with MPI tasks by accident, but it can happen when simultaneous multithreading (SMT) is enabled with the Slurm option `--hint=multithread`. Try the following Slurm settings, and pay attention to the affinity and the performance:
```bash
...
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=4
#SBATCH --hint=multithread

export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
export OMP_DISPLAY_AFFINITY=true
export OMP_NUM_THREADS=4

srun ./heat_hybrid
```
Can you figure out how to fix the performance problem?
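One possible direction, as a sketch (not necessarily the only solution): request physical cores instead of hardware threads, so that the 32 x 4 = 128 OpenMP threads do not share cores.

```bash
...
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=4
# Place tasks on physical cores only, so the 32 x 4 = 128 threads
# get one core each instead of sharing cores via SMT
#SBATCH --hint=nomultithread

export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
export OMP_DISPLAY_AFFINITY=true
export OMP_NUM_THREADS=4

srun ./heat_hybrid
```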