
Serial and MPI parallel hello world

All commands shown on this page must be executed in the directory C/serial_parallel_hello.

pdwfs with serial C application

The "simulation" writes the following in file staged/Cpok_0

Hello devel02.plafrim.cluster 0
Hello devel02.plafrim.cluster 1
Hello devel02.plafrim.cluster 2
Hello devel02.plafrim.cluster 3
Hello devel02.plafrim.cluster 4
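
The actual source ships in the example directory; the following is only a hypothetical, minimal sketch of what the "simulation" does. It uses fopen/fprintf/fclose, which matches the call sequence visible in the pdwfs traces further down.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    char host[256];
    gethostname(host, sizeof(host));    /* e.g. devel02.plafrim.cluster */

    /* assumes the staged/ directory already exists */
    FILE *f = fopen("staged/Cpok_0", "w");
    for (int i = 0; i < 5; i++)
        fprintf(f, "Hello %s %d\n", host, i);
    fclose(f);
    return 0;
}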

The post-processing reads staged/Cpok_0, extracts the first line, and writes it to the file ./resC (a sketch follows the example below). Typically:

Hello devel02.plafrim.cluster 0
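
Again as a hypothetical sketch, not the exact shipped source, the post-processing amounts to reading the staged file and copying its first line to resC. The traces further down suggest the real program opens resC first and reads a 640-byte buffer.

#include <stdio.h>

int main(void) {
    char line[640];                    /* matches the 640-byte read in the traces */

    FILE *out = fopen("resC", "w");    /* resC is opened first, as in the traces */
    FILE *in  = fopen("staged/Cpok_0", "r");

    fgets(line, sizeof(line), in);     /* extract the first line only */
    fclose(in);

    fprintf(out, "%s", line);
    fclose(out);
    return 0;
}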

Compile

source compile.sh

Run without pdwfs

./launch_without_pdwfs.sh

Typical output:

########### Launching simu ##############
simu: running on host devel02.plafrim.cluster
########### Launching post-process ##############
post-process: running on host devel02.plafrim.cluster
########### Done ##############

Here the file staged/Cpok_0 has been written to disk and read back by the post-processing application.

Run with pdwfs locally on a laptop or the frontend node of a cluster

./launch_local.sh

Typical output:

81991:C 21 Feb 2020 09:36:29.398 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
81991:C 21 Feb 2020 09:36:29.398 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=81991, just started
81991:C 21 Feb 2020 09:36:29.398 # Configuration loaded
########### Launching simu ##############
[PDWFS][82004][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=w)
simu: running on host devel02.plafrim.cluster
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 0
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 1
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 2
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 3
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 4
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fclose(stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting close(fd=5)
[PDWFS][82004][TRACE][C] intercepting close(fd=5)
[PDWFS][82004][TRACE][C] calling libc close
(nil)
########### Launching post-process ##############
post-process: running on host devel02.plafrim.cluster
[PDWFS][82017][TRACE][C] intercepting fopen(path=resC, mode=w)
[PDWFS][82017][TRACE][C] calling libc fopen
[PDWFS][82017][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=r)
[PDWFS][82017][TRACE][C] intercepting fread(ptr=0x7ffeabd1e4e0, size=1, nmemb=640, stream=0x913140)
[PDWFS][82017][TRACE][C] intercepting fclose(stream=0x913140)
[PDWFS][82017][TRACE][C] intercepting close(fd=6)
[PDWFS][82017][TRACE][C] intercepting close(fd=6)
[PDWFS][82017][TRACE][C] calling libc close
[PDWFS][82017][TRACE][C] intercepting fprintf(stream=0x912f00, ...)
[PDWFS][82017][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 0
, stream=0x912f00)
[PDWFS][82017][TRACE][C] calling libc fputs
[PDWFS][82017][TRACE][C] intercepting fclose(stream=0x912f00)
[PDWFS][82017][TRACE][C] calling libc fclose
########### Done ##############

This time, the file staged/Cpok_0 never appears on disk: all related POSIX calls have been intercepted by pdwfs, and the data has been stored in and retrieved from a single Redis server.
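
The "calling libc fopen" lines in the traces hint at how the interception works: a shared library defining the same symbols as libc is loaded ahead of libc via LD_PRELOAD, so the application's calls land in the shim first. The following self-contained example is a hypothetical illustration of that general technique only, not pdwfs's actual implementation.

#define _GNU_SOURCE                    /* for RTLD_NEXT */
#include <stdio.h>
#include <dlfcn.h>

/* Intercept fopen: log the call, then forward it to the real libc fopen.
 * A shim like pdwfs does the same for fprintf, fwrite, fclose, close, etc.,
 * and decides per path whether to serve the file from Redis or from libc. */
FILE *fopen(const char *path, const char *mode) {
    static FILE *(*real_fopen)(const char *, const char *) = NULL;
    if (!real_fopen)
        real_fopen = (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");

    fprintf(stderr, "[shim] intercepting fopen(path=%s, mode=%s)\n", path, mode);
    return real_fopen(path, mode);
}

Compiled with gcc -shared -fPIC shim.c -o libshim.so -ldl and run with LD_PRELOAD=./libshim.so, this would print one line per fopen call, in the spirit of the [PDWFS] trace lines above.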

Run with pdwfs as a SLURM job on the compute nodes of a cluster

sbatch job_pdwfs.sh

The job creates one directory per run and writes all its output there. Typical output:

nodes: miriel[017-018]
[PDWFS][init] Start central Redis instance on miriel017.plafrim.cluster:34000
[PDWFS][131199][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=w)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 0
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 1
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 2
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 3
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 4
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fclose(stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting close(fd=5)
[PDWFS][131199][TRACE][C] intercepting close(fd=5)
[PDWFS][131199][TRACE][C] calling libc close
simu: running on host miriel018.plafrim.cluster
redis-cli -h miriel017.plafrim.cluster -p 34000 --scan
addr
PONG
[PDWFS][137847][TRACE][C] intercepting fopen(path=resC, mode=w)
[PDWFS][137847][TRACE][C] calling libc fopen
[PDWFS][137847][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=r)
[PDWFS][137847][TRACE][C] intercepting fread(ptr=0x7ffee8ab7880, size=1, nmemb=640, stream=0x2200340)
[PDWFS][137847][TRACE][C] intercepting fclose(stream=0x2200340)
[PDWFS][137847][TRACE][C] intercepting close(fd=6)
[PDWFS][137847][TRACE][C] intercepting close(fd=6)
[PDWFS][137847][TRACE][C] calling libc close
[PDWFS][137847][TRACE][C] intercepting fprintf(stream=0x2200100, ...)
[PDWFS][137847][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 0
, stream=0x2200100)
[PDWFS][137847][TRACE][C] calling libc fputs
[PDWFS][137847][TRACE][C] intercepting fclose(stream=0x2200100)
[PDWFS][137847][TRACE][C] calling libc fclose
post-process: running on host miriel017.plafrim.cluster

This run is very similar to the local execution, except that it spans two compute nodes of a cluster with SLURM as the batch scheduler: the Redis server and the post-processing run on one node, and the simulation runs on the other.

pdwfs with parallel C application

The parallel version of the "simulation" follows the same idea. It writes two files, staged/Cpok_0 and staged/Cpok_N, produced by rank 0 and rank N = nb_rank/2 respectively (a sketch of this logic follows the listings below). Both files contain similar data: the integer values differ because the rank number is added, and the machine names differ if rank N runs on another node than rank 0. Typically, with 48 ranks deployed on nodes that have 24 cores:
staged/Cpok_0

Hello miriel002.plafrim.cluster 0
Hello miriel002.plafrim.cluster 1
Hello miriel002.plafrim.cluster 2
Hello miriel002.plafrim.cluster 3
Hello miriel002.plafrim.cluster 4

staged/Cpok_24

Hello miriel003.plafrim.cluster 24
Hello miriel003.plafrim.cluster 25
Hello miriel003.plafrim.cluster 26
Hello miriel003.plafrim.cluster 27
Hello miriel003.plafrim.cluster 28
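
As a hypothetical sketch of the rank-selection logic described above (not the exact shipped source): only rank 0 and rank N = size/2 open a file, and each writes five lines starting at its own rank number.

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int rank, size;
    char host[256], path[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));

    /* Only rank 0 and rank N = nb_rank/2 write a file. */
    if (rank == 0 || rank == size / 2) {
        snprintf(path, sizeof(path), "staged/Cpok_%d", rank);
        FILE *f = fopen(path, "w");
        for (int i = 0; i < 5; i++)
            fprintf(f, "Hello %s %d\n", host, rank + i);
        fclose(f);
    }

    MPI_Finalize();
    return 0;
}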

The post-processing reads staged/Cpok_0 and staged/Cpok_N, extracts the first line of each, and writes them to the file ./resC (sketched below). Typically:

Hello miriel002.plafrim.cluster 0
Hello miriel003.plafrim.cluster 24
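
The parallel post-processing generalizes the serial sketch to the two staged files. In this hypothetical version, N is hard-coded to 24 to match the 48-rank example above, while the real program would derive it from the run configuration.

#include <stdio.h>

int main(void) {
    const int ranks[2] = {0, 24};    /* rank 0 and rank N = nb_rank/2 */
    char path[64], line[640];

    FILE *out = fopen("resC", "w");
    for (int i = 0; i < 2; i++) {
        snprintf(path, sizeof(path), "staged/Cpok_%d", ranks[i]);
        FILE *in = fopen(path, "r");
        fgets(line, sizeof(line), in);   /* first line of each file */
        fclose(in);
        fprintf(out, "%s", line);
    }
    fclose(out);
    return 0;
}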

Compile

source compile_mpi.sh

Run without pdwfs on compute nodes

sbatch job_mpi.sh

Typical output:

########### Launching simu ##############
simu: running on host miriel058.plafrim.cluster
simu: running on host miriel060.plafrim.cluster
########### Launching post-process ##############
post-process: running on host miriel057.plafrim.cluster
########### Done ##############

Now the job uses three compute nodes: the Redis server and the post-processing run on one node, and the simulation runs on the two other nodes with one MPI rank per core.

Run with pdwfs on compute nodes

sbatch job_pdwfs_mpi.sh

TODO: this does not work on the cluster for now due to srun issues.