# serial_parallel_hello
All commands shown on this page must be executed in the directory `C/serial_parallel_hello`.
The "simulation" writes the following to the file `staged/Cpok_0`:

```
Hello devel02.plafrim.cluster 0
Hello devel02.plafrim.cluster 1
Hello devel02.plafrim.cluster 2
Hello devel02.plafrim.cluster 3
Hello devel02.plafrim.cluster 4
```

The post-processing reads `staged/Cpok_0`, extracts the first line and writes it to the file `./resC`. Typically:

```
Hello devel02.plafrim.cluster 0
```
```
source compile.sh
./launch_without_pdwfs.sh
```
Typical output:

```
########### Launching simu ##############
simu: running on host devel02.plafrim.cluster
########### Launching post-process ##############
post-process: running on host devel02.plafrim.cluster
########### Done ##############
```
Here the file `staged/Cpok_0` has been written to disk and read back by the post-processing application.
```
./launch_local.sh
```
Typical output:

```
81991:C 21 Feb 2020 09:36:29.398 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
81991:C 21 Feb 2020 09:36:29.398 # Redis version=5.0.6, bits=64, commit=00000000, modified=0, pid=81991, just started
81991:C 21 Feb 2020 09:36:29.398 # Configuration loaded
########### Launching simu ##############
[PDWFS][82004][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=w)
simu: running on host devel02.plafrim.cluster
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 0
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 1
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 2
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 3
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fprintf(stream=0x21b2f00, ...)
[PDWFS][82004][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 4
, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fwrite(ptr=0x21adfa0, size=1, nmemb=32, stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting fclose(stream=0x21b2f00)
[PDWFS][82004][TRACE][C] intercepting close(fd=5)
[PDWFS][82004][TRACE][C] intercepting close(fd=5)
[PDWFS][82004][TRACE][C] calling libc close
(nil)
########### Launching post-process ##############
post-process: running on host devel02.plafrim.cluster
[PDWFS][82017][TRACE][C] intercepting fopen(path=resC, mode=w)
[PDWFS][82017][TRACE][C] calling libc fopen
[PDWFS][82017][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=r)
[PDWFS][82017][TRACE][C] intercepting fread(ptr=0x7ffeabd1e4e0, size=1, nmemb=640, stream=0x913140)
[PDWFS][82017][TRACE][C] intercepting fclose(stream=0x913140)
[PDWFS][82017][TRACE][C] intercepting close(fd=6)
[PDWFS][82017][TRACE][C] intercepting close(fd=6)
[PDWFS][82017][TRACE][C] calling libc close
[PDWFS][82017][TRACE][C] intercepting fprintf(stream=0x912f00, ...)
[PDWFS][82017][TRACE][C] intercepting fputs(s=Hello devel02.plafrim.cluster 0
, stream=0x912f00)
[PDWFS][82017][TRACE][C] calling libc fputs
[PDWFS][82017][TRACE][C] intercepting fclose(stream=0x912f00)
[PDWFS][82017][TRACE][C] calling libc fclose
########### Done ##############
```
Here the file `staged/Cpok_0` no longer exists on disk: all related POSIX calls have been intercepted by pdwfs, and the data has been stored in and retrieved from a single Redis server.
```
sbatch job_pdwfs.sh
```
It will create a directory per job run and produce all of its output there. Typical output:

```
nodes: miriel[017-018]
[PDWFS][init] Start central Redis instance on miriel017.plafrim.cluster:34000
[PDWFS][131199][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=w)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 0
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 1
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 2
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 3
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fprintf(stream=0x888f00, ...)
[PDWFS][131199][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 4
, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fwrite(ptr=0x883f70, size=1, nmemb=34, stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting fclose(stream=0x888f00)
[PDWFS][131199][TRACE][C] intercepting close(fd=5)
[PDWFS][131199][TRACE][C] intercepting close(fd=5)
[PDWFS][131199][TRACE][C] calling libc close
simu: running on host miriel018.plafrim.cluster
redis-cli -h miriel017.plafrim.cluster -p 34000 --scan
addr
PONG
[PDWFS][137847][TRACE][C] intercepting fopen(path=resC, mode=w)
[PDWFS][137847][TRACE][C] calling libc fopen
[PDWFS][137847][TRACE][C] intercepting fopen(path=staged/Cpok_0, mode=r)
[PDWFS][137847][TRACE][C] intercepting fread(ptr=0x7ffee8ab7880, size=1, nmemb=640, stream=0x2200340)
[PDWFS][137847][TRACE][C] intercepting fclose(stream=0x2200340)
[PDWFS][137847][TRACE][C] intercepting close(fd=6)
[PDWFS][137847][TRACE][C] intercepting close(fd=6)
[PDWFS][137847][TRACE][C] calling libc close
[PDWFS][137847][TRACE][C] intercepting fprintf(stream=0x2200100, ...)
[PDWFS][137847][TRACE][C] intercepting fputs(s=Hello miriel018.plafrim.cluster 0
, stream=0x2200100)
[PDWFS][137847][TRACE][C] calling libc fputs
[PDWFS][137847][TRACE][C] intercepting fclose(stream=0x2200100)
[PDWFS][137847][TRACE][C] calling libc fclose
post-process: running on host miriel017.plafrim.cluster
```
This is very similar to the local execution, except that it happens on two compute nodes of a cluster using SLURM as the batch scheduler, with the Redis server and the post-processing running on one node and the simulation on the other.
The parallel version of the "simulation" follows the same idea. The simulation writes two files, `staged/Cpok_0` and `staged/Cpok_N`, produced by rank 0 and rank N = nb_ranks/2 respectively. They contain similar data; only the integer value differs, since the rank number is added to it, and possibly the machine name if rank N runs on a different node than rank 0. Typically, with 48 ranks deployed on nodes that have 24 cores each:
`staged/Cpok_0`:

```
Hello miriel002.plafrim.cluster 0
Hello miriel002.plafrim.cluster 1
Hello miriel002.plafrim.cluster 2
Hello miriel002.plafrim.cluster 3
Hello miriel002.plafrim.cluster 4
```
`staged/Cpok_24`:

```
Hello miriel003.plafrim.cluster 24
Hello miriel003.plafrim.cluster 25
Hello miriel003.plafrim.cluster 26
Hello miriel003.plafrim.cluster 27
Hello miriel003.plafrim.cluster 28
```
The post-processing reads `staged/Cpok_0` and `staged/Cpok_N`, extracts the first line of each and writes them to the file `./resC`. Typically:

```
Hello miriel002.plafrim.cluster 0
Hello miriel003.plafrim.cluster 24
```
```
source compile_mpi.sh
sbatch job_mpi.sh
```
Typical output:

```
########### Launching simu ##############
simu: running on host miriel058.plafrim.cluster
simu: running on host miriel060.plafrim.cluster
########### Launching post-process ##############
post-process: running on host miriel057.plafrim.cluster
########### Done ##############
```
Now the job uses three compute nodes: the Redis server and the post-processing run on one node, and the simulation runs on the two other nodes with one MPI rank per core.
```
sbatch job_pdwfs_mpi.sh
```
TODO: not working on the cluster for now (srun issues).