Added build and instructions for Pixi environment manager #414

Merged · 5 commits · Nov 19, 2024
Changes from 2 commits
70 changes: 70 additions & 0 deletions platform/build/make.inc.PIXI
@@ -0,0 +1,70 @@
#---------------------------------------------------
# macOS / Linux with mpich, using the Pixi environment manager
#
# 1. Install Pixi
# curl -fsSL https://pixi.sh/install.sh | bash
#
# 2. Restart shell / terminal / console for Pixi to be recognized
#
# 3. Create Pixi environment
# pixi init <env_name>
#
# 4. Setup Pixi environment
# cd <env_name>
# git clone git@github.com:gafusion/gacode.git
# pixi add "python~=3.12,<3.13"
# pixi add git
# pixi add mpich
@smithsp (Member) commented on Oct 28, 2024:

I had built a conda-based environment on the PPPL portal cluster at one point, and we had a problem with using more than 32 cores; see #96. Are you in a position to build this on a system that can handle more than 32 cores and to test it?

@aaronkho (Contributor, Author) replied:

Yes, I have access to a partition of the PSFC cluster with 64 cores per node. Let me build the code there and see whether I can get it to use more than 32 cores on this cluster.

@aaronkho (Contributor, Author) replied:

I have compiled it on the PSFC cluster using the PIXI instructions. The CGYRO regression tests return strange results as the number of cores varies. There is also something wrong with the OpenBLAS package I used, so OpenMP was not working for this test; I will debug that.

In the meantime, here are the regression results with 64 cores:

(py312_gacode) [aaronkho@eofe10 py312_gacode]$ cgyro -r -n 64 -nomp 1
REGRESSION TESTING: cgyro
reg01: ERROR - Regression data was not generated by simulation.
reg02: ERROR - Regression data was not generated by simulation.
reg03: ERROR - Regression data was not generated by simulation.
reg04: ERROR - Regression data was not generated by simulation.
reg05: PASS
reg06: ERROR - Regression data was not generated by simulation.
reg07: PASS
reg08: ERROR - Regression data was not generated by simulation.
reg09: ERROR - Regression data was not generated by simulation.
reg10: ERROR - Regression data was not generated by simulation.
reg11: ERROR - Regression data was not generated by simulation.
reg12: ERROR - Regression data was not generated by simulation.
reg13: ERROR - Regression data was not generated by simulation.
reg14: ERROR - Regression data was not generated by simulation.
reg15: ERROR - Regression data was not generated by simulation.
reg16: ERROR - Regression data was not generated by simulation.
reg17: PASS
reg18: ERROR - Regression data was not generated by simulation.
reg19: ERROR - Regression data was not generated by simulation.
reg20: ERROR - Regression data was not generated by simulation.
reg21: ERROR - Regression data was not generated by simulation.
reg22: ERROR - Regression data was not generated by simulation.

At 32 cores: reg04, reg12, reg13, reg14, and reg18 fail.
At 16 cores: reg12 fails.
At 8 cores: all regression tests pass.

Please let me know if this is intended behaviour.

@aaronkho (Contributor, Author) commented on Oct 29, 2024:

Fixed the OpenBLAS issue by adding a missing library (8d4cc83). However, the CGYRO regression tests still fail in the same ways.

Member replied:

It's important to ensure OpenBLAS is single-threaded:

BINARY=64
USE_THREAD=0
USE_LOCKING=1
NO_SHARED=1
NO_CBLAS=1
NO_LAPACKE=1
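
(For reference, a minimal sketch of how settings like these are typically applied: at build time when compiling OpenBLAS from source, or at run time for a prebuilt, threaded package. The repository URL and install prefix below are assumptions, not part of this PR.)

# Sketch: build a single-threaded OpenBLAS from source (URL and prefix are assumptions)
git clone https://github.com/OpenMathLib/OpenBLAS.git
cd OpenBLAS
make BINARY=64 USE_THREAD=0 USE_LOCKING=1 NO_SHARED=1 NO_CBLAS=1 NO_LAPACKE=1
make PREFIX=$HOME/openblas-single install

# Runtime alternative for a prebuilt, threaded OpenBLAS: pin it to one thread
export OPENBLAS_NUM_THREADS=1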

@aaronkho (Contributor, Author) replied:

@jcandy The regression test results do not seem to change regardless of whether the OpenBLAS library is used single-threaded or multi-threaded. The only impact is that the computation time is longer with multi-threading, which is consistent with the quoted overhead problems of multi-threaded OpenBLAS.

Member replied:

I would pick one regression case that isn't working (say, reg12) and then run cgyro directly in the reg12 directory. That should give us an idea of where the code is failing.
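
(A minimal sketch of doing that by hand, assuming the regression inputs live under $GACODE_ROOT/cgyro/tools/input and that cgyro accepts -e to run in an existing directory; the path and the flag are assumptions, so adjust to the actual layout.)

# Hypothetical paths/flags: copy one failing case and run it on its own
mkdir -p $HOME/scratch/reg12 && cd $HOME/scratch/reg12
cp $GACODE_ROOT/cgyro/tools/input/reg12/input.cgyro .
cgyro -e . -n 32 -nomp 1
# then inspect out.cgyro.mpi (and the other out.cgyro.* logs) for where the run stops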

@aaronkho (Contributor, Author) replied:

@jcandy I ran everything with -n 32, looked into reg12, and saw that it crashed with this error:
ERROR: (CGYRO) nc ( 24) not a multiple of coll atoa procs ( 32)

Inside the file out.cgyro.mpi, there is this output:

 Parallelization and distribution diagnostics

         nv:   256
         nc:    24
 GCD(nv,nc):     8
 n_toroidal:     1
     nt_loc:     1

           [coll]     [str]      [NL]      [NL]      [NL]    [coll]     [str]
  n_MPI    nc_loc    nv_loc   n_split  atoa[MB] atoa proc atoa proc ared proc
 ------    ------    ------   -------  -------- --------- --------- ---------
      1        24       256      6144      0.10         1         1         1
      2        12       128      3072      0.05         1         2         2
      4         6        64      1536      0.02         1         4         4
      8         3        32       768      0.01         1         8         8

My guess is that this regression case itself is not suitable for parallelization across more than 24 cores?
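
(The failing condition looks like plain divisibility: with n_toroidal = 1, the collision all-to-all layout requires nc to be a multiple of the MPI task count. A quick shell check for reg12's nc = 24, with the rule inferred from the error message above:)

nc=24
for n in 8 16 24 32 64; do
  if [ $((nc % n)) -eq 0 ]; then
    echo "n=$n: ok   (nc=$nc is a multiple of n)"
  else
    echo "n=$n: fail (nc=$nc is not a multiple of n)"
  fi
done
# only n=8 and n=24 pass; 16, 32, and 64 reproduce the error above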

@aaronkho (Contributor, Author) replied:

Similarly,
reg04 has ERROR: (CGYRO) nc ( 144) not a multiple of coll atoa procs ( 32)
reg13 has ERROR: (CGYRO) nc ( 144) not a multiple of coll atoa procs ( 32)
reg14 has ERROR: (CGYRO) nc ( 144) not a multiple of coll atoa procs ( 32)
reg18 has ERROR: (CGYRO) nc ( 144) not a multiple of coll atoa procs ( 32)

@aaronkho (Contributor, Author) commented on Nov 14, 2024:

Switching over to -n 24 on reg12 yields a different problem, giving the following error:
ERROR: (CGYRO) nv ( 256) not a multiple of coll atoa procs ( 24)

So it seems that -n 8 passes all regression checks simply because it is the greatest common divisor of all the CGYRO grid dimensions within these tests (which naturally extends to -n 4 and -n 2).

# pixi add gfortran
# pixi add openblas
# pixi add liblapack
# pixi add fftw
# pixi add hdf5
# pixi add netcdf-fortran
#
# 5. Enter and configure Pixi environment (must be repeated every entry)
# pixi shell
# export GACODE_ROOT=${PIXI_PROJECT_ROOT}/gacode
# export GACODE_PLATFORM=PIXI
# source ${GACODE_ROOT}/shared/bin/gacode_setup
#
# 6. Build GACODE
# cd gacode
# make
# cd ..
#
# 7. Run regression tests
# neo -r
# tglf -r
# cgyro -r -n 4 -nomp 2
# tgyro -r -n 4
#
#---------------------------------------------------

MAKE = make
PREFIX = ${CONDA_PREFIX}
NETCDF_PATH=${PREFIX}
MF90 = mpif90

# Compilers and flags

FC = ${MF90} -std=f2008 -fall-intrinsics -I$(GACODE_ROOT)/modules -J$(GACODE_ROOT)/modules -g -I${PREFIX}/include
F77 = ${MF90} -g

FMATH = -fdefault-real-8 -fdefault-double-8
FOPT = -O3 -m64 -fallow-argument-mismatch
FDEBUG = -Wall -fcheck=all -fbacktrace -fbounds-check -O0 -Wextra -finit-real=nan -Wunderflow -ffpe-trap=invalid,zero,overflow
FBOUND = -Wall -fbounds-check
FOMP = -fopenmp

# System math libraries

LMATH = -L${PREFIX}/lib -lfftw3 -llapack -lblas

NETCDF = -L${PREFIX}/lib -lnetcdff -lnetcdf
NETCDF_INC =${PREFIX}/include

# Archive

ARCH = ar cr
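
(Since step 5 above must be repeated in every new shell, the per-session setup can be collected into a small helper sourced after pixi shell; a sketch, where the file name activate_gacode.sh is hypothetical:)

# activate_gacode.sh -- source this after `pixi shell` inside the project directory
export GACODE_ROOT=${PIXI_PROJECT_ROOT}/gacode
export GACODE_PLATFORM=PIXI
source ${GACODE_ROOT}/shared/bin/gacode_setup
# usage: pixi shell, then:  . ./activate_gacode.sh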

2 changes: 2 additions & 0 deletions platform/env/env.PIXI
@@ -0,0 +1,2 @@
# The PIXI platform assumes that you have installed pixi and run the
# appropriate commands to set up the virtual environment correctly.
19 changes: 19 additions & 0 deletions platform/exec/exec.PIXI
@@ -0,0 +1,19 @@
#!/bin/sh
# GACODE Parallel execution script (PIXI)
#
# NOTES:
# Originally written for mpich2-1.0.1, so mpirun is used rather than mpiexec

simdir=${1}
nmpi=${2}
exec=${3}
nomp=${4}
numa=${5}
mpinuma=${6}

echo $simdir

cd $simdir

mpirun -env OMP_NUM_THREADS $nomp -np $nmpi $exec
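
(For reference, the platform exec scripts are normally called by the gacode run wrappers rather than by hand; a hypothetical direct invocation with the six positional arguments, using placeholder values, might look like the following. Note that numa and mpinuma are read but not used by this particular script.)

#                                        simdir        nmpi  exec                nomp  numa  mpinuma
sh $GACODE_ROOT/platform/exec/exec.PIXI  ./my_sim_dir  8     /path/to/cgyro_exe  1     0     0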