HIP and MPI+HIP builds broken since adding f-function support (PR #312) #344

ohearnk · 2024-03-16T15:36:39Z

#312 broke all HIP and MPI+HIP builds (with and without f-function support).

The new CUDA and MPI+CUDA code needs to be ported to the respective HIP and MPI+HIP implementations.

My current working notes on this are as follows:

Delete all HIP / MPI+HIP sources and replace them with CUDA / MPI+CUDA sources converted using the hipify tools:

cd QUICK/src
rm hip/*.{cu,h,cpp}
rm -rf hip/iclass && cp -r cuda/iclass hip
# Run the conversion from inside cuda/ so that ../hip/ resolves to QUICK/src/hip
cd cuda
for FILE in *.{cu,h,cpp}; do hipify-perl "${FILE}" -o "../hip/${FILE}"; done
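After the conversion, a quick scan for identifiers the tool left untouched can help scope the manual step. The following is only a sketch: list_untranslated is a hypothetical helper name, the pattern list is an assumption built from the leftovers named in these notes, and it relies on GNU grep for --include.

```shell
# Sketch: list lines in converted sources that still carry CUDA-specific names.
# list_untranslated is a hypothetical helper, not part of QUICK or hipify.
list_untranslated() {
    local dir="$1"
    # Patterns are assumptions taken from the notes: CUDA runtime calls,
    # NVTX usage, the CUDA_MPIV macro, and the hard-coded debug file name.
    grep -rnE --include='*.cu' --include='*.h' --include='*.cpp' \
        -e 'cuda[A-Z]|nvtx|nvToolsExt|CUDA_MPIV|debug\.cuda' "${dir}"
}

# Usage (path assumed from the commands above):
#   list_untranslated QUICK/src/hip
```

The function exits non-zero when nothing matches, so it can also serve as a simple check in a script.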

Manually fix issues
-- CUDA_MPIV -> HIP_MPIV
-- src/hip/gpu.cu:49: debugFile = fopen("debug.cuda", "w+");
-- NVTX -> ROC-tracer (https://github.com/ROCm/roctracer)
--- #include "nvToolsExt.h" -> #include "roctx.h"
--- nvtxRangePushA -> roctxRangePush
--- nvtxRangePop -> roctxRangePop
-- HIP kernel tuning: hipLaunchKernelGGL, __attribute__, __launch_bounds__
--- Q: why static variables? => preprocessor definitions
-- future-proof the code for porting by replacing CUDA- and HIP-specific string prefixes with generic GPU prefixes
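The purely mechanical renames in the list above could be scripted rather than applied by hand. A minimal sketch, assuming GNU sed (for -i) and a flat directory of converted sources; fix_hipified is a hypothetical helper name, and only the substitutions named in the notes are applied:

```shell
# Sketch: apply the mechanical renames from the notes to hipified sources.
# fix_hipified is a hypothetical helper, not part of QUICK.
fix_hipified() {
    local dir="$1"
    local f
    for f in "${dir}"/*.cu "${dir}"/*.h "${dir}"/*.cpp; do
        [ -e "${f}" ] || continue   # skip glob patterns that matched nothing
        sed -i \
            -e 's/CUDA_MPIV/HIP_MPIV/g' \
            -e 's/#include "nvToolsExt\.h"/#include "roctx.h"/' \
            -e 's/nvtxRangePushA/roctxRangePush/g' \
            -e 's/nvtxRangePop/roctxRangePop/g' \
            "${f}"
    done
}

# Usage (path assumed from the commands above):
#   fix_hipified QUICK/src/hip
```

The kernel tuning and debug file name items are left out on purpose, since they need case-by-case judgment.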

Issues:

After updating the CMake build system, the following linking error comes up involving XC (on AAC for MI210s):

[ 98%] Linking CXX shared library libquick_hip.so
lld: error: undefined symbol: devSim_dft
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(cshell_getxc_kernel())
>>> referenced 9 more times
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [src/CMakeFiles/libquick_hip.dir/build.make:2491: src/libquick_hip.so] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:258: src/CMakeFiles/libquick_hip.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
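A possible first step in narrowing this down is to list where the symbol is declared in the converted sources. The sketch below assumes GNU grep and assumes that devSim_dft is a device-side variable; find_device_symbol_decls is a hypothetical helper name. If the symbol turns out to be declared extern in one translation unit and defined in another, one thing worth checking (not confirmed here) is whether the HIP build enables relocatable device code (clang's -fgpu-rdc).

```shell
# Sketch: show declaration-like lines for a device symbol in a source tree.
# find_device_symbol_decls is a hypothetical helper, not part of QUICK.
find_device_symbol_decls() {
    local sym="$1" dir="$2"
    # First grep: whole-word matches of the symbol in GPU sources.
    # Second grep: keep only lines that look like declarations or definitions.
    grep -rnw --include='*.cu' --include='*.h' --include='*.cpp' -e "${sym}" "${dir}" \
        | grep -E 'extern|__constant__|__device__'
}

# Usage (path assumed from the commands above):
#   find_device_symbol_decls devSim_dft QUICK/src/hip
```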


ohearnk added this to QUICK/AMBER 2024 Release Mar 16, 2024

ohearnk self-assigned this Mar 16, 2024

ohearnk moved this to In Progress in QUICK/AMBER 2024 Release Mar 16, 2024

ohearnk mentioned this issue Mar 21, 2024: Disable support for HIP and MPI+HIP builds #351 (Merged)

ohearnk linked a pull request Apr 15, 2024 that will close this issue: HIP and MPI+HIP updates #361 (Open)

ohearnk changed the title ~~HIP and MPI+HIP builds broken since adding f-function support (PR #312)~~ Refactor GPU codes and restore HIP Support Nov 7, 2024

ohearnk changed the title ~~Refactor GPU codes and restore HIP Support~~ HIP and MPI+HIP builds broken since adding f-function support (PR #312) Nov 7, 2024
