#312 broke all HIP and MPI+HIP builds (with and without f-function support).
New CUDA and MPI+CUDA codes need to be backported to respective HIP and MPI+HIP implementations.
My current working notes on this are as follows:
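Delete all HIP / MPI+HIP sources and replace them with CUDA / MPI+CUDA sources converted by the hipify tools:
cd QUICK/src
# remove the stale HIP sources
rm hip/*.{cu,h,cpp}
# replace hip/iclass with a fresh copy of cuda/iclass
rm -rf hip/iclass && cp -r cuda/iclass hip
# convert from inside cuda/ so that ../hip/ resolves to QUICK/src/hip
cd cuda
for FILE in *.{cu,h,cpp}; do hipify-perl "${FILE}" -o "../hip/${FILE}"; done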
Manually fix issues
-- CUDA_MPIV -> HIP_MPIV
-- src/hip/gpu.cu:49: debugFile = fopen("debug.cuda", "w+");
-- NVTX -> ROC-tracer (https://github.com/ROCm/roctracer)
--- #include "nvToolsExt.h" -> #include "roctx.h"
--- nvtxRangePushA -> roctxRangePush
--- nvtxRangePop -> roctxRangePop
-- HIP kernel tuning: hipLaunchKernelGGL, __attribute__, __launch_bounds__
--- Q: why static variables? => preprocessor definitions
-- future-proof the code for porting by replacing the CUDA and HIP string prefixes with generic GPU prefixes
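The purely mechanical renames in the list above could be scripted rather than done by hand. The following is only a sketch: `apply_manual_renames` is a hypothetical helper name, and it assumes hipify-perl leaves these identifiers untouched and that no unrelated identifier contains them as a substring. The `debug.cuda` file name and the kernel-tuning items are left out because the notes do not say what they should become.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QUICK or the hipify tools):
# apply the mechanical CUDA -> HIP renames from the notes to one file.
apply_manual_renames() {
    local file="$1"
    # Write to a temporary file and move it back, which avoids relying on
    # the non-portable in-place flag of sed.
    sed \
        -e 's/CUDA_MPIV/HIP_MPIV/g' \
        -e 's/#include "nvToolsExt\.h"/#include "roctx.h"/' \
        -e 's/nvtxRangePushA/roctxRangePush/g' \
        -e 's/nvtxRangePop/roctxRangePop/g' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It would be run once per converted file, for example `for FILE in hip/*.{cu,h,cpp}; do apply_manual_renames "${FILE}"; done` from `QUICK/src`, with the result reviewed in `git diff` before committing.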
Issues:
After updating the CMake build system, the following link error involving the XC kernels comes up (on AAC, for MI210s):
[ 98%] Linking CXX shared library libquick_hip.so
lld: error: undefined symbol: devSim_dft
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(cshell_getxc_kernel())
>>> referenced 9 more times
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [src/CMakeFiles/libquick_hip.dir/build.make:2491: src/libquick_hip.so] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:258: src/CMakeFiles/libquick_hip.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
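The log says that device code references `devSim_dft` but no definition is visible at device link time. One possible first diagnostic step is to list every place the symbol is declared in the converted sources. The sketch below assumes the symbol is a device-side variable carrying a qualifier such as `__constant__` or `__device__`, or declared `extern`; `list_symbol_decls` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line:text for every whole-word occurrence
# of a symbol that looks like a declaration. The qualifier filter is an
# assumption about how the symbol is declared.
list_symbol_decls() {
    local symbol="$1" dir="$2"
    grep -rnw --include='*.cu' --include='*.h' --include='*.cpp' -- "$symbol" "$dir" \
        | grep -E '__constant__|__device__|extern'
}
```

For example, `list_symbol_decls devSim_dft hip` run from `QUICK/src` would show whether only `extern` declarations remain after conversion, or whether a definition exists in a file that is not part of the `libquick_hip` target.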
ohearnk changed the title from "HIP and MPI+HIP builds broken since adding f-function support (PR #312)" to "Refactor GPU codes and restore HIP Support" (Nov 7, 2024)
ohearnk changed the title from "Refactor GPU codes and restore HIP Support" to "HIP and MPI+HIP builds broken since adding f-function support (PR #312)" (Nov 7, 2024)