HIP and MPI+HIP builds broken since adding f-function support (PR #312) #344

Open
ohearnk opened this issue Mar 16, 2024 · 0 comments · May be fixed by #361

ohearnk commented Mar 16, 2024

#312 broke all HIP and MPI+HIP builds (with and without f-function support).

The new CUDA and MPI+CUDA codes need to be backported to the respective HIP and MPI+HIP implementations.

My current working notes on this are as follows:

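  • Delete all HIP / MPI+HIP sources and replace them with CUDA / MPI+CUDA sources converted by the hipify tools
cd QUICK/src
# Remove the stale HIP sources (-f ignores patterns that match nothing)
rm -f hip/*.{cu,h,cpp}
rm -rf hip/iclass && cp -r cuda/iclass hip
# Convert every CUDA source in src/cuda and write the result to src/hip
cd cuda
for FILE in *.cu *.h *.cpp; do [ -e "${FILE}" ] && hipify-perl "${FILE}" -o "../hip/${FILE}"; done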
  • Manually fix issues
    -- CUDA_MPIV -> HIP_MPIV
    -- src/hip/gpu.cu:49 still hard-codes the CUDA debug file name: debugFile = fopen("debug.cuda", "w+");
    -- NVTX -> ROC-tracer (https://github.com/ROCm/roctracer)
    --- #include "nvToolsExt.h" -> #include "roctx.h"
    --- nvtxRangePushA -> roctxRangePush
    --- nvtxRangePop -> roctxRangePop
    -- HIP kernel tuning: hipLaunchKernelGGL, __attribute__, __launch_bounds__
    --- Q: why static variables? => preprocessor definitions
    -- Future-proof the code for porting by replacing the CUDA and HIP string prefixes with generic GPU prefixes
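
The mechanical renames in the notes above (CUDA_MPIV, the NVTX header, and the two NVTX range calls) could be applied in one pass after hipify has run. The following is only a sketch: `apply_hip_fixups` is a hypothetical helper name, the directory argument is an assumption, and `sed -i` is used in its GNU form. It does not cover the debug file name, the kernel tuning items, or the generic-prefix refactor, which still need manual attention.

```shell
#!/usr/bin/env bash
# Sketch only: apply the mechanical renames from the notes to hipified sources.
# apply_hip_fixups is a hypothetical helper, not part of QUICK or of hipify.
set -euo pipefail

apply_hip_fixups() {
    local dir="$1"   # directory holding the converted sources (assumption)
    local file
    for file in "${dir}"/*.cu "${dir}"/*.h "${dir}"/*.cpp; do
        [ -e "${file}" ] || continue   # skip glob patterns that matched nothing
        # GNU sed in-place edit; BSD/macOS sed would need: sed -i ''
        sed -i \
            -e 's/CUDA_MPIV/HIP_MPIV/g' \
            -e 's/#include "nvToolsExt\.h"/#include "roctx.h"/' \
            -e 's/nvtxRangePushA/roctxRangePush/g' \
            -e 's/nvtxRangePop/roctxRangePop/g' \
            "${file}"
    done
}
```

With the layout used in the commands above, this would be invoked as `apply_hip_fixups QUICK/src/hip`; reviewing the result with `git diff` before building is advisable, since plain text substitution cannot tell code from comments or strings.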

Issues:

  • After updating the CMake build system, linking fails with the following error involving XC (on AAC for MI210s):
[ 98%] Linking CXX shared library libquick_hip.so
lld: error: undefined symbol: devSim_dft
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(cshell_getxc_kernel())
>>> referenced 9 more times
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [src/CMakeFiles/libquick_hip.dir/build.make:2491: src/libquick_hip.so] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:258: src/CMakeFiles/libquick_hip.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2

ohearnk self-assigned this Mar 16, 2024
ohearnk moved this to In Progress in QUICK/AMBER 2024 Release Mar 16, 2024
ohearnk linked a pull request Apr 15, 2024 that will close this issue
ohearnk changed the title from "HIP and MPI+HIP builds broken since adding f-function support (PR #312)" to "Refactor GPU codes and restore HIP Support" Nov 7, 2024
ohearnk changed the title from "Refactor GPU codes and restore HIP Support" to "HIP and MPI+HIP builds broken since adding f-function support (PR #312)" Nov 7, 2024