Multi-GPU failed in eigh kernel #316
Comments
There may be multiple libcusolver.so libraries on the system. I suspect an incorrect version is loaded. Could you run
to check which libcusolver is used.
Here's the output of the command above. I cannot decipher anything, but maybe you can 😁
565750: find library=libpthread.so.0 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/glibc-hwcaps/x86-64-v3:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/glibc-hwcaps/x86-64-v2:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib (RPATH from file python)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/glibc-hwcaps/x86-64-v3/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/glibc-hwcaps/x86-64-v2/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/tls/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/x86_64/libpthread.so.0
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/libpthread.so.0
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libpthread.so.0
565750:
565750: find library=libdl.so.2 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib (RPATH from file python)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/libdl.so.2
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libdl.so.2
565750:
565750: find library=libutil.so.1 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib (RPATH from file python)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/libutil.so.1
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libutil.so.1
565750:
565750: find library=libm.so.6 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib (RPATH from file python)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/libm.so.6
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libm.so.6
565750:
565750: find library=libc.so.6 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib (RPATH from file python)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/bin/../lib/libc.so.6
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libc.so.6
565750:
565750:
565750: calling init: /lib64/ld-linux-x86-64.so.2
565750:
565750:
565750: calling init: /lib64/libc.so.6
565750:
565750:
565750: calling init: /lib64/libm.so.6
565750:
565750:
565750: calling init: /lib64/libutil.so.1
565750:
565750:
565750: calling init: /lib64/libdl.so.2
565750:
565750:
565750: calling init: /lib64/libpthread.so.0
565750:
565750:
565750: initialize program: python
565750:
565750:
565750: transferring control: python
565750:
565750: find library=libffi.so.8 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../glibc-hwcaps/x86-64-v3:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../glibc-hwcaps/x86-64-v2:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64/x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64:/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../glibc-hwcaps/x86-64-v3/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../glibc-hwcaps/x86-64-v2/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64/x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../tls/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64/x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../x86_64/libffi.so.8
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libffi.so.8
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libffi.so.8
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_struct.cpython-311-x86_64-linux-gnu.so
565750:
565750: find library=libcusolver.so [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcusolver.so
565750:
565750: find library=libcublas.so.11 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublas.so.11
565750:
565750: find library=libcublasLt.so.11 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublasLt.so.11
565750:
565750: find library=librt.so.1 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../librt.so.1
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/librt.so.1
565750:
565750: find library=libgcc_s.so.1 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libgcc_s.so.1
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libgcc_s.so.1
565750:
565750:
565750: calling init: /lib64/librt.so.1
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublasLt.so.11
565750:
565750: find library=libcuda.so.1 [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcuda.so.1
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libcuda.so.1
565750:
565750:
565750: calling init: /lib64/libcuda.so.1
565750:
565750: find library=libnvrtc.so [0]; searching
565750: search path=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../.. (RPATH from file /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so)
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libnvrtc.so
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libnvrtc.so
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublas.so.11
565750:
565750:
565750: calling init: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcusolver.so
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libnvrtc.so [0]
565750:
565750:
565750: calling fini: /lib64/libcuda.so.1 [0]
565750:
565750:
565750: calling fini: python [0]
565750:
565750:
565750: calling fini: /lib64/libutil.so.1 [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libffi.so.8 [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/_struct.cpython-311-x86_64-linux-gnu.so [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcusolver.so [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublas.so.11 [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcublasLt.so.11 [0]
565750:
565750:
565750: calling fini: /lib64/libm.so.6 [0]
565750:
565750:
565750: calling fini: /lib64/libdl.so.2 [0]
565750:
565750:
565750: calling fini: /lib64/libpthread.so.0 [0]
565750:
565750:
565750: calling fini: /lib64/librt.so.1 [0]
565750:
565750:
565750: calling fini: /opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libgcc_s.so.1 [0]
565750:
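For anyone else who finds a trace like this hard to read: the lines have the shape of glibc dynamic loader debug output (the kind produced with `LD_DEBUG=libs`). For each `find library=` entry, the loader tries candidate paths in order and stops at the first one that works, so the last `trying file=` line of each group is the file that was actually picked. Below is a minimal sketch that extracts that mapping. The function name `resolved_libraries` is made up for illustration and is not part of gpu4pyscf, and the sketch assumes every search in the trace succeeded.

```python
import posixpath
import re

# Patterns for the two loader trace line types this sketch cares about.
FIND_RE = re.compile(r"find library=(\S+)")
TRY_RE = re.compile(r"trying file=(\S+)")


def resolved_libraries(trace):
    """Map each requested library name to the last path the loader tried.

    Assumption: each search succeeded, so the last candidate tried for a
    library is the one that was loaded. Paths are normalized lexically
    (".." segments collapsed); symlinks are NOT resolved.
    """
    resolved = {}
    current = None
    for line in trace.splitlines():
        found = FIND_RE.search(line)
        if found:
            current = found.group(1)
            continue
        tried = TRY_RE.search(line)
        if tried and current is not None:
            # Later candidates overwrite earlier ones, leaving the last one.
            resolved[current] = posixpath.normpath(tried.group(1))
    return resolved


if __name__ == "__main__":
    # A few lines copied from the trace above.
    sample = """\
565750: find library=libcusolver.so [0]; searching
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcusolver.so
565750:
565750: find library=libcuda.so.1 [0]; searching
565750: trying file=/opt/shared/conda/envs/tstoolkit-dev3/lib/python3.11/lib-dynload/../../libcuda.so.1
565750: search cache=/etc/ld.so.cache
565750: trying file=/lib64/libcuda.so.1
"""
    for name, path in resolved_libraries(sample).items():
        print(f"{name} -> {path}")
```

Read this way, the trace shows libcusolver.so, libcublas.so.11, libcublasLt.so.11 and libnvrtc.so all coming from the conda environment's `lib` directory, while libcuda.so.1 comes from `/lib64`. The trace does not show which versioned file `libcusolver.so` points to; running `os.path.realpath` on the reported path on the affected machine would show that.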
@cvsik Does the following minimal script work on your side?
@wxj6000 Yes, the minimal script works fine both on a single GPU and on two GPUs.
Interestingly, sometimes there's no Here's the
Hi everyone,
I'm trying to run gpu4pyscf 1.3.0 (installed from PyPI) on 2 GPUs, but I get an error when the initial Fock matrix is formed:
The calculation runs fine on a single GPU (same node and environment).
Here is the output of lib.utils.format_sys_info() in case it helps:
System Information
Any help on how to debug or fix this is greatly appreciated!
Thanks a lot in advance 🚀
Best,
Max