[tests] Update Dockerfile to use cuda 12.2 #1050

Merged: 2 commits, Nov 1, 2023


younesbelkada
Contributor

An attempt to fix the example tests that are currently being skipped. According to the logs:

./../../../opt/conda/envs/peft/lib/python3.8/site-packages/torch/cuda/__init__.py:138
  /opt/conda/envs/peft/lib/python3.8/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
    return torch._C._cuda_getDeviceCount() > 0

../../../../opt/conda/envs/peft/lib/python3.8/site-packages/bitsandbytes/cextension.py:34
  /opt/conda/envs/peft/lib/python3.8/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
    warn("The installed version of bitsandbytes was compiled without GPU support. "
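For reference, the "found version 11080" in the first warning can be read as a CUDA version number. The sketch below assumes the usual CUDA convention of encoding a version as 1000 * major + 10 * minor; the helper name `decode_cuda_version` is hypothetical and is not part of torch or this repository.

```python
from typing import Tuple


def decode_cuda_version(code: int) -> Tuple[int, int]:
    """Split an integer CUDA version code into (major, minor).

    Assumes the encoding 1000 * major + 10 * minor, which is the
    convention CUDA uses for its integer version numbers.
    """
    major, remainder = divmod(code, 1000)
    minor = remainder // 10
    return major, minor


# The code reported in the warning above.
major, minor = decode_cuda_version(11080)
print(f"driver supports CUDA {major}.{minor}")
```

Under that assumption, the driver in the log supports CUDA up to 11.8, which is older than the CUDA 12.2 this PR moves the Dockerfile to.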

Will investigate more and report back here

cc @BenjaminBossan

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 25, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada
Contributor Author

After some investigation, that does indeed seem to be the culprit. I think torch 2.1 somehow fails with old CUDA versions, which leads torch.cuda.is_available() to return False. I also had to add NVIDIA_DISABLE_REQUIRE: "1" (ref: NVIDIA/nvidia-container-toolkit#148) so that the Docker image can run without the NVIDIA version check.
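A quick way to check both points from inside the container might look like the sketch below. It is only an illustration: the function name `cuda_diagnostics` and the report keys are hypothetical, and the torch import is guarded so the script still runs where torch is not installed.

```python
import os

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


def cuda_diagnostics(env=None):
    """Report the env var and CUDA availability discussed in this PR."""
    env = os.environ if env is None else env
    return {
        # "1" is the value this PR sets for the container.
        "nvidia_disable_require": env.get("NVIDIA_DISABLE_REQUIRE") == "1",
        "torch_installed": torch is not None,
        # Stays None when torch is not installed.
        "cuda_available": torch.cuda.is_available() if torch is not None else None,
    }


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a GPU runner, the tests that require a GPU would presumably still be skipped, which is the symptom described at the top of this PR.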

@younesbelkada marked this pull request as ready for review November 1, 2023 09:21
Member

@BenjaminBossan left a comment

Thanks so much for investigating and fixing the issue. LGTM.

@younesbelkada merged commit 6960076 into main Nov 1, 2023
@younesbelkada deleted the younesbelkada-patch-2 branch November 7, 2023 09:49