
cmd/create: Support passing --device option to podman-create #1407

Closed
wants to merge 1 commit from the support-devices branch

Conversation

Jmennius
Collaborator

@Jmennius Jmennius commented Nov 18, 2023

This allows using the CDI infrastructure, which often does more than just mapping devices in /dev:
for NVIDIA it will additionally map a set of libraries into the container (which are essential for using the device without hassle).

Since this is only a pass-through, maybe instead of a --device-specific option we should have the ability to pass arbitrary options through to podman-create? Something like toolbox create -c my-container -- --device foo:bar --other-podman-option?

Fixes: #116 (although it is already possible to use NVIDIA devices inside toolboxes, this change significantly improves usability when using the NVIDIA CTK and CDI with toolbox)
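
For illustration, here is how the new option would be used; a sketch that assumes the flag is wired through to podman-create as proposed:

toolbox create --device nvidia.com/gpu=all my-container
toolbox enter my-container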


Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/9b19d48ad9834499acf777678f0d96b3

✔️ unit-test SUCCESS in 8m 11s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 31s
✔️ unit-test-restricted SUCCESS in 7m 10s
✔️ system-test-fedora-rawhide SUCCESS in 28m 22s
✔️ system-test-fedora-39 SUCCESS in 27m 39s
✔️ system-test-fedora-38 SUCCESS in 27m 28s
✔️ system-test-fedora-37 SUCCESS in 26m 59s

@Jmennius
Collaborator Author

Jmennius commented Nov 21, 2023

There was a related issue with nvidia-ctk, NVIDIA/nvidia-container-toolkit#143, and the fix was merged.
Until it is released there is a workaround: the chmod hook can simply be removed from the NVIDIA CDI spec file.

UPDATE: The fix was released in v1.15.0, everything works out of the box now!
P.S. NVIDIA has changed its repositories recently, so if you had nvidia-container-toolkit* installed and it is not updating past v1.13.x, remove the existing repo and add the new repo file.
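
For reference, the hook in question appears in the generated /etc/cdi/nvidia.yaml roughly as below (the exact paths and arguments vary with the nvidia-ctk version); the workaround is to delete the whole list entry:

hooks:
- hookName: createContainer
  path: /usr/bin/nvidia-ctk
  args:
  - nvidia-ctk
  - hook
  - chmod
  - --mode
  - "755"
  - --path
  - /dev/dri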

@Jmennius
Collaborator Author

@debarshiray can I draw your attention to this and get some opinions? 😁

Member

@debarshiray debarshiray left a comment


Sorry for the delay, @Jmennius I finally got myself some NVIDIA hardware to play with this.

I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.

However, as far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.

Is there anything else other than NVIDIA that uses the Container Device Interface?

I would like to understand the situation a bit better. Ultimately I want to make it as smooth as possible for the user to enable the NVIDIA proprietary driver. That becomes a problem if one needs to enable multiple different unofficial repositories, at least on Fedora.

@Jmennius
Collaborator Author

Sorry for the delay, @Jmennius I finally got myself some NVIDIA hardware to play with this.

I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.

However, as far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.

Yeah, you are right - it's only available in NVIDIA's repos for Fedora. It would be nice if it were repackaged somehow on RPMFusion... I saw that some distros package it.

Is there anything else other than NVIDIA that uses the Container Device Interface?

I am not aware of other CDI implementations :(

I would like to understand the situation a bit better. Ultimately I want to make it as smooth as possible for the user to enable the NVIDIA proprietary driver. That becomes a problem if one needs to enable multiple different unofficial repositories, at least on Fedora.

I guess this is the way to go for the best experience.
Another option with NVIDIA is to basically reinstall the NVIDIA driver libraries inside the container after each upgrade of the kernel driver on the host.

@AlvaroFS

AlvaroFS commented May 8, 2024

Hi! This is great news!! Is it expected to be merged soon or should I grab the patch?

This allows using the CDI infrastructure, which often does more than
just mapping devices in /dev -
for NVIDIA this will additionally map a set of libraries into the container
(which are essential for using the device without hassle).

Signed-off-by: Ievgen Popovych <[email protected]>
@Jmennius Jmennius force-pushed the support-devices branch from 8cab5e7 to 922babf on May 8, 2024
@Jmennius
Collaborator Author

Jmennius commented May 8, 2024

Hi! This is great news!! Is it expected to be merged soon or should I grab the patch?

I'd go for a patch for sure 😉
I've rebased the change.

P.S. You can do this in your ~/.config/containers/toolbox.conf:

[general]
devices = ["nvidia.com/gpu=all"]
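
This assumes a CDI spec has already been generated on the host, for example with (the same command the systemd unit further down uses; writing to /etc/cdi requires root):

sudo nvidia-ctk cdi generate --output /etc/cdi/nvidia.yaml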

@Jmennius
Collaborator Author

Jmennius commented May 8, 2024

For Silverblue, something that I've come up with to handle regenerating the CDI spec after an upgrade:
/etc/systemd/system/nvidia-cdi-update.service:

[Unit]
Description=Update Nvidia CDI configuration
DefaultDependencies=no
Before=systemd-update-done.service
ConditionNeedsUpdate=/etc

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-ctk cdi generate --output /etc/cdi/nvidia.yaml

[Install]
WantedBy=multi-user.target

sudo systemctl enable --now nvidia-cdi-update.service
This will run on the next boot after an update (so not every boot) and write an up-to-date CDI spec.
Not sure if those systemd features work with regular Fedora Workstation though (too lazy to research 😄).
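
To check that the regenerated spec is valid and picked up, recent nvidia-ctk versions can list the device names that container engines will see:

nvidia-ctk cdi list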


Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/7814973e94aa4d73b0d68ab88a881112

✔️ unit-test SUCCESS in 6m 37s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 29s
✔️ unit-test-restricted SUCCESS in 5m 52s
✔️ system-test-fedora-rawhide SUCCESS in 35m 24s
✔️ system-test-fedora-40 SUCCESS in 34m 03s
✔️ system-test-fedora-39 SUCCESS in 34m 26s
✔️ system-test-fedora-38 SUCCESS in 34m 07s

@debarshiray
Member

For Silverblue, something that I've come up with to handle regenerating the CDI spec after an upgrade: /etc/systemd/system/nvidia-cdi-update.service:

That's a really neat hack, indeed. :)

@debarshiray
Member

Is there anything else other than NVIDIA that uses the Container Device Interface?

I am not aware of other CDI implementations :(

I see commits from Intel in github.com/cncf-tags/container-device-interface, which is great.

Member

@debarshiray debarshiray left a comment


Thanks to @Jmennius and @owtaylor I changed my mind about how to enable the proprietary NVIDIA driver in Toolbx containers. Since Intel, NVIDIA and several container tools, including Podman, have embraced the Container Device Interface, it's a better path to take than the unmanaged Flatpak extension approach that I had mentioned before.

However, I want to be a bit careful when using the CDI. The way it's widely advertised requires root privileges, because podman run --device nvidia.com/gpu... expects the CDI file to be present in either /etc/cdi or /var/run/cdi. It's not possible to create the file with nvidia-ctk cdi generate and put it in those locations without root access. It would be good if we could make it work entirely rootless.
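
For context, the widely advertised flow looks like this; the device reference only resolves if a spec for nvidia.com is already present in /etc/cdi or /var/run/cdi (the image name is just an example):

podman run --rm --device nvidia.com/gpu=all registry.fedoraproject.org/fedora nvidia-smi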

One option is to use the Go packages from tags.cncf.io/container-device-interface and github.com/NVIDIA/nvidia-container-toolkit to create the Container Device Interface file ourselves during enter and run, make it available to init-container, and let it parse and apply it when the container starts. The CDI file is ultimately a bunch of environment variables, bind mounts and hooks to call ldconfig(8), so it shouldn't be that hard. Since Toolbx already makes the entire /dev from the host available to the container, we don't need to worry about the devices.
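
To make that concrete, a trimmed CDI spec has roughly this shape (field names per the CDI specification; the concrete paths and values here are illustrative):

cdiVersion: "0.5.0"
kind: nvidia.com/gpu
devices:
- name: all
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
containerEdits:
  env:
  - NVIDIA_VISIBLE_DEVICES=void
  mounts:
  - hostPath: /usr/lib64/libcuda.so
    containerPath: /usr/lib64/libcuda.so
  hooks:
  - hookName: createContainer
    path: /usr/bin/nvidia-ctk
    args:
    - nvidia-ctk
    - hook
    - update-ldcache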

This avoids the need for root privileges, and has the extra benefit of enabling the driver in existing Toolbx containers.

I have a working proof-of-concept using this approach in #1497 that seems to work with the NVIDIA Quadro P600 GPU on my ThinkPad P72 laptop.

@debarshiray
Member

Sorry for the delay, @Jmennius I finally got myself some NVIDIA hardware to play with this.
I see that the Container Device Interface requires installing the NVIDIA Container Toolkit.
However, as far as I can make out, the nvidia-container-toolkit or nvidia-container-toolkit-base packages are only available from NVIDIA's own repositories right now. For example, I am on Fedora 39, and neither the RPMFusion free nor the non-free repositories have it, but they do have NVIDIA's proprietary driver.

Yeah, you are right - it's only available in NVIDIA's repos for Fedora. It would be nice if it were repackaged somehow on RPMFusion... I saw that some distros package it.

The NVIDIA Container Toolkit code seems to be entirely free software. I wonder if we can get it into Fedora proper, instead of RPMFusion.

@debarshiray
Member

Closing in favour of #1497

Thanks again for pointing me in the right direction, @Jmennius

@debarshiray debarshiray closed this Jun 6, 2024