
WIP: Add ramalama image built on Fedora using Fedora's rocm packages #596

Open: wants to merge 2 commits into main

Conversation

maxamillion (Collaborator)

Add ramalama image built on Fedora using Fedora's rocm packages. This enables more models of embedded GPUs in the Ryzen APUs, covering more gfx series than the official ROCm packages from AMD that are used in the UBI-based default images.

@rhatdan (Member)

rhatdan commented Jan 16, 2025

Do we actually need a separate Containerfile, or could we just do this with a podman build --from fedora ...? Or, to make it more container-engine agnostic, use BUILDARGS.
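A minimal sketch of what that build-arg approach could look like (the BASE_IMAGE name and default below are hypothetical illustrations, not code from this PR):

```dockerfile
# Hypothetical: one Containerfile parameterized by base image
ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi
FROM ${BASE_IMAGE}

COPY scripts /scripts
```

which could then be built for Fedora with something like podman build --build-arg BASE_IMAGE=registry.fedoraproject.org/fedora:latest .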

@@ -0,0 +1,8 @@
FROM registry.fedoraproject.org/fedora:latest

COPY ../scripts /scripts
@ericcurtin (Collaborator), Jan 16, 2025

I know it's not obvious, apologies, but "ramalama" means use a generic driver, aka Vulkan/Kompute. I think what we could do here is:

/scripts/build_llama_and_whisper.sh "rocm" "fedora"

with the first parameter being the GPU/AI driver (rocm in this case) and the second being the base distro: fedora, ubi, etc.
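A sketch of how the script's argument handling could look under that scheme (the variable names and defaults here are assumptions for illustration, not the actual build_llama_and_whisper.sh code):

```shell
#!/bin/sh
# Hypothetical argument handling: first arg is the GPU/AI driver,
# second is the base distro; defaults are assumptions for illustration.
gpu_driver="${1:-ramalama}"   # "ramalama" = generic vulkan/kompute driver
base_distro="${2:-ubi}"

echo "building llama.cpp/whisper.cpp for driver=${gpu_driver} base=${base_distro}"
```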

Member

SGTM

@maxamillion (Collaborator, Author)

Do we actually need a separate Containerfile or could we just do this with a podman build --from fedora ... or I guess to make it more container engine agnostic use BUILDARGS.

That's probably a better option, I'll go that route.

@@ -36,7 +36,11 @@ dnf_install() {
dnf install -y asahi-repos
dnf install -y mesa-vulkan-drivers "${vulkan_rpms[@]}" "${rpm_list[@]}"
elif [ "$containerfile" = "rocm" ]; then
dnf install -y rocm-dev hipblas-devel rocblas-devel
if [ "${ID}" = "fedora" ]; then
Collaborator

Much nicer :)

@@ -1,8 +1,10 @@
FROM registry.fedoraproject.org/fedora:latest

RUN dnf update -y --refresh && dnf clean all
Collaborator

This step doesn't seem unique to this Containerfile or anything, no big deal though

@ericcurtin (Collaborator) left a comment

Merging on green build

@maxamillion (Collaborator, Author)

@ericcurtin would you prefer I converge the two rocm Containerfiles and use --build-args as @rhatdan suggested or would that be a separate PR?

@ericcurtin (Collaborator)

@maxamillion I don't mind, if it works, you'll probably have to edit container_build.sh for that technique

@ericcurtin (Collaborator)

ericcurtin commented Jan 16, 2025

The build failed because the container image is too large; we probably want to remove some older GPUs from it. For example, on the UBI container images we rm'd *gfx9* files to trim the image; those are GPUs from before ~2018.

@ericcurtin (Collaborator)

ericcurtin commented Jan 16, 2025

@debarshiray and @owtaylor I remember you guys wanted some Fedora based RamaLama images. Well... It's coming it seems 😄 Thanks to @maxamillion

@@ -115,7 +124,8 @@ main() {
dnf clean all
rm -rf /var/cache/*dnf* /opt/rocm-*/lib/llvm \
/opt/rocm-*/lib/rocblas/library/*gfx9* \
/opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp
/opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp \
/usr/lib64/rocm/gfx9*
@ericcurtin (Collaborator), Jan 17, 2025

This might make it small enough :)

It's actually a restriction on the storage space available on our GitHub runner, but it's a welcome restriction: we don't want 20GB container images.

One could always contribute a rocm-legacy-fedora type image for gfx9 if they had the motivation to do so.

@ericcurtin (Collaborator), Jan 17, 2025

We should actually simplify the above line to:

/opt/rocm-*/lib/*/library/*gfx9*

not necessary in this PR or anything.

Collaborator Author

we'll find out if it's small enough soon! 😄

Collaborator Author

@ericcurtin it appears that the build is still angry about it 🤔

@ericcurtin (Collaborator), Jan 17, 2025

Yeah, I guess you just gotta "podman run --rm -it bash" into it and look around to see what's big; gfx9 was enough to remove for the UBI versions. Trim, test to make sure you didn't break the ROCm support, then add the "rm -fr" here.

Collaborator Author

@ericcurtin question - what's the size goal I'm trying to hit?

@ericcurtin (Collaborator), Jan 17, 2025

I can't remember exactly; the ones pushed here are built in this CI and are within the limit (https://quay.io/repository/ramalama/rocm?tab=tags). The ROCm images are the only images that suffer from this; I've never had to trim a CUDA one, for example.

@ericcurtin (Collaborator), Jan 17, 2025

It's arguably worth logging an issue in ROCm upstream; it's unlikely all those *gfx* files we trim are completely unique. I'm sure there's loads of duplicate data in those files that could be shared using some common files or some other form of de-duplication.

Collaborator

It leaves a bad first impression for ramalama if users have to pull huge container images just to get going; some people won't wait around for hours for a large image to be pulled.

Collaborator

We could also do container images like rocm-gfx9-fedora, rocm-gfx10-fedora, etc. We may be able to detect which generation we are on before pulling.

@ericcurtin (Collaborator)

ericcurtin commented Jan 17, 2025

UBI has nothing lower than gfx9. @maxamillion I see you found gfx8; if there's gfx7 or lower, it's likely worth removing also.

@maxamillion (Collaborator, Author)

@ericcurtin nothing older than gfx8, but it's still a lot of space (granted, it was 14G before I pulled both gfx9* and gfx8* out)

[root@a854647ecf79 /]# ls /usr/lib64/rocm
gfx10  gfx11  gfx1100  gfx1103
[root@a854647ecf79 /]# du -sh /usr/lib64/rocm/
5.3G    /usr/lib64/rocm/

@ericcurtin (Collaborator)

Looks like you are almost there. If only we could rank the GPU models by units sold or something and trim based on the fewest units sold.

@ericcurtin (Collaborator)

ericcurtin commented Jan 17, 2025

Could also be interesting to see if there is further compression we can do beyond whatever podman build does by default. @rhatdan, any compression techniques we could try, off the top of your head?

@ericcurtin (Collaborator)

gfx10's first models seem to have been released in 2019, FWIW.

@ericcurtin (Collaborator)

Maybe we will be lucky and this will be small enough to go green

@ericcurtin (Collaborator)

ericcurtin commented Jan 17, 2025

Separate images for gfx8, gfx9, gfx10, and gfx11 would be a welcome addition, to be honest; it would improve UX with faster pulls. We may be able to detect which image we need to pull on a given machine by querying /proc, /sys, /dev, etc. pre-pull. Also, any GPU with < 1GB VRAM has incredibly low value to enable; CPU inferencing is often better when VRAM is this low (maybe even < 2GB, 4GB, 8GB). An 8 GB VRAM GPU is considered small these days.

@maxamillion (Collaborator, Author)

Well, bummer... even splitting them all out, the builds are too large 🤔

@ericcurtin (Collaborator)

We could just use the exact same list of GPUs as the UBI9 images and enable additional ones on demand as people request.

@maxamillion maxamillion changed the title Add ramalama image built on Fedora using Fedora's rocm packages WIP: Add ramalama image built on Fedora using Fedora's rocm packages Jan 21, 2025
@maxamillion (Collaborator, Author)

@ericcurtin I don't see a way to trim this down further; I'm open to suggestions though.

This is some sample output from the rocm-fedora-gfx11 image.

[root@167cf24d485c /]# du -sh /usr/* | grep G
8.6G    /usr/lib64

[root@167cf24d485c /]# du -sh /usr/lib64/* | grep G
1.1G    /usr/lib64/librocblas.so.4.2
1.9G    /usr/lib64/librocsparse.so.1.0
3.0G    /usr/lib64/rocm

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/* | grep G
2.3G    /usr/lib64/rocm/gfx11

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/gfx11/* | grep G
2.3G    /usr/lib64/rocm/gfx11/lib

[root@167cf24d485c /]# du -sh /usr/lib64/rocm/gfx11/lib/*
32K     /usr/lib64/rocm/gfx11/lib/cmake
4.0K    /usr/lib64/rocm/gfx11/lib/libhipblas.so
4.0K    /usr/lib64/rocm/gfx11/lib/libhipblas.so.2
996K    /usr/lib64/rocm/gfx11/lib/libhipblas.so.2.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocblas.so
4.0K    /usr/lib64/rocm/gfx11/lib/librocblas.so.4
486M    /usr/lib64/rocm/gfx11/lib/librocblas.so.4.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocsolver.so.0
359M    /usr/lib64/rocm/gfx11/lib/librocsolver.so.0.2
4.0K    /usr/lib64/rocm/gfx11/lib/librocsparse.so.1
830M    /usr/lib64/rocm/gfx11/lib/librocsparse.so.1.0
589M    /usr/lib64/rocm/gfx11/lib/rocblas

Based on the output, there's not really anything in there we can rip out based on specific GPU models. It's all just bundled into giant shared object files 😕

@maxamillion (Collaborator, Author)

@ericcurtin in a shocking turn of events, if I set rawhide as the base image, the rocm-fedora-gfx11 image is about 4GB smaller than if I build with Fedora 41 🤷 ... any reservations about switching everything to rawhide for now?

@maxamillion (Collaborator, Author)

@ericcurtin in a shocking turn of events, if I set rawhide as the base image, the rocm-fedora-gfx11 image is about 4GB smaller than if I build with Fedora 41 🤷 ... any reservations about switching everything to rawhide for now?

lol nvm .. it's smaller because cmake failed 🤦

@rhatdan (Member)

rhatdan commented Jan 22, 2025

I am fine with that, do you know why it is smaller?

@maxamillion (Collaborator, Author)

I am fine with that, do you know why it is smaller?

because cmake failed 🤦

@ericcurtin (Collaborator)

No issue with rawhide, eventually we can call it Fedora 42

@ericcurtin (Collaborator)

This is an isolated image, so that helps

@ericcurtin (Collaborator) left a comment

LGTM, hopefully we can find a way of figuring out which gfx generation we are in a follow on PR

@maxamillion (Collaborator, Author)

LGTM, hopefully we can find a way of figuring out which gfx generation we are in a follow on PR

I still need to fix one last nagging issue with the rocblas path and then I'll ask for a merge. 🙂

@ericcurtin (Collaborator)

The rocm-fedora-gfx10 container image is too large, from my reading of the build logs.

@debarshiray (Member)

@debarshiray and @owtaylor I remember you guys wanted some Fedora based RamaLama images. Well... It's coming it seems 😄 Thanks to @maxamillion

This is fantastic.

While reading through the comments, I saw that you were fighting against the size of the ROCm images because of those different gfx files that ROCm has. I was talking to Kevin Martin about those in the context of Fedora's llama-cpp package. The moment I turned on the GPU accelerated HIP code that gets built with the ROCm compiler, the build time and the size of the RPMs exploded, because the HIP sources were being separately compiled for every different GPU. I was told that it's inherent to how AMD's design works and is a noticeable difference from NVIDIA, and that AMD is aware of it.

The idea of separate ROCm images for different AMD GPUs does sound tempting to me. I wonder if we should split things like llama-cpp similarly.

Finally, if you are looking for the list of GPUs enabled in Fedora's ROCm, then look at ROCM_GPUS in rocm-rpm-macros.

@ericcurtin (Collaborator)

ericcurtin commented Feb 4, 2025

It does look like poor design; it's hard to believe each of those files is so unique, but that's an AMD problem to solve, I guess...

@debarshiray (Member)

LGTM, hopefully we can find a way of figuring out which gfx generation we are in a follow on PR

One can do:

$ rocminfo | grep ' gfx' | cut --delimiter ':' --fields 2 | sed 's/ //g'
gfx1031

... but I suppose you want to do it in Python with saner APIs. :)

@debarshiray (Member)

It does look like poor design; it's hard to believe each of those files is so unique, but that's an AMD problem to solve, I guess...

I don't know all the details. Kevin told me that doing it this way, without a deeper redesign, improves performance by a few percentage points and AMD doesn't want to let that go.

@ericcurtin (Collaborator)

ericcurtin commented Feb 4, 2025

LGTM, hopefully we can find a way of figuring out which gfx generation we are in a follow on PR

One can do:

$ rocminfo | grep ' gfx' | cut --delimiter ':' --fields 2 | sed 's/ //g'
gfx1031

... but I suppose you want to do it in Python with saner APIs. :)

It would be nice if we could avoid the need for rocminfo to be installed on the host system, feedback from a user:

[screenshot: user feedback, 2025-02-04]

He was impressed that he had to install basically nothing and it worked (and he's right, that's how I use it).

@ericcurtin (Collaborator)

rocminfo is open source AFAIK, so we may actually be able to figure out the piece of code where it gets that gfx data from

@debarshiray (Member)

LGTM, hopefully we can find a way of figuring out which gfx generation we are in a follow on PR

One can do:

$ rocminfo | grep ' gfx' | cut --delimiter ':' --fields 2 | sed 's/ //g'
gfx1031

... but I suppose you want to do it in Python with saner APIs. :)

This is slightly better:

$ rocm_agent_enumerator -name
gfx1031

... and rocm_agent_enumerator is a one file Python program. :)

@ericcurtin (Collaborator)

ericcurtin commented Feb 4, 2025

Great find @debarshiray! We also have this problem (which is accounted for in some parts of RamaLama); my machine returns this:

gfx1102
gfx90c

One of those AMD GPUs is not suitable for inferencing: it's the integrated graphics with tiny VRAM. The other is the 1102, which has 8G VRAM. Just something to keep in mind.

@ericcurtin (Collaborator)

readFromKFD() function is basically the important part of that script.

@debarshiray (Member)

Reading the rocm_agent_enumerator code, it seems that a decent first pass would be to try the approach taken by readFromKFD() and fall back to readFromLSPCI().

We also have this problem (which is accounted for in some parts of RamaLama); my machine returns this:

gfx1102
gfx90c

One of those AMD GPUs is not suitable for inferencing: it's the integrated graphics with tiny VRAM. The other is the 1102, which has 8G VRAM. Just something to keep in mind.

What does the properties file say inside the sub-directories under /sys/class/kfd/kfd/topology/nodes? That's what readFromKFD() looks at.
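A rough Python sketch of that readFromKFD()-style scan (a hypothetical helper, not RamaLama or ROCm code; the path layout follows the examples in this thread):

```python
import glob
import os


def read_gfx_target_versions(root="/sys/class/kfd/kfd/topology/nodes"):
    """Collect nonzero gfx_target_version values from each node's properties file.

    The CPU-only node reports a gfx_target_version of 0, so it is skipped.
    """
    versions = []
    for path in sorted(glob.glob(os.path.join(root, "*", "properties"))):
        with open(path) as f:
            for line in f:
                if line.startswith("gfx_target_version"):
                    value = int(line.split()[1])
                    if value:
                        versions.append(value)
    return versions
```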

@ericcurtin (Collaborator)

ericcurtin commented Feb 4, 2025

$ grep gfx /sys/class/kfd/kfd/topology/nodes/*/properties
/sys/class/kfd/kfd/topology/nodes/0/properties:gfx_target_version 0
/sys/class/kfd/kfd/topology/nodes/1/properties:gfx_target_version 110002
/sys/class/kfd/kfd/topology/nodes/2/properties:gfx_target_version 90012

there's a little decimal to hex conversion to get "02" and "0c"
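That conversion fits in a few lines of Python (the helper name is hypothetical; the arithmetic mirrors the formulas from rocm_agent_enumerator quoted later in this thread):

```python
def gfx_from_target_version(version: int) -> str:
    """Turn a kfd gfx_target_version (e.g. 110002) into a gfx name (e.g. gfx1102).

    Major stays decimal; minor and stepping are rendered as hex digits.
    """
    major = (version // 10000) % 100
    minor = (version // 100) % 100
    stepping = version % 100
    return f"gfx{major:d}{minor:x}{stepping:x}"


print(gfx_from_target_version(110002))  # gfx1102
print(gfx_from_target_version(90012))   # gfx90c
```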

@ericcurtin (Collaborator)

Could probably boil it down to ~10 lines of python if we need to

@debarshiray (Member)

$ grep gfx /sys/class/kfd/kfd/topology/nodes/*/properties
/sys/class/kfd/kfd/topology/nodes/0/properties:gfx_target_version 0
/sys/class/kfd/kfd/topology/nodes/1/properties:gfx_target_version 110002
/sys/class/kfd/kfd/topology/nodes/2/properties:gfx_target_version 90012

there's a little decimal to hex conversion to get "02" and "0c"

Yes, it's:

major_ver = int((device_id / 10000) % 100)
minor_ver = int((device_id / 100) % 100)
stepping_ver = int(device_id % 100)

... and then:

"gfx" + format(major_ver, 'd') + format(minor_ver, 'x') + format(stepping_ver, 'x')

Did you figure out how to filter out your smaller GPU?

@debarshiray (Member)

Did you figure out how to filter out your smaller GPU?

Skimming through the AMDKFD code in the Linux kernel gave me some ideas, particularly kfd_debug_print_topology().

Both cpu_cores_count and simd_count will be set to non-zero values for accelerated processing units (or APUs) in the /sys/class/kfd/kfd/topology/nodes/*/properties file. I have a desktop with an AMD CPU and GPU, not a laptop, so I don't have an AMD APU to verify this, but the rest of the logic does hold for my hardware.
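Under that assumption, the check could be sketched like this (hypothetical helpers; the field names are the ones from the properties file discussed above, and the APU heuristic is unverified on real APU hardware, as noted):

```python
def parse_kfd_properties(text: str) -> dict:
    """Parse a /sys/class/kfd/.../properties file: one 'key value' pair per line."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props


def is_apu(props: dict) -> bool:
    # An APU node would report both CPU cores and GPU SIMDs as non-zero;
    # discrete GPUs report cpu_cores_count 0, CPU-only nodes report simd_count 0.
    return props.get("cpu_cores_count", 0) > 0 and props.get("simd_count", 0) > 0
```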

@ericcurtin (Collaborator)

ericcurtin commented Feb 4, 2025

$ grep gfx /sys/class/kfd/kfd/topology/nodes/*/properties
/sys/class/kfd/kfd/topology/nodes/0/properties:gfx_target_version 0
/sys/class/kfd/kfd/topology/nodes/1/properties:gfx_target_version 110002
/sys/class/kfd/kfd/topology/nodes/2/properties:gfx_target_version 90012

there's a little decimal to hex conversion to get "02" and "0c"

Yes, it's:

major_ver = int((device_id / 10000) % 100)
minor_ver = int((device_id / 100) % 100)
stepping_ver = int(device_id % 100)

... and then:

"gfx" + format(major_ver, 'd') + format(minor_ver, 'x') + format(stepping_ver, 'x')

Did you figure out how to filter out your smaller GPU?

Yes it's code elsewhere in RamaLama:

    import glob

    # ROCm/AMD CASE
    i = 0
    gpu_num = 0
    gpu_bytes = 0
    for fp in sorted(glob.glob('/sys/bus/pci/devices/*/mem_info_vram_total')):
        with open(fp, 'r') as file:
            content = int(file.read())
            if content > 1073741824 and content > gpu_bytes:
                gpu_bytes = content
                gpu_num = i

        i += 1

basically ignore every GPU with less than 1GB VRAM and choose the one with the most VRAM.

Signed-off-by: Adam Miller <[email protected]>