WIP: Add ramalama image built on Fedora using Fedora's rocm packages #596
base: main

Conversation
Do we actually need a separate Containerfile, or could we just do this with a podman build --from fedora ...? Or, I guess, to make it more container-engine agnostic, use BUILDARGS.
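For illustration, the build-arg route might look something like this (a hedged sketch; the ARG name, default base image, and tag are assumptions, not anything in this repo):

```dockerfile
# Hypothetical single Containerfile, base image selected at build time
ARG BASE=registry.access.redhat.com/ubi9/ubi
FROM ${BASE}
COPY scripts /scripts
RUN /scripts/build_llama_and_whisper.sh
```

```bash
# Select Fedora at build time instead of maintaining a second Containerfile
podman build --build-arg BASE=registry.fedoraproject.org/fedora:latest \
    -t ramalama-rocm-fedora .
```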
```diff
@@ -0,0 +1,8 @@
+FROM registry.fedoraproject.org/fedora:latest
+
+COPY ../scripts /scripts
```
I know it's not obvious, apologies, but "ramalama" means use a generic driver, aka vulkan/kompute. I think what we could do here is:

```bash
/scripts/build_llama_and_whisper.sh "rocm" "fedora"
```

with the first parameter being the GPU/AI driver (rocm in this case), and the second parameter could be fedora, ubi, etc.
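A rough sketch of that interface (a hedged illustration; the variable names are not the script's actual ones):

```bash
#!/bin/bash
# Hypothetical two-parameter interface for build_llama_and_whisper.sh
containerfile="${1:-ramalama}"   # GPU/AI driver: ramalama (generic), rocm, cuda, ...
distro="${2:-ubi}"               # base distribution: fedora, ubi, ...

echo "building llama.cpp/whisper.cpp for driver=$containerfile on $distro"
```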
SGTM
That's probably a better option, I'll go that route.
```diff
@@ -36,7 +36,11 @@ dnf_install() {
     dnf install -y asahi-repos
     dnf install -y mesa-vulkan-drivers "${vulkan_rpms[@]}" "${rpm_list[@]}"
   elif [ "$containerfile" = "rocm" ]; then
-    dnf install -y rocm-dev hipblas-devel rocblas-devel
+    if [ "${ID}" = "fedora" ]; then
```
Much nicer :)
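For context, ${ID} presumably comes from sourcing os-release earlier in the script; a minimal sketch of the whole branch (the Fedora package names are assumptions for illustration):

```bash
. /etc/os-release   # sets ID=fedora, ID=rhel, etc.
if [ "$ID" = "fedora" ]; then
    # Fedora's own ROCm packaging (names assumed)
    dnf install -y rocm-hip-devel hipblas-devel rocblas-devel
else
    # AMD's repo packages, as in the original line
    dnf install -y rocm-dev hipblas-devel rocblas-devel
fi
```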
```diff
@@ -1,8 +1,10 @@
 FROM registry.fedoraproject.org/fedora:latest
 
+RUN dnf update -y --refresh && dnf clean all
```
This step doesn't seem unique to this Containerfile or anything; no big deal though.
Merging on green build
@ericcurtin would you prefer I converge the two Containerfiles?
@maxamillion I don't mind, if it works; you'll probably have to edit container_build.sh for that technique.
The build failed because the container image is too large; we probably want to remove some older GPUs from it. For example, on the UBI container images we rm'd the gfx9 libraries.
@debarshiray and @owtaylor I remember you guys wanted some Fedora-based RamaLama images. Well... it's coming, it seems 😄 Thanks to @maxamillion
```diff
@@ -115,7 +124,8 @@ main() {
   dnf clean all
   rm -rf /var/cache/*dnf* /opt/rocm-*/lib/llvm \
     /opt/rocm-*/lib/rocblas/library/*gfx9* \
-    /opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp
+    /opt/rocm-*/lib/hipblaslt/library/*gfx9* llama.cpp whisper.cpp \
+    /usr/lib64/rocm/gfx9*
```
This might make it small enough :)
It's actually a restriction on the storage space available on our GitHub runner, but it's a welcome restriction; we don't want 20GB container images.
One could always contribute a rocm-legacy-fedora type image for gfx9 if they had the motivation to do so.
We should actually simplify the above line to:

```
/opt/rocm-*/lib/*/library/*gfx9*
```

Not necessary in this PR or anything.
we'll find out if it's small enough soon! 😄
@ericcurtin it appears that the build is still angry about it 🤔
Yeah, I guess you just gotta "podman run --rm -it bash" into it and look around and see what's big; gfx9 was enough to remove for the UBI versions. Trim, test to make sure you didn't break the ROCm support, then add the "rm -fr" here.
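For example, something like this can surface the big directories (the image name is a placeholder):

```bash
podman run --rm -it quay.io/ramalama/rocm-fedora bash   # placeholder image name
# then, inside the container:
du -h --max-depth=2 /opt /usr | sort -rh | head -n 20
```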
@ericcurtin question - what's the size goal I'm trying to hit?
I can't remember exactly; these ones pushed here are built in this CI and are within the limit (https://quay.io/repository/ramalama/rocm?tab=tags). The ROCm images are the only images that suffer from this; I've never had to trim a CUDA one, for example.
It's arguably worth logging an issue in ROCm upstream. It's unlikely all those *gfx* files we trim are completely unique; I'm sure there's loads of duplicate data in those files that could be shared using some common files or some other form of de-duplication.
It leaves a bad first impression of RamaLama if users have to pull huge container images just to get going; some people won't wait around for hours for a large image to be pulled.
We could also do container images like rocm-gfx9-fedora, rocm-gfx10-fedora, etc. We may be able to detect what generation we are on pre-pull.
UBI has nothing lower than gfx9. @maxamillion I see you found gfx8; if there's gfx7 or lower, it's likely worth removing also.
@ericcurtin nothing older than gfx8, but it's still a lot of space (granted, it was 14G before I pulled both gfx9* and gfx8* out)
Looks like you are almost there; if only we could rank the GPU models by units sold or something and trim based on fewest units sold.
Could also be interesting to see if there is further compression we can do other than whatever podman build does by default. @rhatdan, any compression techniques, etc. we could do off the top of your head?
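One knob that might help (hedged, untested here): podman can push with zstd instead of the default gzip compression, though registry and client support for zstd layers varies:

```bash
# zstd (or zstd:chunked) often compresses large layers better than the default gzip
podman push --compression-format zstd quay.io/ramalama/rocm-fedora:latest
```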
gfx10 seems like the first models were released in 2019, FWIW.
Maybe we will be lucky and this will be small enough to go green.
Separate images for gfx8, gfx9, gfx10, and gfx11 would be a welcome addition to be honest; it would improve UX with faster pulls. We may be able to detect which image we need to pull on a given machine by querying /proc, /sys, /dev, etc. pre-pull. Also, any GPU with < 1GB VRAM has incredibly low value in enabling; often CPU inferencing is better when VRAM is this low (maybe even < 2GB, 4GB, 8GB VRAM). An 8GB VRAM GPU is considered small these days.
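A hedged sketch of what that pre-pull detection could look like, reading the topology the kernel's AMDKFD driver exposes under /sys (the gfx_target_version field is per the kernel; mapping the result to an image tag is an assumption):

```bash
# gfx_target_version is decimal: major*10000 + minor*100 + stepping.
# CPU-only nodes report 0, so skip them.
for props in /sys/class/kfd/kfd/topology/nodes/*/properties; do
    ver=$(awk '/^gfx_target_version/ {print $2}' "$props")
    if [ "${ver:-0}" -gt 0 ]; then
        echo "gfx generation: $((ver / 10000))"   # e.g. 110002 -> 11 -> rocm-gfx11-fedora
    fi
done
```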
well bummer ... even splitting them all out the builds are too large 🤔
We could just use the exact same list of GPUs as the UBI9 images and enable additional ones on demand as people request.
Force-pushed from 42a89f7 to b28db8c
@ericcurtin I don't see a way to trim this down further; I'm open to suggestions though. This is some sample output from the
Based on the output, there's not really anything in there we can rip out based on specific GPU models. It's all just bundled into giant shared object files 😕
@ericcurtin in a shocking turn of events, if I set it to
lol nvm .. it's smaller because cmake failed 🤦
I am fine with that, do you know why it is smaller?
because cmake failed 🤦
No issue with rawhide, eventually we can call it Fedora 42
This is an isolated image, so that helps
Force-pushed from 423541d to f90a99c
LGTM, hopefully we can find a way of figuring out which gfx generation we are on in a follow-on PR.
I still need to fix one last nagging issue with the rocblas path and then I'll ask for a merge. 🙂
The rocm-fedora-gfx10 container image is too large, from my reading of the build logs.
This is fantastic. While reading through the comments, I saw that you were fighting against the size of the ROCm images because of those different

The idea of separate ROCm images for different AMD GPUs does sound tempting to me. I wonder if we should split things like

Finally, if you are looking for the list of GPUs enabled in Fedora's ROCm, then look at ROCM_GPUS in
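If the images do get split per generation, the usual lever is the GPU target list passed to the HIP build; a hedged example for llama.cpp (flag names vary across llama.cpp versions, so treat these as assumptions):

```bash
# Build the HIP/ROCm backend for a single gfx family only
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS="gfx1100;gfx1101;gfx1102"
cmake --build build --config Release
```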
It does look like poor design; it's hard to believe each of those files is so unique, but that's an AMD problem to solve, I guess...
One can do:
... but I suppose you want to do it in Python with saner APIs. :)
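A minimal sketch of one way to enumerate the gfx targets from the shell (assuming ROCm's rocminfo is installed; this is a guess at the approach, not the exact command suggested above):

```bash
# Each GPU agent in rocminfo output reports its gfx name, e.g. "Name: gfx1102"
rocminfo | grep -E '^\s*Name:\s*gfx'
```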
I don't know all the details. Kevin told me that doing it this way, without a deeper redesign, improves performance by a few percentage points, and AMD doesn't want to let that go.
rocminfo is open source AFAIK, so we may actually be able to figure out the piece of code where it gets that gfx data from.
This is slightly better:
... and
Great find @debarshiray! We also have this problem (which is accounted for in some parts of RamaLama); my machine returns this:
One of those AMD GPUs is not suitable for inferencing: it's the integrated graphics, which has tiny VRAM. The other is the 1102, which has 8G VRAM. Just something to keep in mind and take into account.
The readFromKFD() function is basically the important part of that script.
Reading the
What does the
There's a little decimal-to-hex conversion to get "02" and "0c".
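A sketch of that conversion, assuming gfx_target_version encodes major*10000 + minor*100 + stepping, with minor and stepping rendered as hex digits in the gfx name:

```bash
ver=110002                 # example value read from a KFD properties file
major=$((ver / 10000))
minor=$((ver / 100 % 100))
step=$((ver % 100))
printf 'gfx%d%x%x\n' "$major" "$minor" "$step"   # 110002 -> gfx1102, 90012 -> gfx90c
```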
Could probably boil it down to ~10 lines of Python if we need to.
Yes, it's:
... and then:
Did you figure out how to filter out your smaller GPU?
Skimming through the AMDKFD code in the Linux kernel gave me some ideas, particularly Both
Yes, it's code elsewhere in RamaLama:
Basically: ignore every GPU with less than 1GB VRAM and choose the one with the most VRAM.
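An illustrative shell version of that rule (RamaLama's actual implementation is Python; using amdgpu's mem_info_vram_total sysfs file here is my assumption about one way to read VRAM sizes):

```bash
# Ignore GPUs with < 1 GiB VRAM; pick the one with the most VRAM.
best="" best_size=0
for dev in /sys/class/drm/card*/device; do
    [ -r "$dev/mem_info_vram_total" ] || continue
    size=$(cat "$dev/mem_info_vram_total")
    if [ "$size" -ge $((1 << 30)) ] && [ "$size" -gt "$best_size" ]; then
        best_size=$size best=$dev
    fi
done
echo "selected: ${best:-none} ($((best_size / 1024 / 1024)) MiB VRAM)"
```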
Signed-off-by: Adam Miller <[email protected]>
Force-pushed from 42f52ad to 5c16714
Signed-off-by: Adam Miller <[email protected]>
Add a ramalama image built on Fedora using Fedora's ROCm packages. This enables more gfx series, including more models of the embedded GPUs in Ryzen APUs, than the official ROCm packages from AMD that are used in the default UBI-based images.