rm_alloc returned 81: Out of memory #658

Open
kensmith opened this issue Jan 30, 2025 · 7 comments
Comments

@kensmith

exo seems to be OOMing despite having lots of free RAM.

% cat /proc/meminfo|head
MemTotal:       16029428 kB
MemFree:         1478608 kB
MemAvailable:   10660452 kB
Buffers:          560476 kB
Cached:         10730028 kB
SwapCached:          820 kB
Active:          5688328 kB
Inactive:        7405268 kB
Active(anon):    2852260 kB
Inactive(anon):  1600664 kB

(screenshot attached)

% curl http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "llama-3.1-8b",
     "messages": [{"role": "user", "content": "What is the meaning of exo?"}],
     "temperature": 0.7
   }'
{"detail": "Error processing prompt (see logs with DEBUG>=2): rm_alloc returned 81: Out of memory"}%

If I read the README.md correctly, I should only need 16GB of RAM to run this model so having more than 100GB free seems like I should not be OOMing.
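
The error message itself points at more detailed logging. A minimal way to capture it, assuming DEBUG is read as an environment variable when exo is launched (which is what the message implies; the exact mechanism may differ):

% DEBUG=2 exo
# then, from a second terminal, replay the failing request and watch
# exo's output for the allocation that fails with rm_alloc 81:
% curl http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "hi"}]}'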

@AlexCheema
Contributor

What device are you running on?

@kensmith
Author

Arch Linux
AMD Ryzen 7 9700X
RTX3080TI

@andrenaP

I think I have the same issue.
I am trying to run Llama 3.2 1B on Arch Linux with 16 GB of RAM and an Intel CPU, and I got the same Out of memory error.
BUT I was able to run it in a cluster of 7 GB (other laptop) + 4 GB (GPU in Docker) + 15 GB (main laptop RAM) at a speed of 0.3 tokens/sec.
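
For reference, a rough sketch of how such a cluster is usually brought up, assuming exo's automatic peer discovery on a shared local network (as described in the project README); only the plain exo launcher is assumed, no extra flags:

% exo        # on the main laptop (15 GB RAM, CPU only)
% exo        # on the other laptop (7 GB RAM)
% exo        # inside the Docker container that owns the 4 GB GPU
# the nodes should discover each other and partition the model across
# their combined memory; the container likely needs host networking
# (e.g. docker run --network host) for discovery to work.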

@ejrydhfs

ejrydhfs commented Feb 8, 2025

Sounds like exo is trying to use only VRAM and do inference on the GPU, instead of hybrid inference or falling back to CPU-only inference. I ran into the same issue today.
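
If that is what is happening, one thing to try is steering the inference engine off the GPU. This is only a sketch, assuming exo's tinygrad backend (which is where rm_alloc errors originate on NVIDIA) honours tinygrad's standard device-selection environment variables; whether exo passes them through is an assumption, not a documented option:

% CLANG=1 exo     # ask tinygrad for its CPU (clang) backend instead of the NVIDIA one
# if this runs, the original failure was GPU memory, not system RAM.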

@ejrydhfs

ejrydhfs commented Feb 8, 2025

> I think I have the same issue. I am trying to run Llama 3.2 1B on Arch Linux with 16 GB of RAM and an Intel CPU, and I got the same Out of memory error. BUT I was able to run it in a cluster of 7 GB (other laptop) + 4 GB (GPU in Docker) + 15 GB (main laptop RAM) at a speed of 0.3 tokens/sec.

Mind saying how you were able to do it? Did you isolate the GPU from the rest of the system, making the system look like it only had an integrated GPU? I assume OOM may not be an issue on systems with integrated graphics/unified memory, because the GPU and CPU share the same memory.

@andrenaP

andrenaP commented Feb 8, 2025

> I think I have the same issue. I am trying to run Llama 3.2 1B on Arch Linux with 16 GB of RAM and an Intel CPU, and I got the same Out of memory error. BUT I was able to run it in a cluster of 7 GB (other laptop) + 4 GB (GPU in Docker) + 15 GB (main laptop RAM) at a speed of 0.3 tokens/sec.

> Mind saying how you were able to do it? Did you isolate the GPU from the rest of the system, making the system look like it only had an integrated GPU? I assume OOM may not be an issue on systems with integrated graphics/unified memory, because the GPU and CPU share the same memory.

This was pretty easy. I created a Docker container with exo. Because I didn't pass the GPU into the container, it used the CPU. (attached: IMG_20250208_122918_994.jpg)
Of course, I had to download the same model in both the container and on the main system... In the future I will use Docker volumes to share the model between Docker and the main system.
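
A rough sketch of the volume idea, assuming a locally built image named exo-cpu and a model cache under ~/.cache/exo (both the image name and the cache path are placeholders; adjust them to wherever your install actually stores models):

% docker build -t exo-cpu .
% docker run --rm -it \
    --network host \
    -v ~/.cache/exo:/root/.cache/exo \
    exo-cpu
# no --gpus flag, so only the CPU is visible inside the container; the
# bind mount lets the container and the host share one copy of the weights.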

@ejrydhfs

ejrydhfs commented Feb 10, 2025

I thought about implementing a Docker workaround too. It's reliable and effective, but I believe it is not ideal, since it implies that a network is also emulated in the system for communication between the CPU and the GPU, and that imposes an overhead which can be significant at high speeds like 10 Gbps. An ideal solution, I believe, would be to support hybrid inference on individual nodes.
