What are the minimum requirements to run locally? #86
I have increased the GPU to 24 GB of memory but still hit an out-of-memory error:

```
CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 23.58 GiB of which 3.38 MiB is free. Process 3060 has 16.51 GiB memory in use. Including non-PyTorch memory, this process has 7.02 GiB memory in use. Of the allocated memory 6.71 GiB is allocated by PyTorch, and 77.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
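As the error message itself suggests, one low-effort mitigation to try first is enabling expandable segments in PyTorch's CUDA allocator to reduce fragmentation. Set it in the environment before starting the server, for example:

```bash
# Reduce CUDA memory fragmentation, as suggested by the OOM message.
# This only helps when "reserved but unallocated" memory is significant;
# it will not fix a genuine shortage of VRAM.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```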
@Amrosx What kind of GPU do you have? What model are you trying to use?
@ashwinb I am using an NVIDIA RTX A5000 (24 GB).
What command are you using to start? I think you need to disable safety (i.e., avoid loading the Llama-Guard-3-8B model, or maybe use the much lighter-weight Llama-Guard-3-1B we just released) to be able to fit everything onto your card.
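For rough intuition on why this matters (back-of-the-envelope, assuming bf16 weights at about 2 bytes per parameter): an 8B instruct model alone needs roughly 16 GB for weights, and loading Llama-Guard-3-8B alongside it adds another ~16 GB, which cannot fit in 24 GB. Llama-Guard-3-1B adds only about 2–3 GB, leaving headroom for activations and the KV cache.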
I have followed the instructions for llama stack build, configured it, and ran 8b-instruct.
You may start with inference:

```yaml
- provider_id: meta-reference
  provider_type: meta-reference
  config:
    model: Llama3.2-1B-Instruct
    quantization: null
    torch_seed: null
    max_seq_len: 4096
    max_batch_size: 1
```
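Assuming the config above goes into the run YAML produced by llama stack build, the server would then be restarted against that file; a minimal sketch, with an illustrative path:

```bash
# Restart the stack with the edited run config (path is illustrative).
llama stack run ./8b-instruct-run.yaml
```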