
What are the minimum requirements to run locally? #86

Open
Amrosx opened this issue Oct 1, 2024 · 6 comments

Comments


Amrosx commented Oct 1, 2024

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 7.69 GiB of which 13.94 MiB is free. Including non-PyTorch memory, this process has 7.64 GiB memory in use. Of the allocated memory 7.34 GiB is allocated by PyTorch, and 121.77 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
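A minimal sketch of applying the traceback's own suggestion, assuming the allocator option is set before torch initializes CUDA; it only mitigates fragmentation and does not shrink the model weights themselves:

```python
# Sketch: enable expandable segments before any CUDA allocation happens.
# This reduces fragmentation of reserved-but-unallocated memory; a model
# that simply does not fit in VRAM will still fail.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # import after setting the env var so the allocator picks it up

print(torch.cuda.is_available())
```

The same effect can be had by exporting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in the shell before launching the server.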

Amrosx commented Oct 2, 2024

I have increased the GPU memory to 24 GB, but it still runs out of memory.

CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 23.58 GiB of which 3.38 MiB is free. Process 3060 has 16.51 GiB memory in use. Including non-PyTorch memory, this process has 7.02 GiB memory in use. Of the allocated memory 6.71 GiB is allocated by PyTorch, and 77.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

ashwinb (Contributor) commented Oct 2, 2024

@Amrosx What kind of GPU do you have? What model are you trying to use?


Amrosx commented Oct 2, 2024

@ashwinb I am using an NVIDIA RTX A5000 (24 GB).
The models are the same as in the README:
Llama3.1-8b-instruct
Llama-Guard-3-8B

ashwinb (Contributor) commented Oct 2, 2024

What command are you using to start? I think you need to disable safety (i.e., avoid loading the Llama-Guard-3-8B model, or use the much lighter-weight Llama-Guard-3-1B we just released) to fit everything onto your card.
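For rough context, a back-of-the-envelope sketch (assuming bf16/fp16 weights and ignoring KV cache, activations, and the CUDA context) of why the two 8B models together exceed a 24 GB card:

```python
# Rough weight-memory estimate at 2 bytes per parameter (bf16/fp16).
# KV cache, activations, and the CUDA context add several more GiB on top.
def weight_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

inference_8b = weight_gib(8)   # ~14.9 GiB for Llama3.1-8b-instruct
guard_8b     = weight_gib(8)   # ~14.9 GiB for Llama-Guard-3-8B
guard_1b     = weight_gib(1)   # ~1.9 GiB for Llama-Guard-3-1B

print(f"8B + 8B guard: ~{inference_8b + guard_8b:.1f} GiB (over 24 GiB)")
print(f"8B + 1B guard: ~{inference_8b + guard_1b:.1f} GiB (fits, leaving headroom for KV cache)")
```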


Amrosx commented Oct 2, 2024

I have followed the instructions for llama stack build, configuration, and run with 8b-instruct.
I ran with safety and without it, but the error persists:
python app/main.py --disabled-safety (no change)
python app/main.py
When I send a "Hello" message, I get the CUDA out of memory error on the server side.

yanxi0830 (Contributor) commented

You may start with the Llama3.2-1B-Instruct model for inference, e.g.:

inference: 
  - provider_id: meta-reference
    provider_type: meta-reference
    config:
      model: Llama3.2-1B-Instruct
      quantization: null
      torch_seed: null
      max_seq_len: 4096
      max_batch_size: 1
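Once the server is restarted with the smaller model, a quick sanity check on the server side (plain PyTorch, independent of llama-stack) shows whether there is headroom left for a safety model:

```python
# Report free vs. total memory on the current CUDA device.
import torch

free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
```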
