
What are the minimum requirements to run locally? #86

Open
Amrosx opened this issue Oct 1, 2024 · 6 comments

Comments


Amrosx commented Oct 1, 2024

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 7.69 GiB of which 13.94 MiB is free. Including non-PyTorch memory, this process has 7.64 GiB memory in use. Of the allocated memory 7.34 GiB is allocated by PyTorch, and 121.77 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
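A minimal sketch of applying the traceback's own suggestion, assuming the allocator option is set before torch initializes CUDA; it only mitigates fragmentation and does not shrink the model weights themselves:

```python
# Sketch: enable expandable segments before any CUDA allocation happens.
# This reduces fragmentation of reserved-but-unallocated memory; a model
# that simply does not fit in VRAM will still fail.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # import after setting the env var so the allocator picks it up

print(torch.cuda.is_available())
```

The same effect can be had by exporting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in the shell before launching the server.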

Amrosx commented Oct 2, 2024

I have increased the GPU memory to 24 GB, but it still runs out of memory.

CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 23.58 GiB of which 3.38 MiB is free. Process 3060 has 16.51 GiB memory in use. Including non-PyTorch memory, this process has 7.02 GiB memory in use. Of the allocated memory 6.71 GiB is allocated by PyTorch, and 77.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

ashwinb (Contributor) commented Oct 2, 2024

@Amrosx What kind of GPU do you have? What model are you trying to use?


Amrosx commented Oct 2, 2024

@ashwinb I am using an NVIDIA RTX A5000 (24 GB).
The models are the same as in the README:
Llama3.1-8b-instruct
Llama-Guard-3-8B

ashwinb (Contributor) commented Oct 2, 2024

What command are you using to start? I think you need to disable safety (i.e., avoid loading the Llama-Guard-3-8B model, or use the much lighter-weight Llama-Guard-3-1B we just released) to fit everything onto your card.
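For rough context, a back-of-the-envelope sketch (assuming bf16/fp16 weights and ignoring KV cache, activations, and the CUDA context) of why the two 8B models together exceed a 24 GB card:

```python
# Rough weight-memory estimate at 2 bytes per parameter (bf16/fp16).
# KV cache, activations, and the CUDA context add several more GiB on top.
def weight_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

inference_8b = weight_gib(8)   # ~14.9 GiB for Llama3.1-8b-instruct
guard_8b     = weight_gib(8)   # ~14.9 GiB for Llama-Guard-3-8B
guard_1b     = weight_gib(1)   # ~1.9 GiB for Llama-Guard-3-1B

print(f"8B + 8B guard: ~{inference_8b + guard_8b:.1f} GiB (over 24 GiB)")
print(f"8B + 1B guard: ~{inference_8b + guard_1b:.1f} GiB (fits, leaving headroom for KV cache)")
```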


Amrosx commented Oct 2, 2024

I have followed the instructions for llama stack build, configuration, and run with 8b-instruct.
I ran with safety and without it, but the error persists:
python app/main.py --disabled-safety (no change)
python app/main.py
When I send a "Hello" message, I get the CUDA out of memory error on the server side.

yanxi0830 (Contributor) commented

You may start with the Llama3.2-1B-Instruct model for inference, e.g.:

inference: 
  - provider_id: meta-reference
    provider_type: meta-reference
    config:
      model: Llama3.2-1B-Instruct
      quantization: null
      torch_seed: null
      max_seq_len: 4096
      max_batch_size: 1
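Once the server is restarted with the smaller model, a quick sanity check on the server side (plain PyTorch, independent of llama-stack) shows whether there is headroom left for a safety model:

```python
# Report free vs. total memory on the current CUDA device.
import torch

free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
```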
