Hi, thanks for your great work! I have tried gradio_demo and it is perfect.
When I try real_time_interactive_demo, it fails. BOTH models appear to be loaded onto GPUs 0 and 1 (tensor_parallel=2), even though I specified different GPUs for each (0,1 for the first model and 2,3 for the second), which causes an OOM error.
The error seems to be due to vLLM. I tried to fix it, but without success. Have you ever run into this problem?
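To make the intended GPU split concrete, here is a minimal sketch of the isolation I am after. It is not the demo's actual code: `gpu_env`, `launch_worker`, and `SHOW_GPUS` are hypothetical names, and the child command only prints what it can see instead of loading a model. The assumption is that each model runs in its own process, with `CUDA_VISIBLE_DEVICES` set before CUDA is initialized, so a tensor-parallel size of 2 can only use that process's two GPUs.

```python
import os
import subprocess
import sys


def gpu_env(gpu_ids):
    """Return a copy of the current environment restricted to the given GPUs."""
    env = os.environ.copy()
    # CUDA only enumerates the devices listed here, renumbered from 0.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def launch_worker(gpu_ids, command):
    """Start `command` in a child process that can only see `gpu_ids`."""
    return subprocess.Popen(
        command,
        env=gpu_env(gpu_ids),
        stdout=subprocess.PIPE,
        text=True,
    )


# Placeholder child command (an assumption for illustration): it prints the
# GPUs visible to the child rather than starting a model server.
SHOW_GPUS = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]

if __name__ == "__main__":
    first = launch_worker([0, 1], SHOW_GPUS)   # first model: GPUs 0,1
    second = launch_worker([2, 3], SHOW_GPUS)  # second model: GPUs 2,3
    print(first.communicate()[0].strip())
    print(second.communicate()[0].strip())
```

In a real run the placeholder command would be replaced by whatever starts each model. I am not sure whether the demo's structure allows the two models to live in separate processes like this.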
Looking forward to your advice.