f"The requested device {vllm_device} is also being used for training. For higher throughput "
"and to avoid out-of-memory errors, it is recommended to use a dedicated device for vLLM. "
"If this is intentional, you may ignore this warning but should adjust "
"`vllm_gpu_memory_utilization` accordingly."
)
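For context, this warning fires when vLLM is pinned to a GPU that is also used for training. Here is a minimal sketch of the two configurations, assuming the `GRPOConfig` fields (`use_vllm`, `vllm_device`, `vllm_gpu_memory_utilization`) as they exist in the TRL version this issue refers to:

```python
from trl import GRPOConfig

# Option 1: dedicate one GPU to vLLM (no warning, full memory budget).
config = GRPOConfig(
    use_vllm=True,
    vllm_device="cuda:7",              # train on cuda:0-6, generate on cuda:7
    vllm_gpu_memory_utilization=0.9,
)

# Option 2: share a training GPU with vLLM (triggers the warning above);
# lower the memory fraction so vLLM leaves room for training state.
config = GRPOConfig(
    use_vllm=True,
    vllm_device="cuda:0",
    vllm_gpu_memory_utilization=0.3,
)
```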
In the current GRPO implementation, vLLM can only run on a single GPU, which becomes a performance bottleneck: in an 8-GPU setup, the remaining 7 GPUs sit idle while 1 GPU completes inference, and a single GPU also cannot hold larger models.
How can we enable vLLM to run on multiple GPUs? The only concern is that we need a way to update the parameters across multiple GPUs each time the model weights are reloaded (see the permalink and the sketch below):
trl/trl/trainer/grpo_trainer.py, lines 624 to 653 in e5ae703
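One possible direction, sketched below rather than a definitive implementation: start vLLM with `tensor_parallel_size > 1` and push the updated weights to every tensor-parallel worker instead of only the driver worker. The sketch assumes a recent vLLM where `LLM.collective_rpc` can run a callable on each worker (the worker instance is bound as the first argument) and where each worker exposes its model shard as `model_runner.model`; the helper name `sync_policy_into_vllm` is made up for illustration.

```python
from vllm import LLM

# Spread vLLM's inference across 4 GPUs instead of 1.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=4)

def _load_weights(worker, weights):
    # Runs on every tensor-parallel worker; vLLM's load_weights
    # routes each tensor to the shard that owns it on that rank.
    worker.model_runner.model.load_weights(weights)

def sync_policy_into_vllm(llm, state_dict):
    # Push the freshly trained policy weights to all workers after each
    # GRPO update, replacing the single-GPU
    # driver_worker.model_runner.model.load_weights(...) call.
    llm.collective_rpc(_load_weights, args=(list(state_dict.items()),))
```

Shipping full tensors through the RPC layer is the simplest change but copies every parameter on each update; broadcasting weights over an NCCL group between the training ranks and the vLLM workers would avoid that, at the cost of more plumbing.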