Hi,
It seems we can load a model on one node and then distribute it across nodes for training or inference, but:
Imagine we have 2 nodes, each with 2 GPUs and 24 GB of VRAM per GPU, and we want to load a model like Gemma 2 27B. One node cannot hold it, so the load needs to be distributed across nodes from the very start, without ever materializing the complete model on a single node. (Imagine that no node can offload enough to hold the complete model either, as would be the case with Llama 3.1 405B.)
Is there a way to do this?
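To make the idea concrete, here is a rough sketch of what I'm imagining (untested; it assumes PyTorch FSDP with meta-device initialization and a `torchrun` launch of one process per GPU across both nodes, none of which I've confirmed works for this case):

```python
# Rough sketch, not a confirmed recipe: build the model on the "meta"
# device so nothing is materialized, then let FSDP shard the parameters
# across all 4 ranks before any real memory is allocated.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoConfig, AutoModelForCausalLM

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Construct the model on the meta device: no CPU RAM or VRAM is used.
config = AutoConfig.from_pretrained("google/gemma-2-27b")
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)

# FSDP shards the (still empty) parameters, so each rank only ever
# allocates its own shard, roughly 1/4 of the model here.
model = FSDP(
    model,
    device_id=local_rank,
    param_init_fn=lambda m: m.to_empty(device="cuda", recurse=False),
)

# The real weights would then have to arrive shard-by-shard, e.g. via
# torch.distributed.checkpoint, so no rank ever holds the full model.
```

The open question for me is the last step: loading the pretrained weights into those shards without any single node (or its CPU) ever holding the full checkpoint.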