
Is current megablocks compatible with distributed optimizer in Megatron-LM? #160

Open
Spico197 opened this issue Nov 11, 2024 · 1 comment

Comments

@Spico197

Hi there, thanks for the amazing work! I found that expert parallelism is not compatible with the distributed optimizer in the forked version of Megatron-LM here:

https://github.com/stanford-futuredata/Megatron-LM/blob/85f95aef3b648075fe6f291c86714fdcbd9cd1f5/megatron/arguments.py#L352-L356
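For reference, the guard at those lines has roughly the following shape (a paraphrased sketch, not a verbatim copy; the exact flag names, e.g. `moe_expert_model_parallelism`, are assumptions and may differ in the fork):

```python
# Paraphrased sketch of the argument validation in the MegaBlocks fork of
# Megatron-LM (megatron/arguments.py). Flag names are assumptions, not a
# verbatim copy of the linked lines.
if args.moe_expert_model_parallelism:
    assert not args.use_distributed_optimizer, \
        'Expert model parallelism is not currently supported ' \
        'with the distributed optimizer.'
```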

But there's no such validation in the open PR to Megatron-LM: NVIDIA/Megatron-LM#288

Does that mean the assertion is redundant and the current version of MegaBlocks is compatible with the distributed optimizer under expert parallelism?

Thanks very much.

@Spico197
Author

I set up an experiment with 64 experts split across 2 devices with expert parallelism, with both MegaBlocks and the distributed optimizer enabled. However, I found that the saved experts across devices are identical (the 32 experts on rank 0 have the same weights as the 32 experts on rank 1).

But when the distributed optimizer is disabled, there seems to be no such problem. So I'm wondering whether there is still a potential incompatibility with the latest Megatron-LM.
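For anyone who wants to reproduce the check, here is a minimal sketch of how the saved checkpoints from the two expert-parallel ranks can be compared. The checkpoint paths and the `experts` key filter are assumptions and need to be adjusted to the actual Megatron-LM checkpoint layout:

```python
# Hedged sketch: compare expert weights saved by two expert-parallel ranks.
# The checkpoint paths and the parameter-name filter ("experts") are
# assumptions; adjust them to match the actual checkpoint layout in use.
import torch

ckpt_rank0 = torch.load('ckpt/iter_0001000/mp_rank_00/model_optim_rng.pt',
                        map_location='cpu')
ckpt_rank1 = torch.load('ckpt/iter_0001000/mp_rank_01/model_optim_rng.pt',
                        map_location='cpu')

state0 = ckpt_rank0['model']
state1 = ckpt_rank1['model']

# With expert parallelism working correctly, the 32 experts held by rank 0
# should differ from the 32 experts held by rank 1. If every expert tensor
# is identical across ranks, the experts were not actually sharded.
for name, p0 in state0.items():
    if 'experts' in name and name in state1:
        same = torch.allclose(p0.float(), state1[name].float())
        print(f'{name}: identical across ranks = {same}')
```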
