adding optimizer overlap for FSDP #203
Conversation
src/llama_recipes/finetuning.py
Outdated
print(f"setting up optimizer overlap") | ||
optim_kwargs = {"lr": train_config.lr} | ||
_apply_optimizer_in_backward( | ||
optimizer_class=optim.AdamW, |
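For readers outside the diff context, here is a minimal, self-contained sketch of what this experimental PyTorch hook does. The toy `nn.Linear` model and the hard-coded learning rate are stand-ins for illustration, not code from this PR:

```python
import torch.nn as nn
import torch.optim as optim
from torch.distributed.optim import _apply_optimizer_in_backward

model = nn.Linear(16, 16)  # stand-in for the FSDP-wrapped model in finetuning.py

# Register a per-parameter AdamW step that runs during the backward pass, so each
# gradient can be applied (and released) as soon as it has been accumulated.
_apply_optimizer_in_backward(
    optimizer_class=optim.AdamW,
    params=model.parameters(),
    optimizer_kwargs={"lr": 1e-4},  # mirrors optim_kwargs = {"lr": train_config.lr}
)
```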
If the optimizer_in_backward_available flag is set and the user has selected AnyPrecisionAdamW, it would be good to add that case as well, unless there is a restriction on which optimizers support this feature.
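Not part of the PR as reviewed here, just a hedged sketch of how such a dispatch could look; the config field names and the AnyPrecisionAdamW import path are assumptions, and `model`, `train_config`, and `fsdp_config` come from the surrounding finetuning script:

```python
import torch.optim as optim
from torch.distributed.optim import _apply_optimizer_in_backward
from llama_recipes.policies import AnyPrecisionAdamW  # import path assumed

# Assumed flag names: pick the optimizer class the user configured and reuse
# the same optimizer-in-backward registration for either choice.
optimizer_class = (
    AnyPrecisionAdamW if fsdp_config.optimizer == "anyprecision" else optim.AdamW
)
_apply_optimizer_in_backward(
    optimizer_class=optimizer_class,
    params=model.parameters(),
    optimizer_kwargs={"lr": train_config.lr},
)
```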
Added for AnyPrecision as well; per my test it works for AnyPrecision too.
@HamidShojanazeri Thanks for this PR. Left a few comments. It would be great to capture a memory profile as well.
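For anyone reproducing the comparison, one generic way to capture such a profile is with the built-in torch.cuda counters; this is a sketch, not the instrumentation used for the numbers and gists below:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run a few training steps with/without optimizer overlap ...

stats = torch.cuda.memory_stats()
gib = 2**30
print(f"max reserved : {torch.cuda.max_memory_reserved() / gib:.2f} GiB")
print(f"max allocated: {torch.cuda.max_memory_allocated() / gib:.2f} GiB")
print(f"peak active  : {stats['active_bytes.all.peak'] / gib:.2f} GiB")
```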
What does this PR do?
This PR adds optimizer overlap, which brings additional memory savings by fusing the gradient calculation and parameter update steps (a short before/after sketch follows the numbers below).
For the 7B model:
- max reserved memory saving: 7%
- allocated memory saving: 4%
- active memory saving: 4%
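To make the "fused gradient calculation and parameter update" concrete, here is a rough before/after of the training step; the `model`, `batch`, and `optimizer` names are illustrative, not the repo's exact loop:

```python
# Before: every parameter's gradient stays allocated until the optimizer
# step runs after the whole backward pass has finished.
loss = model(batch).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

# After _apply_optimizer_in_backward has been registered: each parameter is
# updated inside backward and its gradient can be freed right away, which is
# where the reserved/allocated/active savings above come from.
loss = model(batch).mean()
loss.backward()  # no separate optimizer.step() / zero_grad() call
```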
Feature/Issue validation/testing
Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the results can be reproduced.
Please also list any relevant details of your test configuration.
Test A without optimizer_overlap
https://gist.github.com/HamidShojanazeri/08ee3d23bdb0fa60466071dee1efda1f
Test B with optimizer_overlap
https://gist.github.com/HamidShojanazeri/3d1147012e9db130dd7cebf75d3caa64
Logs with/without AnyPrecision
Logs with AdamW
Thanks for contributing 🎉!