adding optimizer overlap for FSDP #203

Closed · wants to merge 6 commits
Conversation

@HamidShojanazeri (Contributor) commented Sep 15, 2023

What does this PR do?

This PR adds optimizer overlap, which brings additional memory savings by fusing the gradient calculation and parameter update steps (a minimal sketch of the mechanism follows the numbers below).

For 7B:
  • max reserved memory saving: 7%
  • allocated memory saving: 4%
  • active memory saving: 4%
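For context, here is a minimal, self-contained sketch of the overlap mechanism, assuming PyTorch's private `_apply_optimizer_in_backward` utility from `torch.distributed.optim`. The toy `nn.Linear` model and learning rate are illustrative, not the PR's actual FSDP setup in finetuning.py:

```python
import torch
import torch.nn as nn
import torch.optim as optim
# Private PyTorch utility; the exact import path may differ across torch versions.
from torch.distributed.optim import _apply_optimizer_in_backward

# Toy stand-in for the fine-tuned model; purely illustrative.
model = nn.Linear(16, 16)

# Register a per-parameter AdamW that runs inside backward(): each parameter is
# updated as soon as its gradient has been accumulated, instead of in a separate
# optimizer.step() after the full backward pass, so gradient buffers can be
# released earlier.
_apply_optimizer_in_backward(
    optimizer_class=optim.AdamW,
    params=model.parameters(),
    optimizer_kwargs={"lr": 1e-4},
)

# Training step: note there is no optimizer.step() or optimizer.zero_grad().
x = torch.randn(4, 16)
loss = model(x).sum()
loss.backward()  # parameters are updated during this call
```

In the PR this registration is wired into the FSDP fine-tuning path in finetuning.py; because the update happens inside backward(), gradients no longer need to persist until a separate optimizer step, which is where the reserved/allocated/active memory savings reported above come from.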

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Logs with/without AnyPrecision

Logs with AdamW

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

README.md — outdated review comment (resolved)
print(f"setting up optimizer overlap")
optim_kwargs = {"lr": train_config.lr}
_apply_optimizer_in_backward(
optimizer_class=optim.AdamW,
Review comment (Contributor):

If the optimizer_in_backward_available flag is set and the user has selected AnyPrecisionAdamW, it would be good to add that case as well, unless there is a restriction on which optimizers support this feature.

@HamidShojanazeri (Contributor, Author) — Oct 30, 2023:

Added AnyPrecision as well; per my tests, optimizer overlap works with AnyPrecisionAdamW too.
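For reference, a hedged sketch of what supporting both optimizers could look like. The config fields (`fsdp_config.pure_bf16`, `fsdp_config.optimizer`, `train_config.lr`) mirror the existing fine-tuning configs as I read them, and the `AnyPrecisionAdamW` import path is an assumption; adjust both to the actual code in finetuning.py:

```python
import torch
import torch.optim as optim
from torch.distributed.optim import _apply_optimizer_in_backward

# Assumption: AnyPrecisionAdamW as shipped with the recipes' policies module;
# point this import at wherever the class actually lives in your checkout.
from llama_recipes.policies import AnyPrecisionAdamW


def setup_in_backward_optimizer(model, train_config, fsdp_config):
    """Register the optimizer inside backward(), honoring the AnyPrecision choice."""
    if fsdp_config.pure_bf16 and fsdp_config.optimizer == "anyprecision":
        optimizer_class = AnyPrecisionAdamW
        optimizer_kwargs = {
            "lr": train_config.lr,
            "momentum_dtype": torch.bfloat16,
            "variance_dtype": torch.bfloat16,
            "use_kahan_summation": False,
        }
    else:
        optimizer_class = optim.AdamW
        optimizer_kwargs = {"lr": train_config.lr}

    _apply_optimizer_in_backward(
        optimizer_class=optimizer_class,
        params=model.parameters(),
        optimizer_kwargs=optimizer_kwargs,
    )
```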

@chauhang (Contributor) left a comment:

@HamidShojanazeri Thanks for this PR. Left a few comments. It would be great to capture a memory profile as well (a sketch of one way to do that is below).
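One lightweight way to capture such a profile is with torch.cuda's built-in memory statistics. A sketch follows; the tag string and the usage call sites are illustrative, and the recipes' own memory-tracing utilities could be used instead:

```python
import torch


def report_cuda_memory(tag: str) -> None:
    """Print peak reserved / allocated / active CUDA memory (GiB) for the current device."""
    gib = 1024 ** 3
    stats = torch.cuda.memory_stats()
    print(
        f"[{tag}] "
        f"max_reserved={torch.cuda.max_memory_reserved() / gib:.2f} GiB, "
        f"max_allocated={torch.cuda.max_memory_allocated() / gib:.2f} GiB, "
        f"max_active={stats['active_bytes.all.peak'] / gib:.2f} GiB"
    )


# Usage (illustrative):
# torch.cuda.reset_peak_memory_stats()
# run_training_steps(...)                 # hypothetical helper
# report_cuda_memory("optimizer_overlap")
```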

Resolved review threads (outdated):
  • README.md
  • src/llama_recipes/finetuning.py (3 threads)
@init27 closed this Aug 20, 2024
5 participants