
[Feature] Add bf16 adamw #843

Open · wants to merge 1 commit into base: master
Conversation

@zhtmike (Collaborator) commented Feb 21, 2025

This is the optimizer used in our movie-gen and internal cogvideox projects, extracted as an independent PR. It solves the convergence issue in our movie-gen model without using the MindSpore AMP interface (which simply takes too much memory :( ). As @hadipash mentioned, this optimizer is very useful.

Reference:

DeepMind, "Scaling Language Models: Methods, Analysis & Insights from Training Gopher", section C.2:

"We found it best to maintain float32 parameters purely for the optimiser update. One can partition the set of float32 parameters for optimisation updates alone along with the optimiser state as in Rajbhandari et al. (2020). The float32 parameters are used for the update and again cast to bfloat16 for the forward pass. This matches performance of full float32 training, improves the speed, and has only a slightly increased memory footprint compared to bfloat16 training."
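For illustration, here is a minimal sketch of the recipe quoted above: float32 master weights, a float32 AdamW update, and a cast back to bf16 for the forward pass. It is written with PyTorch tensors purely for readability; the class name `BF16AdamW` and the hand-rolled update loop are illustrative assumptions, not the MindSpore implementation in this PR:

```python
import torch

class BF16AdamW:
    """Illustrative AdamW that keeps float32 master copies of bf16 params."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, weight_decay=0.01):
        self.params = list(params)  # bf16 parameters used in the forward pass
        # float32 master copies: the update itself runs in full precision
        self.master = [p.detach().float().clone() for p in self.params]
        self.m = [torch.zeros_like(w) for w in self.master]  # fp32 1st moment
        self.v = [torch.zeros_like(w) for w in self.master]  # fp32 2nd moment
        self.lr, self.betas, self.eps, self.wd = lr, betas, eps, weight_decay
        self.t = 0

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        for p, w, m, v in zip(self.params, self.master, self.m, self.v):
            if p.grad is None:
                continue
            g = p.grad.float()  # upcast the bf16 gradient to float32
            m.mul_(b1).add_(g, alpha=1 - b1)
            v.mul_(b2).addcmul_(g, g, value=1 - b2)
            m_hat = m / (1 - b1 ** self.t)   # bias-corrected moments
            v_hat = v / (1 - b2 ** self.t)
            w.mul_(1 - self.lr * self.wd)    # decoupled weight decay (AdamW)
            w.addcdiv_(m_hat, v_hat.sqrt() + self.eps, value=-self.lr)
            # cast the updated float32 master weight back to bf16
            p.copy_(w.to(torch.bfloat16))
```

Relative to plain bf16 training, the extra cost is one float32 master copy (plus the two float32 moments) per parameter, while activations and gradients stay in bf16, matching the memory behaviour described in the quote.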

What does this PR do?

Fixes # (issue)

Adds # (feature)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better to record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@xxx
