Use bf16 parameters in bf16 mixed prec #283
Merged
What does this PR do?
`bfSixteen_mixed` is a poor default choice for mixed precision training because it does not use tensor cores; instead, it does all computation in fp32. I've tested this, and on an A6000 it is 2.5x slower to train a 34B model.
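For reference, here is a minimal sketch of the two `torch.distributed.fsdp.MixedPrecision` policies involved. The exact dtype settings below are my reading of what such policies typically define, not a verbatim copy from the repo:

```python
# Sketch of the two FSDP mixed-precision policies discussed above.
# Field values are assumed for illustration, not copied from the repo.
import torch
from torch.distributed.fsdp import MixedPrecision

# bfSixteen_mixed: parameters stay in fp32, so forward/backward matmuls
# run in fp32 and do not use tensor cores; only gradient reduction and
# buffers are cast to bf16.
bfSixteen_mixed = MixedPrecision(
    param_dtype=torch.float32,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Full bf16: parameters, gradient reduction, and buffers are all bf16,
# so the compute-heavy matmuls can use the tensor cores.
bfSixteen = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Either policy is passed to FSDP via the mixed_precision argument, e.g.:
#   model = FSDP(model, mixed_precision=bfSixteen, ...)
```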
Feature/Issue validation/testing
I ran the fine-tuning script with this change on both the 7B and 34B models, and it ran 2.5x faster each time.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Thanks for contributing 🎉!