
Issue with Numerical Overflow during Half-Precision Training #15

Open
xwqtju opened this issue Jan 16, 2025 · 1 comment
xwqtju commented Jan 16, 2025

Dear Author,

I hope this message finds you well.

I am currently working with the network you provided and have run into an issue I could use help with. Specifically, when training the model in half precision (FP16), I consistently hit numerical overflow errors, whereas training in full precision (FP32) proceeds without any issues. I have tried adjusting various settings, but the problem persists.

Could you kindly offer any suggestions or solutions to address this issue?

Thank you very much for your valuable work, and I look forward to your response.

Best regards,
Wenqiang


pimdh commented Jan 16, 2025

Hi,
We also didn't manage to train with FP16 due to overflows. Please try bfloat16 instead, with which we had more luck. Unfortunately, it's only available on relatively recent NVIDIA GPUs.
Pim
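
For readers hitting the same issue, here is a minimal sketch of what switching mixed-precision training from FP16 to bfloat16 can look like in PyTorch. It assumes the training loop uses `torch.autocast`; the model, data, and hyperparameters below are placeholders for illustration, not code from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; substitute the actual model from this repo.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1)).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# bfloat16 requires a relatively recent NVIDIA GPU (Ampere or newer).
assert torch.cuda.is_bf16_supported(), "bfloat16 not supported on this GPU"

for step in range(100):
    x = torch.randn(32, 64, device="cuda")
    y = torch.randn(32, 1, device="cuda")
    optimizer.zero_grad()
    # bfloat16 keeps FP32's 8-bit exponent range, so values that overflow
    # FP16's ~65504 maximum stay representable; no GradScaler is needed.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

The trade-off is that bfloat16 has fewer mantissa bits than FP16 (7 vs. 10), so it exchanges precision for dynamic range; for training, this is usually an acceptable trade.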
