
Issue with Numerical Overflow during Half-Precision Training #15

Open
xwqtju opened this issue Jan 16, 2025 · 1 comment
xwqtju commented Jan 16, 2025

Dear Author,

I hope this message finds you well.

I am currently working with the network you provided and have run into an issue I could use help with. Specifically, when training the model in half precision (FP16), I consistently hit numerical overflow errors, whereas training in full precision (FP32) proceeds without any issues. I have tried adjusting various settings, but the problem persists.

Could you kindly offer any suggestions or solutions to address this issue?

Thank you very much for your valuable work, and I look forward to your response.

Best regards,
Wenqiang


pimdh commented Jan 16, 2025

Hi,
We also didn't manage to train with FP16 due to overflows. Please try bfloat16 instead, with which we had more luck. Unfortunately, it's only available on relatively recent NVIDIA GPUs.
Pim
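
For readers hitting the same issue, here is a minimal sketch of what switching mixed-precision training from FP16 to bfloat16 can look like in PyTorch. It assumes the training loop uses `torch.autocast`; the model, data, and hyperparameters below are placeholders for illustration, not code from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; substitute the actual model from this repo.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1)).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# bfloat16 requires a relatively recent NVIDIA GPU (Ampere or newer).
assert torch.cuda.is_bf16_supported(), "bfloat16 not supported on this GPU"

for step in range(100):
    x = torch.randn(32, 64, device="cuda")
    y = torch.randn(32, 1, device="cuda")
    optimizer.zero_grad()
    # bfloat16 keeps FP32's 8-bit exponent range, so values that overflow
    # FP16's ~65504 maximum stay representable; no GradScaler is needed.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

The trade-off is that bfloat16 has fewer mantissa bits than FP16 (7 vs. 10), so it exchanges precision for dynamic range; for training, this is usually an acceptable trade.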
