Fix hardcoded float dtypes in DeBERTa model, which caused multiple RuntimeErrors in bfloat16
#35336
base: main
Conversation
Hi @bauwenst, I think bias terms for several models, and sensitive computations like Attention and RoPE, are intended to be kept in I agree that there's a bug here: The hidden states are forced to
@Rocketknight1 Isn't the point of training in
I doubt that this is intended in DeBERTa. When you let the bias follow the
Hi @bauwenst, float32 biases are common for two reasons:
If you use quantization libraries like
@Rocketknight1 Okay, that makes some sense; do note that DeBERTa does not seem to be one of these models that upcast to do attention with higher precision. I have two questions in that case:
Hi @bauwenst, regarding 1, I think you can just add the bias term and then cast the As for question 2, I'm afraid I don't have an answer! In general, models are added to
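One possible reading of the "add the bias term and then cast" suggestion is sketched below. The comment is truncated in this thread, so what exactly gets cast is an assumption here, and `add_fp32_bias` is a hypothetical helper, not a function from the DeBERTa code. The sketch relies only on PyTorch's type promotion: adding a float32 tensor to a bfloat16 tensor yields float32, which can then be cast back to the activation dtype.

```python
import torch

def add_fp32_bias(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: add a float32 bias, then return to x's dtype."""
    # bfloat16 + float32 (both with dimensions) is promoted to float32,
    # so the addition itself happens in the higher precision.
    summed = x + bias
    # Cast back so downstream layers keep seeing the original dtype.
    return summed.to(x.dtype)

x = torch.ones(2, 4, dtype=torch.bfloat16)      # activations in bfloat16
bias = torch.full((4,), 0.5, dtype=torch.float32)  # bias kept in float32

promoted_dtype = (x + bias).dtype
out = add_fp32_bias(x, bias)
```

With this pattern the bias parameter can stay in float32 while the rest of the forward pass runs in bfloat16.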
Hey! Thanks for opening the PR.
The main reason we usually don't patch this is that it was hardcoded by the original authors, so we would be changing the results for people who rely on the wrong behavior.
Fixing the mask filling does make sense tho! We can do it 🤗
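For context, a dtype-aware mask fill could look roughly like the sketch below. It is not taken from the PR diff: `mask_scores` and the tensor shapes are illustrative assumptions. The idea is to derive the fill value from the scores' own dtype with `torch.finfo` instead of hardcoding the float32 minimum.

```python
import torch

def mask_scores(scores: torch.Tensor, keep: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: fill masked-out positions before softmax."""
    # Most negative finite value representable in the scores' own dtype,
    # rather than a hardcoded torch.finfo(torch.float).min.
    fill_value = torch.finfo(scores.dtype).min
    return scores.masked_fill(~keep, fill_value)

scores = torch.zeros(1, 4, dtype=torch.bfloat16)
keep = torch.tensor([[True, True, False, False]])  # True = attend

masked = mask_scores(scores, keep)
probs = torch.softmax(masked, dim=-1)  # masked positions get probability 0
```

The output keeps the input dtype, so the same function works unchanged for float32, float16, and bfloat16 scores.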
Hi, may I ask whether there is any update here?
cc @bauwenst should we take this over?
What does this PR do?
Fix #35332 by removing any hardcoded float dtypes.
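To illustrate the class of error involved, here is a minimal sketch, not the actual DeBERTa code: both functions are hypothetical. Forcing one operand to float32 while the weights are in bfloat16 makes `torch.matmul` raise a RuntimeError, whereas following the dtype of the other operand avoids it.

```python
import torch

weight = torch.randn(8, 4, dtype=torch.bfloat16)
hidden = torch.randn(2, 8, dtype=torch.bfloat16)

def project_hardcoded(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical anti-pattern: .float() forces float32 regardless of
    # the dtype the model was loaded in.
    return torch.matmul(hidden.float(), weight)

def project_dtype_aware(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Follow the dtype of the weights instead of hardcoding one.
    return torch.matmul(hidden.to(weight.dtype), weight)

try:
    project_hardcoded(hidden, weight)
    hardcoded_failed = False
except RuntimeError:
    # matmul requires both operands to share a dtype.
    hardcoded_failed = True

out = project_dtype_aware(hidden, weight)
```

Element-wise operations would silently promote to float32 instead of failing, which is why mismatches of this kind tend to surface only at matrix multiplications.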
Who can review?
@ArthurZucker