Feat (nn/sdpa): quantization of scaled dot-product attention #1090
We should merge #1088 before this.
…tention` classes
Note, when |
I love it
Reason for this PR
Make it easier for users to quantize attention layers.
Changes Made in this PR
Achieved by providing:
torch.nn.functional.scaled_dot_product_attention
functional
Testing Summary
Tests:
Basic graph replacement test (covered by LLM entry-point test)
Risk Highlight
The implementation is adapted from the pseudocode in PyTorch's documentation. Otherwise, this PR barely touches any existing code, so it should not break any existing Brevitas features.
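For readers unfamiliar with the reference computation, here is a minimal sketch of scaled dot-product attention in the style of the pseudocode in PyTorch's documentation, with optional hook points where quantizers could be applied. The function name `sdpa_with_hooks`, the four `*_hook` parameters, the placement of the hooks, and the toy `fake_quant_int8` helper are all hypothetical and chosen for illustration only; they are not the classes or API added by this PR.

```python
import math

import torch
import torch.nn.functional as F


def _identity(x):
    return x


def sdpa_with_hooks(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False,
                    q_hook=_identity, k_hook=_identity, v_hook=_identity,
                    probs_hook=_identity):
    """Reference-style SDPA; the *_hook callables are hypothetical quantization points."""
    L, S = query.size(-2), key.size(-2)
    scale = 1.0 / math.sqrt(query.size(-1))

    # Additive bias: 0 where attention is allowed, -inf where it is masked out.
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if is_causal:
        assert attn_mask is None, "is_causal and attn_mask are mutually exclusive"
        causal = torch.ones(L, S, dtype=torch.bool, device=query.device).tril()
        attn_bias = attn_bias.masked_fill(~causal, float("-inf"))
    elif attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            attn_bias = torch.zeros_like(attn_mask, dtype=query.dtype)
            attn_bias = attn_bias.masked_fill(~attn_mask, float("-inf"))
        else:
            attn_bias = attn_mask

    # Assumed hook placement: inputs to the two matmuls.
    scores = q_hook(query) @ k_hook(key).transpose(-2, -1) * scale + attn_bias
    probs = torch.softmax(scores, dim=-1)
    probs = F.dropout(probs, p=dropout_p, training=True)
    return probs_hook(probs) @ v_hook(value)


def fake_quant_int8(x):
    """Toy symmetric per-tensor int8 fake quantization (illustrative only)."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    return torch.round(x / scale).clamp(-128, 127) * scale


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4, 5, 8) for _ in range(3))
    out = sdpa_with_hooks(q, k, v, q_hook=fake_quant_int8, k_hook=fake_quant_int8,
                          v_hook=fake_quant_int8, probs_hook=fake_quant_int8)
    print(out.shape)
```

With the default identity hooks, the sketch should match `torch.nn.functional.scaled_dot_product_attention` up to floating-point tolerance, which makes it a convenient baseline when checking a quantized variant.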
Checklist
dev branch.