Feat (mx): unpadding during dequantization #1134
Conversation
```diff
@@ -28,6 +28,7 @@ def apply_input_view(self, x):
         return x.flatten(start_dim, start_dim + 1)

     def create_quant_tensor(self, qt_args: Tuple[Any]) -> GroupwiseFloatQuantTensor:
+        shape = self.tracked_parameter_list[0].shape
```
We don't support weight quant sharing for groupwise anyway, so this is safe, but it is ugly.
Please check the guards for the optional argument x. I think this can crash under certain circumstances. If there are preconditions that mean this can never occur, maybe add a comment about this.
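A minimal sketch of the guard being requested, assuming an optional input tensor alongside the tracked weight; the class and method names here are hypothetical stand-ins for illustration, not the actual Brevitas code:

```python
from typing import Optional

import torch


class GroupwiseProxySketch:
    """Hypothetical stand-in for the proxy class touched in this PR."""

    def __init__(self, weight: torch.Tensor):
        self.tracked_parameter_list = [weight]

    def resolve_shape(self, x: Optional[torch.Tensor] = None) -> torch.Size:
        # Guard the optional argument: fall back to the tracked weight's
        # shape when no input is provided, instead of crashing on x.shape.
        if x is not None:
            return x.shape
        # Precondition (per the review discussion): weight quant sharing is
        # not supported for groupwise, so tracked_parameter_list holds
        # exactly one entry and indexing [0] is safe.
        return self.tracked_parameter_list[0].shape
```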
Approved once the function signature of create_quant_tensor is updated.
Reason for this PR
Groupwise quantization requires padding when the number of input channels is not divisible by the group size. Padding works well until it doesn't, and there are important edge cases that were not covered by the previous implementation (e.g., weight-only quantization where padding was required: until now, we also had to force activation quantization, because otherwise we hit a shape mismatch).
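To make the divisibility problem concrete, here is a minimal sketch in plain PyTorch (not Brevitas code) of why padding is needed before grouping:

```python
import torch
import torch.nn.functional as F

group_size = 4
w = torch.randn(8, 10)  # (out_channels, in_channels); 10 % 4 != 0

# Pad the channel dimension up to the next multiple of the group size so
# the tensor can be reshaped into (out_channels, n_groups, group_size).
pad = (-w.shape[-1]) % group_size   # 2 extra channels
w_padded = F.pad(w, (0, pad))       # shape (8, 12)
groups = w_padded.view(8, -1, group_size)  # (8, 3, 4): now divisible
```

Those extra padded channels are exactly what leaks into downstream shapes if they are never removed again.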
Changes Made in this PR
With this implementation, we un-pad during dequantization, taking care of all the edge cases above; a minimal sketch of the idea follows.
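A hedged sketch of the un-padding step (names are illustrative, not the Brevitas API): slice the dequantized tensor back to the original channel count, so shapes match downstream even with weight-only quantization:

```python
import torch
import torch.nn.functional as F

def dequantize_unpadded(w_padded: torch.Tensor, orig_channels: int) -> torch.Tensor:
    w_deq = w_padded.float()            # stand-in for the real dequant step
    return w_deq[..., :orig_channels]   # drop the padded channels

w = torch.randn(8, 10)
pad = (-w.shape[-1]) % 4                # pad 10 channels up to 12
w_padded = F.pad(w, (0, pad))
assert dequantize_unpadded(w_padded, w.shape[-1]).shape == w.shape
```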
A few TODOs:
Testing Summary
Risk Highlight
Checklist
Pull request is created against the dev branch.