Revisit how __truediv__ works for QuantTensor #740
Comments
@preusser, I would appreciate feedback on my proposal here.
In most cases, division is a lossy operation. So, I assume the goal would be to preserve the accuracy that can reasonably be expected in the operands?
The scaling performed for the value computation ensures that even the biggest
Thanks @preusser, I will adopt your suggestions into the proposal.
__truediv__ in QuantTensor currently does the inverse of multiplication with respect to how the output bit widths and scales are calculated from the inputs. For d = a / b, we currently do the following:
This makes sense when the numerator is the result of a previous multiplication, but it perhaps doesn't generalise to a standalone division operation. It also diverges a little from what "traditional"* fixed-point arithmetic does. My suggestion is to generalise the traditional fixed-point arithmetic rules to floating-point scales, and to be able to represent the extremes of the input and output ranges.
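A minimal sketch of the current rule, assuming that "inverse of multiplication" means the output scale is s_a / s_b and the output bit width is bw_a - bw_b (mirroring multiplication, where scales multiply and bit widths add). QMeta, mul_rule and div_rule_current are hypothetical names, not Brevitas API, and only the scale/bit-width metadata is modelled, not the tensor values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QMeta:
    """Hypothetical stand-in for the scale/bit-width metadata of a QuantTensor."""
    scale: float
    bit_width: int


def mul_rule(a: QMeta, b: QMeta) -> QMeta:
    # Multiplication: scales multiply, bit widths add.
    return QMeta(a.scale * b.scale, a.bit_width + b.bit_width)


def div_rule_current(a: QMeta, b: QMeta) -> QMeta:
    # Assumed current division rule: the exact inverse of mul_rule.
    return QMeta(a.scale / b.scale, a.bit_width - b.bit_width)


x = QMeta(scale=0.5, bit_width=8)
w = QMeta(scale=0.25, bit_width=4)

# Numerator produced by a previous multiplication: division undoes it cleanly.
product = mul_rule(x, w)
print(div_rule_current(product, w))  # QMeta(scale=0.5, bit_width=8)

# Standalone division of two 8-bit tensors: the output bit width collapses to 0.
a = QMeta(scale=0.5, bit_width=8)
b = QMeta(scale=0.5, bit_width=8)
print(div_rule_current(a, b))  # QMeta(scale=1.0, bit_width=0)
```

The first case is the one the inverse rule fits; the second shows why it does not carry over to a standalone division.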
A convenient way to do this would be to decompose the division a / b into a * (1 / b), have a simple rule for calculating the output bit width of 1 / b, and then apply the regular multiplication rules that we normally use. My suggestion is then as follows:
Adding the multiply leads to:
The resulting value would then be:
I believe this would match traditional fixed point arithmetic rules in the power-of-two case and do something not completely unreasonable in the floating point scaling case. However, the floating point scaling case should be studied further to understand the ramifications of making such a choice. I expect that the decomposition of division into inversion and multiplication is not unreasonable.
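The decomposition of a / b into a * (1 / b) can be sketched as below. The proposal's own rule for 1 / b is not reproduced here, so reciprocal_rule is a hypothetical choice for unsigned, nonzero divisors: it picks the scale 1 / (s_b * 2**n) so that both extremes of 1 / b are representable (the largest, 1 / s_b, and a nonzero code for the smallest), then applies the usual multiplication rule. All names are hypothetical; signed operands and zero divisors are out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QMeta:
    """Hypothetical stand-in for the scale/bit-width metadata of an unsigned QuantTensor."""
    scale: float
    bit_width: int


def mul_rule(a: QMeta, b: QMeta) -> QMeta:
    # Regular multiplication rule: scales multiply, bit widths add.
    return QMeta(a.scale * b.scale, a.bit_width + b.bit_width)


def reciprocal_rule(b: QMeta) -> QMeta:
    # Hypothetical rule for 1 / b, where b's integer code lies in [1, 2**n - 1].
    # With scale 1 / (s_b * 2**n), the largest reciprocal, 1 / s_b (code 1),
    # lands on integer 2**n, so one extra bit is needed to hold it.
    n = b.bit_width
    return QMeta(1.0 / (b.scale * 2**n), n + 1)


def div_rule_decomposed(a: QMeta, b: QMeta) -> QMeta:
    # a / b computed as a * (1 / b).
    return mul_rule(a, reciprocal_rule(b))


def reciprocal_code(b_int: int, b: QMeta, inv: QMeta) -> int:
    # Integer code of 1 / (b_int * s_b) on the reciprocal's grid (round to nearest).
    return round(1.0 / (b_int * b.scale) / inv.scale)


b = QMeta(scale=0.25, bit_width=4)
inv = reciprocal_rule(b)
max_code = 2**inv.bit_width - 1

largest = reciprocal_code(1, b, inv)                    # b = s_b, the smallest divisor
smallest = reciprocal_code(2**b.bit_width - 1, b, inv)  # b = s_b * (2**n - 1)
print(largest, smallest, max_code)  # 16 1 31

print(div_rule_decomposed(QMeta(scale=0.5, bit_width=8), b))
# QMeta(scale=0.125, bit_width=13)
```

With these power-of-two scales (a: 8 bits, s_a = 2^-1; b: 4 bits, s_b = 2^-2), the output scale 0.125 = 2^-3 is the same LSB a conventional Q-format quotient would use (f_a + i_b = 1 + 2 fractional bits), while the bit width is one more than n_a + n_b = 12. That extra bit exists only to hold the single code 2**n; saturating that code instead would keep n bits for 1 / b at the cost of one LSB of error at that extreme.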
*traditional = power-of-two scaling
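For the power-of-two ("traditional") case, a common sizing rule for an unsigned fixed-point quotient (assumed here, since the issue does not spell it out) is: dividing a in Q(i_a).(f_a) by b in Q(i_b).(f_b) needs i_a + f_b integer bits to hold the largest quotient and f_a + i_b fractional bits to resolve the smallest nonzero one. The sketch below checks that claim by brute force over every nonzero integer code, using exact rational arithmetic; quotient_format and covers_all_quotients are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product


def quotient_format(ia: int, fa: int, ib: int, fb: int) -> tuple[int, int]:
    # Sizing assumed here for unsigned a / b, with a in Q{ia}.{fa} and
    # b in Q{ib}.{fb}: returns (integer bits, fractional bits) of the quotient.
    return ia + fb, fa + ib


def covers_all_quotients(ia: int, fa: int, ib: int, fb: int) -> bool:
    # Brute-force check over every nonzero integer code of a and b.
    iq, fq = quotient_format(ia, fa, ib, fb)
    lsb = Fraction(1, 2**fq)
    max_val = Fraction(2**iq) - lsb
    quotients = [
        Fraction(a_int, 2**fa) / Fraction(b_int, 2**fb)
        for a_int, b_int in product(range(1, 2 ** (ia + fa)), range(1, 2 ** (ib + fb)))
    ]
    # The largest quotient must fit, and the smallest must be at least one LSB.
    return max(quotients) <= max_val and min(quotients) >= lsb


print(quotient_format(3, 2, 2, 2))       # (5, 4)
print(covers_all_quotients(3, 2, 2, 2))  # True
```

The total width is (i_a + f_b) + (f_a + i_b) = n_a + n_b, here 5 + 4 = 9 bits, with an LSB of 2^-(f_a + i_b).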