forked from Xilinx/finn
Introduce support for generic elementwise binary operations #10
Draft
iksnagreb wants to merge 30 commits into dev from elementwise-binary
Conversation
This includes a set of HWCustomOp and HLSBackend operator templates which can be specialized in just a few lines of code to implement arbitrary elementwise binary operations, like Add, Mul, Sub, Div, And, Equal, etc., supporting multidirectional broadcasting. Concrete implementations for most of these operators according to the ONNX standard are already sketched out. Still missing are specializations for accumulator and weight bit-width minimization as well as some tricky-to-implement operators. Floating-point support is also still missing due to HLS-backend limitations, though these *seem* to be just minor defects regarding "flatten" and "Slice". Adds unit tests in Python, C++ and RTL simulation for these new operators, though these are probably not exhaustive enough to validate all edge cases. Proposes a new scheme for registering and importing custom operators into their corresponding module namespace, i.e., the 'custom_op' dictionary used to look up operators by ONNX domain.
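The proposed registration scheme could look roughly like the following sketch. All names here (`register_custom_op`, the stand-in operator classes) are illustrative assumptions, not FINN's actual API; the point is only that each specialization lands in the module's `custom_op` dictionary, keyed so lookup by ONNX domain and op_type works uniformly.

```python
# Hypothetical sketch of the proposed registration scheme: a decorator
# inserts each operator class into the module-level 'custom_op' dictionary,
# which is used to look up operators of one ONNX domain by op_type.
custom_op = {}  # maps op_type -> operator class for this domain

def register_custom_op(cls):
    # Register the class under its own name so lookups by node.op_type
    # resolve without manual bookkeeping in a central registry
    custom_op[cls.__name__] = cls
    return cls

@register_custom_op
class ElementwiseAdd:
    # Minimal stand-in for a specialized elementwise binary operator:
    # ideally only the reference function differs between specializations
    npy_op = staticmethod(lambda lhs, rhs: lhs + rhs)

@register_custom_op
class ElementwiseMul:
    npy_op = staticmethod(lambda lhs, rhs: lhs * rhs)
```

Specializing a new operator then amounts to defining one small class; the decorator handles the namespace bookkeeping.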
Folding quantized initializers into add-like nodes did not correctly respect the order of inputs to the Add node. This is fixed by testing for each of the two possible orderings and selecting the corresponding indices. Shape inference following the transformation is fixed by deleting the shape annotations instead of propagating them incorrectly. Deleting the shape annotations should not hurt, as these are redone by running shape inference after each transformation anyway.
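The order-handling part of the fix can be sketched as follows. This is a simplified stand-in (the helper name and the plain-dict model are assumptions, not the actual qonnx/FINN API): instead of assuming the initializer is always the second input, both orderings are tested.

```python
# Hypothetical sketch of the input-order fix: Add is commutative, so the
# quantized initializer may appear as either input of the node. Probe both
# positions instead of hard-coding index 1.
def split_add_inputs(node_inputs, initializers):
    # node_inputs: the Add node's two input tensor names
    # initializers: dict mapping tensor name -> constant parameter tensor
    for i, name in enumerate(node_inputs):
        if name in initializers:
            # Return (dynamic input, initializer input) regardless of order
            return node_inputs[1 - i], name
    # Join-node Add: no initializer present, nothing to fold
    return None
```

Downstream code can then index the dynamic and constant operands symmetrically, instead of failing via assertion when the export happened to emit the other ordering.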
This is probably still rather sketchy, but at least it tries to check the data layout annotation. For now this seems to be enough for getting the thresholds of multi-head attention right, provided qonnx properly annotates the 3D layouts.
Add is commutative, so the export does not always generate the initializer as the second input. However, this transformation always assumed that ordering, failing via assertion if the inputs were simply ordered differently. The transformation now handles both possible input orderings.
Note: This applies to the "container" type, not the simulated quantization type. This is to prevent accidental promotion to float64.
Up until now, this was not a problem, as QONNX and FINN assumed all tensors to be either broadcast offline or, if not, to be "trivially" broadcastable, like scalars or effectively scalar tensors. With the introduction of proper multidirectional broadcasting for elementwise binary operations, this might not be the case anymore, and we need to explicitly reject these tensors from being absorbed into multi-thresholds if broadcasting is not possible (otherwise, without this check, the transformation just fails with some numpy exception).
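A guard along these lines could be used (a sketch, with an assumed helper name; numpy implements the same multidirectional broadcasting rules as ONNX): check compatibility up front instead of letting numpy raise deep inside the transformation.

```python
import numpy as np

# Sketch of the explicit rejection check: before absorbing a parameter
# tensor into a MultiThreshold, verify it can be multidirectionally
# broadcast to the target shape without changing that shape.
def is_broadcastable_to(param_shape, target_shape):
    try:
        # np.broadcast_shapes raises ValueError on incompatible shapes
        result = np.broadcast_shapes(param_shape, target_shape)
        # Also reject cases where broadcasting would grow the target,
        # i.e. the parameter is not absorbable into the existing tensor
        return result == tuple(target_shape)
    except ValueError:
        return False
```

Transformations can then skip (rather than crash on) nodes whose parameters are not absorbable.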
Shapes propagating backwards in graph transformations can break subsequent shape inference. In particular, this is the case for operators with broadcasting semantics, where the output shape cannot be fully reflected in the input shapes, i.e., even for elementwise operations, the input shape might not be identical to the output shape. This is fixed by deleting the problematic shape annotations so they are re-done immediately.
The new test case tests export, streamlining, conversion to hardware layers and subsequent Python, C++ and RTL simulation of QuantEltwiseAdd from Brevitas, serving as a representative example of an elementwise binary operation.
Shape propagation when reordering around elementwise addition did not behave as expected when any of the tensors is broadcast by one of the reordered operations. This is fixed by deleting and re-doing the shape annotations for the connecting tensors of the reordered pattern.
Without MoveLinearPastEltwiseAdd, the two-input-streams variant of the integration test did not actually convert the elementwise addition to a hardware operator, effectively "testing" the vanilla ONNX version of the operator. With this transformation and AbsorbSignBiasIntoMultiThreshold to get the signs right, the hardware operator is tested as intended now.
This is done mostly according to the Vitis High-Level Synthesis User Guide (UG1399); see the library reference on arbitrary precision integer types. The new transformations are added to all relevant test cases, and some data types need to be adjusted to make the numpy references behave more robustly.
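The core of the bit-width bookkeeping follows the arithmetic rules UG1399 documents for arbitrary precision integer types; a simplified sketch (function names assumed, signedness corner cases omitted) for two representative operations:

```python
# Simplified sketch of result-width rules for arbitrary precision
# integers (per UG1399's library reference): the result container must
# be wide enough to hold any representable outcome.
def add_result_width(w_lhs, w_rhs):
    # Sum of two integers of like signedness needs one bit more than
    # the wider operand to avoid overflow
    return max(w_lhs, w_rhs) + 1

def mul_result_width(w_lhs, w_rhs):
    # Product of two integers needs the sum of both operand widths
    return w_lhs + w_rhs
```

Deriving accumulator widths this way keeps the HLS types minimal while still covering the full output range.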
This depends on adding float support to Slice in finn-hlslib.
Join-node Mul operations have no initializer (parameters) and thus there is nothing to factor out.
This is probably just a workaround, and proper datatype inference should be implemented later. For now it seems safer to implicitly treat the resulting parameter tensor as floating-point than to assume a wrong datatype. In most cases the resulting Add operation will later be absorbed and rounded into some thresholds anyway.
…-part-map Add V80 to Alveo part_map
See Xilinx#1040 for details and discussion