-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift/s390x: do not use one-way conditional branches. #10087
Cranelift/s390x: do not use one-way conditional branches. #10087
Conversation
This is a followup to bytecodealliance#10086, this time removing the one-armed branch variant for s390x. This branch was only used as the default-target branch in the `br_table` lowering. This PR incorporates the branch into the `JTSequence` pseudo-instruction. Some care is needed to keep the `ProducesBool` abstraction; it is unwrapped into its `ProducesFlags` and the `JTSequence` becomes a `ConsumesFlags`, so the compare for the jump-table bound (for default target) is not part of the pseudoinst. (This is OK because regalloc-inserted moves never alter flags, by explicit contract; the same reason allows cmp/branch terminators.) Along the way I noticed that the comments on `JTSequence` claimed that `targets` included the default, but this is (no longer?) the case, as the targets are unwrapped by `jump_table_targets` which peels off the first (default) separately. Aside from comments, this only affected pretty-printing; codegen was correct. With this, we have no more one-armed branches; hence, this fixes bytecodealliance#9980.
6225ef6
to
c214eea
Compare
cc @uweigand if you are available to review? Otherwise we're probably OK with the usual reviewer rotation... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good to me, but see inline for a couple of comments.
As a longer-term improvement, I'm wondering if the whole default case handling couldn't just be done by common code using regular basic blocks and branch instructions, enabling all the normal optimization machinery? The back-end would then only have to do the actual tablejump (simply assuming that no overflow can happen).
; clgfi %r3, 3 | ||
; jghe label4 | ||
; sllg %r3, %r3, 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generated code is slightly worse now as the shifting is done before the branch. This probably doesn't have a very significant impact on the pipeline, but it does increase register pressure. One alternative would be to compare the shifted value instead (assuming there are no overflow considerations).
But I guess this isn't serious enough that it should prevent merging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's true -- I did the code motion without looking at the instruction semantics in detail to optimize. We can optimize in a followup if needed I suppose.
9a932fe
to
bf8a9b6
Compare
Could I bother someone for a rubber-stamp here (@uweigand's not an official approver) -- @alexcrichton or @fitzgen maybe? (Or @abrown as original reviewer?) Thanks! |
This is a followup to #10086, this time removing the one-armed branch variant for s390x. This branch was only used as the default-target branch in the
br_table
lowering. This PR incorporates the branch into theJTSequence
pseudo-instruction. Some care is needed to keep theProducesBool
abstraction; it is unwrapped into itsProducesFlags
and theJTSequence
becomes aConsumesFlags
, so the compare for the jump-table bound (for default target) is not part of the pseudoinst. (This is OK because regalloc-inserted moves never alter flags, by explicit contract; the same reason allows cmp/branch terminators.)Along the way I noticed that the comments on
JTSequence
claimed thattargets
included the default, but this is (no longer?) the case, as the targets are unwrapped byjump_table_targets
which peels off the first (default) separately. Aside from comments, this only affected pretty-printing; codegen was correct.With this, we have no more one-armed branches; hence, this fixes #9980.