-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Predicate cache alternative implementation #4292
base: main
Are you sure you want to change the base?
Conversation
7c80598
to
c9155d8
Compare
@@ -822,6 +824,10 @@ void Arm64Emitter::FillStaticRegs(bool FPRs, uint32_t GPRFillMask, uint32_t FPRF | |||
} | |||
} | |||
|
|||
if (FillP2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you can drop this and change to a simple sve supported check, as is it'll not work for the FillSRA in dispatcher and a single ptrue is cheap enough not to matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we always fill p2
when SVE is supported this whole optimization makes no longer sense from a size pov. If we do it without actually needing to use p2
for the SVE store optimization then we are definitely pessimizing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spills are rare enough that a ptrue certainly isn't going to change anything, the opt still works within code blocks which is where it matters as they can be much hotter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, I still didn't manage to get SRA to work. The RA algo is still allocating the wrong register to the predicate register, i.e. p5. It should be p2, which is the SRA register I selected to use in DecodeSRAReg but it's not working. First time I am looking at this code so slightly confused about this.
At the same time, i am wondering if this is necessary. What's the advantage of using SRA over just hardcoding p2 usage in the instructions, like done with PRED_TMP_16B
for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, I still didn't manage to get SRA to work. The RA algo is still allocating the wrong register to the predicate register, i.e. p5. It should be p2, which is the SRA register I selected to use in DecodeSRAReg but it's not working. First time I am looking at this code so slightly confused about this.
At the same time, i am wondering if this is necessary. What's the advantage of using SRA over just hardcoding p2 usage in the instructions, like done with
PRED_TMP_16B
for example?
I don't think there are any, that would work fine too you'd just need to modify the store op to have some way to indicate that and you can drop LoadSVEOptPredicate and rely on it being set in FillSpecialRegs as PRED_TMP_16B is (you probably want to keep InitPredicate for other future users)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(can drop this as it's called in FillSpecialRegs)
319f7a7
to
c5ad6e1
Compare
Whenever the control float leaves the block, it might clobber the predicate register so we reset the cache whenever that happens. The difficulty here is that the cache is valid only during IR generation so we need to make sure we catch all the cases during this pass where the execution might leave the block. Fixes FEX-Emu#4264
c5ad6e1
to
45bb216
Compare
@@ -1631,9 +1631,13 @@ DEF_OP(StoreMemPredicate) { | |||
DEF_OP(LoadMemPredicate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to introduce a new op here, or at least rename/fixup the description as appropriate
@@ -822,6 +824,10 @@ void Arm64Emitter::FillStaticRegs(bool FPRs, uint32_t GPRFillMask, uint32_t FPRF | |||
} | |||
} | |||
|
|||
if (FillP2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(can drop this as it's called in FillSpecialRegs)
|
||
if (FillP2) { | ||
ptrue(ARMEmitter::SubRegSize::i16Bit, PRED_X87_SVEOPT, ARMEmitter::PredicatePattern::SVE_VL5); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just move this into the above if and drop the FillP2 check
Alternative implementation to #4274
In draft mode as it's incomplete and broken.