Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicate cache alternative implementation #4292

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

pmatos
Copy link
Collaborator

@pmatos pmatos commented Jan 22, 2025

Alternative implementation to #4274

In draft mode as it's incomplete and broken.

@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch 3 times, most recently from 7c80598 to c9155d8 Compare January 22, 2025 09:28
@@ -822,6 +824,10 @@ void Arm64Emitter::FillStaticRegs(bool FPRs, uint32_t GPRFillMask, uint32_t FPRF
}
}

if (FillP2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you can drop this and change to a simple sve supported check, as is it'll not work for the FillSRA in dispatcher and a single ptrue is cheap enough not to matter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we always fill p2 when SVE is supported this whole optimization makes no longer sense from a size pov. If we do it without actually needing to use p2 for the SVE store optimization then we are definitely pessimizing things.

Copy link
Collaborator

@bylaws bylaws Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spills are rare enough that a ptrue certainly isn't going to change anything, the opt still works within code blocks which is where it matters as they can be much hotter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I still didn't manage to get SRA to work. The RA algo is still allocating the wrong register to the predicate register, i.e. p5. It should be p2, which is the SRA register I selected to use in DecodeSRAReg but it's not working. First time I am looking at this code so slightly confused about this.

At the same time, i am wondering if this is necessary. What's the advantage of using SRA over just hardcoding p2 usage in the instructions, like done with PRED_TMP_16B for example?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I still didn't manage to get SRA to work. The RA algo is still allocating the wrong register to the predicate register, i.e. p5. It should be p2, which is the SRA register I selected to use in DecodeSRAReg but it's not working. First time I am looking at this code so slightly confused about this.

At the same time, i am wondering if this is necessary. What's the advantage of using SRA over just hardcoding p2 usage in the instructions, like done with PRED_TMP_16B for example?

I don't think there are any, that would work fine too you'd just need to modify the store op to have some way to indicate that and you can drop LoadSVEOptPredicate and rely on it being set in FillSpecialRegs as PRED_TMP_16B is (you probably want to keep InitPredicate for other future users)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can drop this as it's called in FillSpecialRegs)

@pmatos pmatos force-pushed the EnsurePredCacheReset2 branch 4 times, most recently from 319f7a7 to c5ad6e1 Compare January 23, 2025 15:43
@pmatos pmatos marked this pull request as ready for review January 23, 2025 15:43
Whenever the control float leaves the block, it might clobber the
predicate register so we reset the cache whenever that happens.

The difficulty here is that the cache is valid only during IR generation
so we need to make sure we catch all the cases during this pass where
the execution might leave the block.

Fixes FEX-Emu#4264
@@ -1631,9 +1631,13 @@ DEF_OP(StoreMemPredicate) {
DEF_OP(LoadMemPredicate) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to introduce a new op here, or at least rename/fixup the description as appropriate

@@ -822,6 +824,10 @@ void Arm64Emitter::FillStaticRegs(bool FPRs, uint32_t GPRFillMask, uint32_t FPRF
}
}

if (FillP2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can drop this as it's called in FillSpecialRegs)


if (FillP2) {
ptrue(ARMEmitter::SubRegSize::i16Bit, PRED_X87_SVEOPT, ARMEmitter::PredicatePattern::SVE_VL5);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just move this into the above if and drop the FillP2 check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants