[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182
base: aie-public
@martien-de-jong, @andcarminati.
Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova so that loads are not shifted up).
Could we have a test example where we see improvement?
Based on discussion with @gbossu, we were expecting some impact, but with the current implementation the QoR shows no change.
$wh10 = VMOV_mv_w $wl0
JNZ $r3, %bb.1
DelayedSchedBarrier
bb.2:
As general advice, I think we should have a class to manage the description handling. We can encapsulate it and use it in both HazardRecognizer and InterBlockScheduling.
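One possible shape for such a class is sketched below. This is only an illustration under stated assumptions: the name `AltDescTracker`, its methods, the integer instruction ids, and the opcode strings are all hypothetical and do not come from the PR or from LLVM. It shows a single owner for "which alternative was picked for which instruction" that two clients could share.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: the name and interface are illustrative only.
// It records which concrete opcode was selected for a multi-slot
// instruction, keyed by an instruction id chosen by the caller.
class AltDescTracker {
public:
  // Remember the concrete opcode chosen for instruction `Id`.
  // A later call for the same id overrides the earlier choice.
  void select(std::size_t Id, const std::string &Opcode) {
    assert(!Opcode.empty() && "a selection needs a concrete opcode");
    Selected[Id] = Opcode;
  }

  // True if a concrete opcode has been recorded for `Id`.
  bool hasSelection(std::size_t Id) const { return Selected.count(Id) != 0; }

  // Opcode to use for resource checks: the recorded selection if there
  // is one, otherwise the original (still multi-slot) opcode.
  std::string effectiveOpcode(std::size_t Id,
                              const std::string &Original) const {
    std::map<std::size_t, std::string>::const_iterator It = Selected.find(Id);
    return It == Selected.end() ? Original : It->second;
  }

  // Forget all selections, e.g. before another scheduling attempt.
  void reset() { Selected.clear(); }

private:
  std::map<std::size_t, std::string> Selected;
};
```

In this sketch a scheduler-side client would call `select` and `reset`, while a resource-checking client would only call `effectiveOpcode`, so neither needs to know how the selections are stored.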
Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:
liveins: $r1, $r2
successors: %bb.3
$r2 = OR $r2, $r1
bb.3:
That test is unfortunately very hard to read. Could you think of something smaller that shows a diff? I'd also suggest avoiding two labels like on/off and sticking to unique CHECK lines unless the test is very concise.
QoR results are as follows. There are a few regressions; the rest of the benchmarks have the same QoR.
Select_aie2_bf16 409 440 REGR(+7.58%)
I'm quite surprised about the results. Did you have time to investigate the regressions?
With the latest changes + 2nd Commit ([AIE2] Update VLD Multi-Slot Itinerary) of the PR there are no regressions.
@@ -73,7 +73,7 @@ let isMoveImm = 1, isReMaterializable = 1, isAsCheapAsAMove = 1, Itinerary = II_

 // Pseudo VLD
 let hasSideEffects = false, mayLoad = true, mayStore = false in {
-let Itinerary = II_VLDA_W in {
+let Itinerary = II_VLDB in {
Can you give more details about this change? Why does this impact the postmisched? It should not be impacted by the resources of the multi-slot itinerary, because when checking or adding to the scoreboard, the final itinerary should be used.
This PR allows multi-slot instructions to be used during the iterative scheduling of a loop.
Before this PR, we materialized each multi-slot instruction to its selected opcode/slot after the first iteration of iterative scheduling.
Now we wait until PostRA scheduling has converged on an acceptable schedule and we have decided to commit that schedule and move on to the next MBB.
Note: the information about which alternate opcode/desc was selected is stored in the Hazard Recognizer for a region. By the time we reach the end of the MBB (i.e. leaveMBB()), we no longer have access to those Hazard Recognizer instances, so the alternate opcode/desc needs to be part of the BlockState.
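
The deferred materialization described above can be sketched roughly as follows. This is a simplified model and not the PR's implementation: `Instr`, `BlockState`, `recordSelection`, `materialize`, and the opcode strings are hypothetical stand-ins that only borrow the BlockState name from the description. Each scheduling attempt records its choice in per-block state without rewriting the instruction, and the rewrite happens once, after the schedule has converged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a machine instruction (not LLVM's MachineInstr).
struct Instr {
  std::string Opcode; // current opcode; may still be a multi-slot pseudo
  bool IsMultiSlot;   // true until a concrete opcode has been committed
};

// Hypothetical per-block state that outlives any single scheduling attempt.
struct BlockState {
  // Instruction index -> concrete opcode picked by the latest attempt.
  std::unordered_map<std::size_t, std::string> SelectedAltDescs;
  bool Converged = false;
};

// Called during a scheduling attempt: only records the choice, so a later
// attempt is still free to pick a different alternative.
void recordSelection(BlockState &BS, std::size_t Idx,
                     const std::string &Opcode) {
  BS.SelectedAltDescs[Idx] = Opcode;
}

// Called once when leaving the block: rewrites the multi-slot instructions
// to their recorded opcodes and returns how many were rewritten.
std::size_t materialize(std::vector<Instr> &Block, BlockState &BS) {
  assert(BS.Converged && "materialize only after the schedule converged");
  std::size_t NumRewritten = 0;
  for (const auto &Entry : BS.SelectedAltDescs) {
    Instr &MI = Block.at(Entry.first);
    if (!MI.IsMultiSlot)
      continue; // already concrete; nothing to do
    MI.Opcode = Entry.second;
    MI.IsMultiSlot = false;
    ++NumRewritten;
  }
  BS.SelectedAltDescs.clear();
  return NumRewritten;
}
```

Keeping the selections in the block state rather than in a per-region object mirrors the note above: whatever performs the final rewrite needs the data to still be reachable when the block is left.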