[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182

Open. Wants to merge 2 commits into base: aie-public

Collaborator

@krishnamtibrewala krishnamtibrewala commented Sep 6, 2024

This PR allows multi-slot instructions to be used during iterative scheduling of a loop.
Before this PR, we materialized multi-slot instructions to the selected opcode/slot after the first iteration of iterative scheduling.
Now we wait until PostRA scheduling has converged to an acceptable schedule and we have decided to commit the schedule and move on to the next MBB.

  • When is the materialization triggered now? When we leave an MBB.
  • Does this change depending on the region type? No, the changes are agnostic to region type.
  • Can you think of a case where the materialization is not triggered before moving on to another MBB? None that I can think of.

Note: The information about which alternate opcode/desc was selected is stored in the Hazard Recognizer for a region.
By the time we reach the end of the MBB (i.e. leaveMBB()), we no longer have access to those Hazard Recognizer instances, so the alternate opcode/desc needs to be part of the BlockState.
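
The idea can be sketched roughly as follows. This is a simplified, standalone C++ illustration and not the PR's actual code: `Instr`, `BlockState`, `recordSelection`, `startIteration` and `leaveBlock` are hypothetical names, the opcode strings are placeholders, and the sketch assumes that each scheduling iteration starts from a clean set of selections.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction; only the opcode matters here.
struct Instr {
  std::string Opcode;
};

// Hypothetical per-block state that outlives the per-region hazard recognizers.
struct BlockState {
  std::vector<Instr> Instrs;
  // Index into Instrs -> alternate opcode chosen in the latest iteration.
  std::map<std::size_t, std::string> SelectedAltOpcodes;
};

// Called while scheduling a region: remember the choice, do not rewrite yet.
void recordSelection(BlockState &BS, std::size_t Idx, const std::string &Alt) {
  BS.SelectedAltOpcodes[Idx] = Alt;
}

// Called when a new scheduling iteration begins (assumption: old choices are dropped).
void startIteration(BlockState &BS) { BS.SelectedAltOpcodes.clear(); }

// Called once the schedule is committed and we leave the block:
// only now are the multi-slot instructions rewritten.
void leaveBlock(BlockState &BS) {
  for (const auto &KV : BS.SelectedAltOpcodes)
    BS.Instrs[KV.first].Opcode = KV.second;
  BS.SelectedAltOpcodes.clear();
}

// Usage: two iterations pick different slots; only the last one is materialized.
std::string scheduleTwice() {
  BlockState BS;
  BS.Instrs.push_back({"MOV_multislot"}); // placeholder opcode name
  startIteration(BS);
  recordSelection(BS, 0, "MOVA");
  assert(BS.Instrs[0].Opcode == "MOV_multislot"); // still multi-slot
  startIteration(BS);
  recordSelection(BS, 0, "MOVX");
  leaveBlock(BS);
  return BS.Instrs[0].Opcode;
}
```

Because the selection lives in the block-level state rather than in a per-region object, it is still available when the block is left, which is the point of the note above.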

@krishnamtibrewala
Collaborator Author

@martien-de-jong, @andcarminati,
Kindly review and provide early feedback on the approach.

@andcarminati
Collaborator

Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads up).

Collaborator

@martien-de-jong martien-de-jong left a comment

Could we have a test example where we see improvement?

llvm/lib/Target/AIE/AIEInterBlockScheduling.h (2 review threads, outdated and resolved)
@krishnamtibrewala
Collaborator Author

> Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads up).

@andcarminati I tried, but I did not see any change. I am still investigating why.

> Could we have a test example where we see improvement?

@martien-de-jong I am still figuring that out.

Based on a discussion with @gbossu, we were expecting some impact, but with the current implementation the QoR shows no change.

$wh10 = VMOV_mv_w $wl0
JNZ $r3, %bb.1
DelayedSchedBarrier
bb.2:
Collaborator Author

Attached is a diff. I am still trying to figure out how things are interacting.
I will also try to come up with a smaller test case.

[Image: attached diff]

@andcarminati
Collaborator

As general advice, I think we should have a class to manage the handling of the selected descriptors. We can encapsulate that logic there and use it in both HazardRecognizer and InterBlockScheduling.
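
One possible shape for such a class, as a rough standalone sketch: the name `AlternateDescriptors`, its methods, and the two client structs are hypothetical and are not taken from the PR or from LLVM. The point is only that both users go through one small interface instead of each keeping its own map.

```cpp
#include <map>
#include <string>

// Hypothetical helper owning the "which alternate opcode was selected" data.
class AlternateDescriptors {
  std::map<int, std::string> Selected; // instruction id -> selected opcode

public:
  void select(int InstrId, const std::string &Opcode) {
    Selected[InstrId] = Opcode;
  }
  // Returns nullptr when nothing was selected for this instruction.
  const std::string *lookup(int InstrId) const {
    auto It = Selected.find(InstrId);
    return It == Selected.end() ? nullptr : &It->second;
  }
  void clear() { Selected.clear(); }
};

// Stand-in for the hazard recognizer side: it only records choices.
struct RegionClient {
  AlternateDescriptors &Descs;
  void choose(int InstrId, const std::string &Opcode) {
    Descs.select(InstrId, Opcode);
  }
};

// Stand-in for the inter-block scheduling side: it only reads choices.
struct BlockClient {
  const AlternateDescriptors &Descs;
  std::string finalOpcode(int InstrId, const std::string &Default) const {
    const std::string *Sel = Descs.lookup(InstrId);
    return Sel ? *Sel : Default;
  }
};
```

With this split, the recording side and the materializing side share a single `AlternateDescriptors` instance by reference, so neither needs to know how the other stores its state.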

@krishnamtibrewala krishnamtibrewala changed the title [AIEX] Re-assign multi-slot instructions during iterative scheduling [AIEX] Delay metalizing of multi-slot until iterative scheduling is converged Oct 14, 2024
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 02d0b50 to 8bf356f Compare October 14, 2024 11:24
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 7a241b2 to fdab2af Compare October 22, 2024 18:32
@gbossu
Collaborator

gbossu commented Nov 4, 2024

Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:

  • When is the materialization triggered now?
  • Does this change depending on the region type?
  • Can you think of a case where the materialization is not triggered before moving on to another MBB?

liveins: $r1, $r2
successors: %bb.3
$r2 = OR $r2, $r1
bb.3:
Collaborator

That test is unfortunately very hard to read. Could you think of something smaller that shows a diff? I'd also suggest avoiding two labels like on/off and sticking to unique CHECK lines unless the test is very concise.

@krishnamtibrewala
Collaborator Author

QoR results are as follows. There are a few regressions; the rest of the benchmarks have the same QoR.

Select_aie2_bf16 409 440 REGR(+7.58%)
BitwiseXor_aie2_int8 731 776 REGR(+6.16%)
BilinearInterpolation_2 996 1018 REGR(+2.21%)
BilinearInterpolation_3 996 1018 REGR(+2.21%)
BilinearInterpolation_4 996 1018 REGR(+2.21%)
BilinearInterpolation_0 780 794 REGR(+1.79%)
Conv2D_bf16_2 19089 19281 REGR(+1.01%)
Conv2D_bf16_5 19089 19281 REGR(+1.01%)
Conv2D_bf16_8 20607 20799 REGR(+0.93%)
BilinearInterpolation_1 474 478 REGR(+0.84%)
Conv2D_bf16_59 6253 6301 REGR(+0.77%)
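
The percentages appear consistent with a relative change computed from the two numeric columns, assuming the first column is the baseline and the second is the result with this PR (the thread does not label the columns or state their unit). A minimal sketch of that arithmetic, with hypothetical function names:

```cpp
#include <cmath>

// Relative change in percent; a positive value means the second number is larger.
double relativeChangePercent(double Before, double After) {
  return (After - Before) / Before * 100.0;
}

// True when the computed change rounds to the reported two-decimal value.
bool matchesReported(double Before, double After, double ReportedPercent) {
  return std::fabs(relativeChangePercent(Before, After) - ReportedPercent) <= 0.005;
}
```

For example, `matchesReported(409, 440, 7.58)` holds for the Select_aie2_bf16 row, since (440 - 409) / 409 is roughly 7.58%.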

@gbossu
Collaborator

gbossu commented Nov 25, 2024

> QoR results are as follows. There are a few regressions; the rest of the benchmarks have the same QoR.
>
> Select_aie2_bf16 409 440 REGR(+7.58%)
> BitwiseXor_aie2_int8 731 776 REGR(+6.16%)
> BilinearInterpolation_2 996 1018 REGR(+2.21%)
> BilinearInterpolation_3 996 1018 REGR(+2.21%)
> BilinearInterpolation_4 996 1018 REGR(+2.21%)
> BilinearInterpolation_0 780 794 REGR(+1.79%)
> Conv2D_bf16_2 19089 19281 REGR(+1.01%)
> Conv2D_bf16_5 19089 19281 REGR(+1.01%)
> Conv2D_bf16_8 20607 20799 REGR(+0.93%)
> BilinearInterpolation_1 474 478 REGR(+0.84%)
> Conv2D_bf16_59 6253 6301 REGR(+0.77%)

I'm quite surprised about the results. Did you have time to investigate the regressions?

@krishnamtibrewala
Collaborator Author

krishnamtibrewala commented Dec 3, 2024

> I'm quite surprised about the results. Did you have time to investigate the regressions?

With the latest changes plus the second commit of the PR ([AIE2] Update VLD Multi-Slot Itinerary), there are no regressions.
The issue was how the latency of multi-slot vector load instructions was treated.

@@ -73,7 +73,7 @@ let isMoveImm = 1, isReMaterializable = 1, isAsCheapAsAMove = 1, Itinerary = II_

// Pseudo VLD
let hasSideEffects = false, mayLoad = true, mayStore = false in {
-  let Itinerary = II_VLDA_W in {
+  let Itinerary = II_VLDB in {
Collaborator

Can you give more details about this change? Why does this impact the postmisched? It should not be impacted by the resources of the multi-slot itinerary, because when checking or adding to the scoreboard, the final itinerary should be used.
