Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIEX] Schedule SWP epilogue with "free" instructions #247

Open
wants to merge 8 commits into
base: aie-public
Choose a base branch
from

Conversation

andcarminati
Copy link
Collaborator

@andcarminati andcarminati commented Dec 10, 2024

This PR adds support for Epilogue scheduling. In this way, it also adds:

  • Support for top-down scheduling with explicit emission cycle.
  • Top-down logic for AIEMachineScheduler.

I recommend reviewing the pull request in the order of commits, although some of them are closely related, so I plan to combine them in the future.

Ongoing work related to EmitFixedSUnits: we current add all WAR and RAW dependencies related to the top-insert and the rest. However, the bot-insert handling can be changed to use bot register events as well.

std::optional<unsigned> OptSrcCycle =
InstrItins->getOperandCycle(SrcClass, OpNum);
assert(OptSrcCycle);
int Latency = *OptSrcCycle;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const.

// We want to insert above it.
return std::lower_bound(IsTopNode ? begin() : bottom(),
IsTopNode ? top() : end(), *EmissionCycle,
HasGreaterOrLessOrEqEmissionCycle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah just split in two lower_bound calls.

IsPreRA(IsPreRA), SchedZone(SchedBoundary::BotQID, "Zone") {}
IsPreRA(IsPreRA),
SchedZone(IsTopDown ? SchedBoundary::TopQID : SchedBoundary::BotQID,
"Zone") {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks as if "Zone" could be more descriptive

ArrayRef<MachineBundle> TopFixedBundles;
ArrayRef<MachineBundle> TopFixedBundles =
RegionBegin == BB->begin() ? ArrayRef<MachineBundle>(BS.TopInsert)
: ArrayRef<MachineBundle>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check: TopFixedBundles was empty before, triggering no further action.

@@ -639,7 +640,7 @@ void AIEPostRASchedStrategy::commitBlockSchedule(MachineBasicBlock *BB) {

// Safety margin, swp epilogue
// Note that the prologue is handled in a different way. See enterMBB.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is out of date, we now only handle the safety margin here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: Can we have both a safety margin and a top-fixed region? If not, can we assert it doesn't happen?

Copy link
Collaborator Author

@andcarminati andcarminati Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot have! If we need to supply safety margin for swp loop, it means an incorrect schedule. We cannot calculate the safety margin for swp-loop without triggering this assert:

llc: ../llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp:903: auto llvm::AIE::InterBlockScheduling::getCyclesToRespectTiming(const llvm::AIE::BlockState &, const llvm::AIE::BlockState &)::(anonymous class)::operator()(const llvm::AIE::Region &) const: Assertion `R.top_fixed_instrs().empty() && "SWP epilogue already emitted?"' failed.
+ /scratch/llvm-aie/build-public-mem/bin/FileCheck /scratch/llvm-aie/llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/add-store.mir

}
}
return CurrMaxLatency;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for the future: It's not clear to me why we look at operand latencies in computeUseDefForward() but not here. We could get more accurate numbers by using Itineratries here as well for each operand we visit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We conservatively assume that all operands are read on the first cycle, what is actually the case for most of instructions so the implementation is targeting mostly RAW dependencies that is the common case for merging epilogues. For WAR we are too conservative, but we can improve in a follow-up PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For WAR, we also assume a write on the first cycle by not looking to itineraries, but this is more then safe because it pushes the instruction even more far away.

Copy link
Collaborator

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, I left lots of nits (as usual), but the only real problem is with memory dependencies. Once we solve that, I think it's good to go :)

@andcarminati andcarminati force-pushed the andreu.swp.epilogue.scheduling branch 4 times, most recently from 45fe727 to 14a6f99 Compare January 8, 2025 17:11
EXPECT_POS_IN_BB(Scheduler->bottom(), MBB, 4);

// Insert MI1 in top-down cycle 2. This is less than MI3's cycle so MI1
// should actually be inserted aboce MI3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: above


/// Schedule an instruction (MI3) in the Top zone.
/// Then move MI1 above because it has a lower EmissionCycle.
TEST_F(ScheduleDAGMITest, MoveInTopAboveDbg) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this unit-test test, that isn't covered by the previous one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed they are quite similar, but MI is above a dbg instruction. I would prefer to keep this test because the handling of debug instructions is a bit tricky ;-)

Comment on lines 119 to 120
MachineBasicBlock *getLoopPredecessor(const MachineBasicBlock &MBB) {
if (MBB.pred_size() == 1) {
Copy link
Collaborator

@F-Stuckmann F-Stuckmann Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename the Argument to EpilogueMBB and the function to getLoopBody?

MachineBasicBlock *getLoopPredecessor(const MachineBasicBlock &MBB) {
if (MBB.pred_size() == 1) {
// if we have only one, it must be the loop
return *MBB.predecessors().begin();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also check for isSingleMBBLoop(*MBB.predecessors().begin())?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the code to assert in both cases.

// Otherwise, the loop is the fallthrough predecessor by construction
for (auto *Pred : MBB.predecessors()) {
if (Pred->isLayoutSuccessor(&MBB)) {
assert(isSingleMBBLoop(Pred) && "Layout predecessor is not a loop");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we handle loops consisting of multiple MBBs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't handle this case. This function was refactored out from the InterBlockScheduler, were we handle this specific case. This function is not intended for general use. I also updated the comment in LoopUtils.h to make this a bit more clear.

@@ -784,43 +788,80 @@ void InterBlockScheduling::emitBundles(
AIEHazardRecognizer::applyBundles(Bundles, BB);
}

void InterBlockScheduling::emitInterBlockTop(const BlockState &BS) const {
void InterBlockScheduling::emitTopSafetyMargin(const BlockState &BS) const {
if (BS.Kind != BlockType::Epilogue) {
return;
}

MachineBasicBlock *BB = BS.TheBlock;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: rename BB to EpiloqueMBB to clarify the nature of the MachineBasicBlock

}

MachineBasicBlock *BB = BS.TheBlock;
MachineBasicBlock *Loop = AIELoopUtils::getLoopPredecessor(*BB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epiloque check and the Loop BlockState extraction is copied in emitTopSafetyMargin, you may want to move it into a dedicated function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the code to remove this duplication.

HasLessOrEqEmissionCycle);
else
return std::lower_bound(bottom(), end(), *EmissionCycle,
HasGreaterEmissionCycle);
Copy link
Collaborator

@martien-de-jong martien-de-jong Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check: we seem to search in the same direction, but we reverse the polarity of the comparison. Aren't we finding the last such cycle in one of the cases?
-- I guess that it is compensated by the fact that we have LessEq vs Greater. LessEq vs GreaterEq would have looked more symmetric.

HasGreaterEmissionCycle);
// For top-down: less or equal emission cycle.
auto HasLessOrEqEmissionCycle = [&](const MachineInstr &MI,
unsigned EmissionCycle) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reusing/hiding EmissionCycle for something with a different type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, indeed!

return;
}

MachineBasicBlock *BB = BS.TheBlock;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: rename BB to EpiloqueBB to clarify the nature of BB

// Trim excedent empty bundles. Empty TopInsert means 1-stage pipeline.
if (!BS.TopInsert.empty()) {
while (BS.TopInsert.back().empty()) {
assert(BS.TopInsert.back().getMetaInstrs().empty());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can BS.TopInsert.back().emptyand !BS.TopInsert.empty() be true at the same time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this situation we have a vector of bundles (actually, MachineBundle). In this case, if our vector is not empty, we drop reversely all empty bundles until we find a non-empty one. We do this because SWP most of times extract a sequence of empty bundles in the top insert. It can be a bit confusing but BS.TopInsert.back().empty is testing the last bundle of the vector.

In the past, without the epilog merging, those empty bundles helped to create a safety margin between the top-timed region and the epilogue instructions.

@@ -427,7 +427,8 @@ class InterBlockScheduling {

/// Emit extra code induced by interblock scheduling:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: introduced

// If the zone still has unscheduled fixed instructions, the next one to
// pick is (DAG->bottom() - 1) for bottom-up, or DAG->top() for top-down.
if (Zone.isTop()) {
if (NumEmitted < Reg.getTopFixedBundles().size()) {
Copy link
Collaborator

@martien-de-jong martien-de-jong Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we make this CurRegion, or something else that doesn't resemble a Register?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@andcarminati andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from 14a6f99 to 67f4b21 Compare January 13, 2025 16:19
This commit prepares to schedule top fixed bundles. We also create dedicated
loop exits early, handling new blocks along with their corresponding block states.
If we have TopFixed instructions, we start top-down and we change
to bottom-up when we fill as much as possible the slots related
to those instructions. Special care is needed for instructions
with delay slot and bottom-fixed instructions.
@andcarminati andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from 67f4b21 to f4af451 Compare January 15, 2025 13:30
Also add all related dependencies to have safety margins.
This is necessay because epilogue blocks of SWP loops are constrained
between two already scheduled regions. When we can't se a full conflict
horizon for the epilogue, we block suffucient cycles.
@andcarminati andcarminati force-pushed the andreu.swp.epilogue.scheduling branch from f4af451 to e607af5 Compare January 15, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants