-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIEX] Schedule SWP epilogue with "free" instructions #247
base: aie-public
Are you sure you want to change the base?
Conversation
a905dec
to
4776f1d
Compare
std::optional<unsigned> OptSrcCycle = | ||
InstrItins->getOperandCycle(SrcClass, OpNum); | ||
assert(OptSrcCycle); | ||
int Latency = *OptSrcCycle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const.
// We want to insert above it. | ||
return std::lower_bound(IsTopNode ? begin() : bottom(), | ||
IsTopNode ? top() : end(), *EmissionCycle, | ||
HasGreaterOrLessOrEqEmissionCycle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah just split in two lower_bound calls.
IsPreRA(IsPreRA), SchedZone(SchedBoundary::BotQID, "Zone") {} | ||
IsPreRA(IsPreRA), | ||
SchedZone(IsTopDown ? SchedBoundary::TopQID : SchedBoundary::BotQID, | ||
"Zone") {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as if "Zone" could be more descriptive
ArrayRef<MachineBundle> TopFixedBundles; | ||
ArrayRef<MachineBundle> TopFixedBundles = | ||
RegionBegin == BB->begin() ? ArrayRef<MachineBundle>(BS.TopInsert) | ||
: ArrayRef<MachineBundle>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check: TopFixedBundles was empty before, triggering no further action.
@@ -639,7 +640,7 @@ void AIEPostRASchedStrategy::commitBlockSchedule(MachineBasicBlock *BB) { | |||
|
|||
// Safety margin, swp epilogue | |||
// Note that the prologue is handled in a different way. See enterMBB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is out of date, we now only handle the safety margin here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: Can we have both a safety margin and a top-fixed region? If not, can we assert it doesn't happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot have! If we need to supply safety margin for swp loop, it means an incorrect schedule. We cannot calculate the safety margin for swp-loop without triggering this assert:
llc: ../llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp:903: auto llvm::AIE::InterBlockScheduling::getCyclesToRespectTiming(const llvm::AIE::BlockState &, const llvm::AIE::BlockState &)::(anonymous class)::operator()(const llvm::AIE::Region &) const: Assertion `R.top_fixed_instrs().empty() && "SWP epilogue already emitted?"' failed.
+ /scratch/llvm-aie/build-public-mem/bin/FileCheck /scratch/llvm-aie/llvm/test/CodeGen/AIE/aie2/schedule/postpipeliner/add-store.mir
} | ||
} | ||
return CurrMaxLatency; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for the future: It's not clear to me why we look at operand latencies in computeUseDefForward()
but not here. We could get more accurate numbers by using Itineratries here as well for each operand we visit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We conservatively assume that all operands are read on the first cycle, what is actually the case for most of instructions so the implementation is targeting mostly RAW dependencies that is the common case for merging epilogues. For WAR we are too conservative, but we can improve in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For WAR, we also assume a write on the first cycle by not looking to itineraries, but this is more then safe because it pushes the instruction even more far away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, I left lots of nits (as usual), but the only real problem is with memory dependencies. Once we solve that, I think it's good to go :)
45fe727
to
14a6f99
Compare
EXPECT_POS_IN_BB(Scheduler->bottom(), MBB, 4); | ||
|
||
// Insert MI1 in top-down cycle 2. This is less than MI3's cycle so MI1 | ||
// should actually be inserted aboce MI3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: above
|
||
/// Schedule an instruction (MI3) in the Top zone. | ||
/// Then move MI1 above because it has a lower EmissionCycle. | ||
TEST_F(ScheduleDAGMITest, MoveInTopAboveDbg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this unit-test test, that isn't covered by the previous one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed they are quite similar, but MI is above a dbg instruction. I would prefer to keep this test because the handling of debug instructions is a bit tricky ;-)
MachineBasicBlock *getLoopPredecessor(const MachineBasicBlock &MBB) { | ||
if (MBB.pred_size() == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename the Argument to EpilogueMBB
and the function to getLoopBody
?
MachineBasicBlock *getLoopPredecessor(const MachineBasicBlock &MBB) { | ||
if (MBB.pred_size() == 1) { | ||
// if we have only one, it must be the loop | ||
return *MBB.predecessors().begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also check for isSingleMBBLoop(*MBB.predecessors().begin())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the code to assert in both cases.
// Otherwise, the loop is the fallthrough predecessor by construction | ||
for (auto *Pred : MBB.predecessors()) { | ||
if (Pred->isLayoutSuccessor(&MBB)) { | ||
assert(isSingleMBBLoop(Pred) && "Layout predecessor is not a loop"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we handle loops consisting of multiple MBBs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't handle this case. This function was refactored out from the InterBlockScheduler
, were we handle this specific case. This function is not intended for general use. I also updated the comment in LoopUtils.h
to make this a bit more clear.
@@ -784,43 +788,80 @@ void InterBlockScheduling::emitBundles( | |||
AIEHazardRecognizer::applyBundles(Bundles, BB); | |||
} | |||
|
|||
void InterBlockScheduling::emitInterBlockTop(const BlockState &BS) const { | |||
void InterBlockScheduling::emitTopSafetyMargin(const BlockState &BS) const { | |||
if (BS.Kind != BlockType::Epilogue) { | |||
return; | |||
} | |||
|
|||
MachineBasicBlock *BB = BS.TheBlock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: rename BB
to EpiloqueMBB
to clarify the nature of the MachineBasicBlock
} | ||
|
||
MachineBasicBlock *BB = BS.TheBlock; | ||
MachineBasicBlock *Loop = AIELoopUtils::getLoopPredecessor(*BB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The epiloque check and the Loop BlockState extraction is copied in emitTopSafetyMargin
, you may want to move it into a dedicated function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the code to remove this duplication.
HasLessOrEqEmissionCycle); | ||
else | ||
return std::lower_bound(bottom(), end(), *EmissionCycle, | ||
HasGreaterEmissionCycle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check: we seem to search in the same direction, but we reverse the polarity of the comparison. Aren't we finding the last such cycle in one of the cases?
-- I guess that it is compensated by the fact that we have LessEq vs Greater. LessEq vs GreaterEq would have looked more symmetric.
HasGreaterEmissionCycle); | ||
// For top-down: less or equal emission cycle. | ||
auto HasLessOrEqEmissionCycle = [&](const MachineInstr &MI, | ||
unsigned EmissionCycle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: reusing/hiding EmissionCycle for something with a different type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, indeed!
return; | ||
} | ||
|
||
MachineBasicBlock *BB = BS.TheBlock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: rename BB
to EpiloqueBB
to clarify the nature of BB
// Trim excedent empty bundles. Empty TopInsert means 1-stage pipeline. | ||
if (!BS.TopInsert.empty()) { | ||
while (BS.TopInsert.back().empty()) { | ||
assert(BS.TopInsert.back().getMetaInstrs().empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can BS.TopInsert.back().empty
and !BS.TopInsert.empty()
be true at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this situation we have a vector of bundles (actually, MachineBundle
). In this case, if our vector is not empty, we drop reversely all empty bundles until we find a non-empty one. We do this because SWP most of times extract a sequence of empty bundles in the top insert. It can be a bit confusing but BS.TopInsert.back().empty
is testing the last bundle of the vector.
In the past, without the epilog merging, those empty bundles helped to create a safety margin between the top-timed region and the epilogue instructions.
@@ -427,7 +427,8 @@ class InterBlockScheduling { | |||
|
|||
/// Emit extra code induced by interblock scheduling: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: introduced
// If the zone still has unscheduled fixed instructions, the next one to | ||
// pick is (DAG->bottom() - 1) for bottom-up, or DAG->top() for top-down. | ||
if (Zone.isTop()) { | ||
if (NumEmitted < Reg.getTopFixedBundles().size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we make this CurRegion, or something else that doesn't resemble a Register?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
14a6f99
to
67f4b21
Compare
Similar to what is done for bottom-up.
Only for free instructions.
This commit prepares to schedule top fixed bundles. We also create dedicated loop exits early, handling new blocks along with their corresponding block states.
If we have TopFixed instructions, we start top-down and we change to bottom-up when we fill as much as possible the slots related to those instructions. Special care is needed for instructions with delay slot and bottom-fixed instructions.
67f4b21
to
f4af451
Compare
Also add all related dependencies to have safety margins.
This is necessay because epilogue blocks of SWP loops are constrained between two already scheduled regions. When we can't se a full conflict horizon for the epilogue, we block suffucient cycles.
f4af451
to
e607af5
Compare
This PR adds support for Epilogue scheduling. In this way, it also adds:
I recommend reviewing the pull request in the order of commits, although some of them are closely related, so I plan to combine them in the future.
Ongoing work related to EmitFixedSUnits: we current add all WAR and RAW dependencies related to the top-insert and the rest. However, the bot-insert handling can be changed to use bot register events as well.