Introduce zero overhead loop #46

martien-de-jong · 2024-05-21T15:38:47Z

Final verdict: hwloop mostly causes significant PMsize expansion and frequent slight instruction-count regressions (~10-100 cycles).

We have a few significant wins, e.g. GEMM_int8_1 InsnCount 55877 -> 53775.
I didn't find any functional incorrectness and it is switched off by default.

I would like to commit now, and propose a follow-up to take the loop size into account at e.g. legalization time, where it is relatively easy to allocate a virtual loopcount register. This is on the basis that we don't gain on big loops, since there the loop body is dominated by memory, move and vector instructions.

This is a direct port of Abnikant's original aie-private PR.
The representation of PseudoLoopEnd for analyzeBranch has changed significantly: we always push two components. The first is the opcode, the second is the additional operand which cannot be derived from the target block.
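
As a rough illustration of the two-component encoding described above, here is a self-contained sketch using stand-in types. `Operand`, `Opcode`, `encodeCond`, `describeBranch`, the `loopend` mnemonic and the printed operand order are all hypothetical and exist only for this sketch; the actual code operates on `MachineOperand` and the `Cond` vector passed to analyzeBranch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the branch opcodes mentioned in this PR (values are arbitrary).
enum class Opcode { JZ, JNZ, PseudoLoopEnd };

// Stand-in for MachineOperand: either an immediate or a named register/symbol.
struct Operand {
  bool IsImm;
  long Imm;
  std::string Name;
  static Operand imm(long V) { return {true, V, ""}; }
  static Operand named(std::string N) { return {false, 0, std::move(N)}; }
};

// Always push exactly two components: the opcode, then the one operand that
// cannot be derived from the target block.
std::vector<Operand> encodeCond(Opcode Opc, const Operand &Extra) {
  return {Operand::imm(static_cast<long>(Opc)), Extra};
}

// Rebuild a textual branch from Cond plus the target block name.
// The mnemonics and operand order printed here are illustrative only.
std::string describeBranch(const std::vector<Operand> &Cond,
                           const std::string &Target) {
  assert(Cond.size() == 2 && Cond[0].IsImm && "expected {opcode, operand}");
  switch (static_cast<Opcode>(Cond[0].Imm)) {
  case Opcode::JZ:
    return "jz " + Cond[1].Name + ", " + Target;
  case Opcode::JNZ:
    return "jnz " + Cond[1].Name + ", " + Target;
  case Opcode::PseudoLoopEnd:
    return "loopend " + Cond[1].Name + ", " + Target;
  }
  return "<unknown>";
}
```

The point of the sketch is that the consumer only needs `Cond[0]` to pick the instruction and `Cond[1]` to fill in the operand that the target block alone does not provide.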

It should be noted that ZOL does not handle zero or negative loopcounts correctly. As such we need to establish that it is positive, e.g. by loop guarding or by interpreting pragma-like directives.
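
One way to picture the loop-guarding requirement is to emulate a count-down loop that only tests its counter after the body has run. The sketch below does that in plain C++; `countDownLoop` and `guardedSum` are hypothetical helpers, and the do-while model is an assumption made for illustration, not a description of the hardware's exact behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a count-down loop that tests the counter only after the body,
// so the body always runs at least once. Only valid for Count > 0.
template <typename Body> void countDownLoop(int32_t Count, Body B) {
  assert(Count > 0 && "caller must guard against zero or negative counts");
  do {
    B();
  } while (--Count != 0);
}

// Sums the first N elements of V (N must not exceed V.size()).
// The explicit guard keeps N <= 0 away from the count-down loop.
int64_t guardedSum(const std::vector<int32_t> &V, int32_t N) {
  int64_t Sum = 0;
  if (N <= 0)
    return Sum; // loop guard: skip the loop entirely
  std::size_t I = 0;
  countDownLoop(N, [&] { Sum += V[I++]; });
  return Sum;
}
```

Without the guard, a count of zero would still execute the body once and then keep decrementing past zero, which is the kind of misbehaviour the guard is meant to rule out.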

llvm/lib/Target/AIE/AIELegalizerInfo.cpp

llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp

llvm/lib/Target/AIE/AIE2AsmPrinter.cpp

llvm/lib/Target/AIE/AIE2InstrInfo.cpp

llvm/lib/Target/AIE/AIE2InstructionSelector.cpp

gbossu · 2024-06-03T15:17:07Z

llvm/lib/Target/AIE/AIE2TargetTransformInfo.cpp

+    for (Instruction &I : *BB) {
+      if (isa<CallInst>(I) || isa<InvokeInst>(I)) {
+        if (const Function *F = cast<CallBase>(I).getCalledFunction()) {
+          if (!isLoweredToCall(F))


Curious: Where does isLoweredToCall come from? Is that a generic LLVM function?

It's generic, with a note that it should be moved to a target-specific hook.

I think we should have a custom implementation for this PR. See:

void sum(double *a, double *b, double *c) {
  for (int i = 0; i < 30; i++) {
    c[i] = a[i] + b[i];
  }
}

It should walk in sync with GISel legalization rules for libcalls.
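
To illustrate the concern: the `sum` loop contains no call instruction in the IR, yet if double-precision addition is legalized to a library call on this target (which the comment implies, but which is an assumption here), the loop body still ends up calling a function. The toy model below shows a check that consults the same predicate as the legalizer would. `Op`, `Type`, `Inst`, `isLibcall` and `loopContainsCall` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <vector>

enum class Op { Add, FAdd, Load, Store, Call };
enum class Type { I32, F32, F64 };

struct Inst {
  Op Opcode;
  Type Ty;
};

// Single source of truth shared (in this toy model) by the legalizer and the
// hardware-loop check. Which operations are libcalls is an assumption.
bool isLibcall(const Inst &I) {
  return I.Opcode == Op::FAdd && I.Ty == Type::F64;
}

// A loop "contains a call" if it has an explicit call or any operation that
// the legalizer would later turn into a library call.
bool loopContainsCall(const std::vector<Inst> &Body) {
  for (const Inst &I : Body)
    if (I.Opcode == Op::Call || isLibcall(I))
      return true;
  return false;
}
```

Keeping the libcall predicate in one place is what "walking in sync" would amount to in this model: the hardware-loop check cannot drift from the legalization rules if both read the same table.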

llvm/test/CodeGen/AIE/aie2/hardware-loops/irtranslator-zol.ll

gbossu · 2024-06-03T15:20:13Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/nested.ll

-; CHECK-NEXT:    nop
-; CHECK-NEXT:    nop
-; CHECK-NEXT:    mova r6, #0
-; CHECK-NEXT:    add.nc r5, r1, #-1


Do you understand the changes?

As far as I can see, profitability of low overhead loops was reduced to single block loops. I removed the issue-limit=1, which probably wasn't very clever for this particular test.

I have restored issue-limit=1.

llvm/lib/Target/AIE/AIELegalizerInfo.cpp

gbossu · 2024-06-03T16:21:42Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/legalize-zol.mir

-  ; CHECK-NEXT:   [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
-  ; CHECK-NEXT:   [[AND:%[0-9]+]]:_(s32) = G_AND [[ASSERT_ZEXT]], [[C3]]
-  ; CHECK-NEXT:   G_BRCOND [[AND]](s32), %bb.2
+  ; CHECK-NEXT:   G_BRCOND [[ASSERT_ZEXT]](s32), %bb.2


You did not change the legalizer, do you know why that test needed to be updated?
~~Edit: I didn't see it's not your change. But I think it still makes sense to move that diff to the first commit if that's where it belongs.~~

I have assumed that the bit analysis has improved in upstream LLVM, and now recognises that booleans are in range.

llvm/lib/Target/AIE/AIE2AsmPrinter.cpp

llvm/lib/Target/AIE/AIE2RegisterInfo.cpp

llvm/lib/Target/AIE/AIEBaseHardwareLoops.cpp

andcarminati · 2024-06-04T12:59:38Z

Hi, the following code causes a problem with this PR:

target datalayout = "e-m:e-p:20:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-f32:32:32-i64:32-f64:32-a:0:32-n32"
target triple = "aie2"

define i32 @main() {
entry:
  br label %for.body

for.body:                                  ; preds = %for.body, %entry
  %i = phi i16 [ 0, %entry ], [ 1, %for.body ]
  %cmp = icmp ult i16 %i, 1
  br i1 %cmp, label %for.body, label %if.end

if.end:                                     ; preds = %for.body
  ret i32 0
}

With:

llc --march=aie2 sample.ll  --enable-aie-hardware-loops  --enable-aie-zero-overhead-loops

Gives:

llc: ../llvm/include/llvm/ADT/ilist_iterator.h:138: llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false>, false, false>::operator*() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false>, IsReverse = false, IsConst = false]: Assertion `!NodePtr->isKnownSentinel()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./bin/llc --march=aie2 reduced.ll --enable-aie-hardware-loops --enable-aie-zero-overhead-loops
1.      Running pass 'Function Pass Manager' on module 'reduced.ll'.
2.      Running pass 'InstructionSelect' on function '@main'
 #0 0x0000000005d38f97 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) ./../llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x0000000005d36f70 llvm::sys::RunSignalHandlers() ./../llvm/lib/Support/Signals.cpp:106:18
 #2 0x0000000005d3964a SignalHandler(int) ./../llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f6f6c1a5520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f6f6c1f99fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f6f6c1f99fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007f6f6c1f99fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007f6f6c1a5476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f6f6c18b7f3 abort ./stdlib/abort.c:81:7
 #9 0x00007f6f6c18b71b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007f6f6c19ce96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)

llvm/lib/Target/AIE/AIE2InstructionSelector.cpp

andcarminati · 2024-06-04T15:44:21Z

Hi @martien-de-jong , another interesting case (sample.ll):

target datalayout = "e-m:e-p:20:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-f32:32:32-i64:32-f64:32-a:0:32-n32"
target triple = "aie2"

define i32 @main() {
entry:
  br label %for.body

for.body:                                     ; preds = %for.body, %entry
  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %inc = add i32 %i, 1
  %exitcond = icmp eq i32 %inc, 0
  br i1 %exitcond, label %label, label %for.body

label:                                     ; preds = %label, %for.body
  br label %label
}

However, you need the following options (Elf emission, no loop scheduling):

llc --march=aie2 sample.ll  --enable-aie-hardware-loops  --enable-aie-zero-overhead-loops -aie-loop-aware=false -filetype=obj

Result:

<unknown>:0: error: Undefined temporary symbol .L_1120

gbossu · 2024-06-24T07:43:28Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/endlabel.mir

@@ -0,0 +1,34 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py


gbossu · 2024-06-24T07:48:08Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/noduplication.mir

@@ -0,0 +1,152 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=aie2 --start-after=instruction-select \
+# RUN:   --stop-before=aie-finalize-mi-bundles %s -o - | FileCheck %s


Super-Nit: I'm changing the scheduler so it outputs "correct" Bundles. Could you change that line to --stop-after=aie-finalize-mi-bundles? This way there won't be test updates.

llvm/test/CodeGen/AIE/aie2/hardware-loops/complex-flow.mir

gbossu · 2024-06-24T07:58:40Z

llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp

      Cond.push_back(MachineOperand::CreateImm(I->getOpcode()));
-      Cond.push_back(I->getOperand(0));


Did it hurt to keep Cond.push_back(I->getOperand(0));? I'm still struggling to understand what kind of API analyzeBranch has.

The API is that the target can push whatever it needs to reconstruct/invert a branch. We don't actually need that third operand, and I like Occam's razor.

gbossu · 2024-06-24T08:44:59Z

llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp

+  if (isHardwareLoopEnd(Opc)) {
+    CBranchBuilder.addMBB(TBB).add(Cond[1]);
+  } else {
+    CBranchBuilder.add(Cond[1]).addMBB(TBB);


Super-nit: Maybe we could define PseudoLoopEnd to have the same operand order as other branches?

llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp

llvm/test/CodeGen/AIE/aie2/hardware-loops/zol-loop.mir

gbossu

I'm done with the review. I think it looks good! Please go through the remaining comments, I'd be happy if some tests are moved around, but it's not such a big deal :)

andcarminati

LGTM. Nice work, credit to you and the original author.

lower symbol in MC lowering

Also make PseudoLoopEnd a meta instruction to simplify emit logic. Make sure LoopStart/LoopEnd don't get duplicated in e.g. TailDuplication.

PseudoLoopEnd is very similar to a regular conditional branch. We need two Cond elements in order to reconstruct the instruction: one is the opcode, the other is the condition register for JZ/JNZ or the last-bundle label for PseudoLoopEnd. The operand order of PseudoLoopEnd was swapped to make it congruent with the other conditional branches. insertBranch needs to generate an unconditional branch for FBB even after PseudoLoopEnd.

Completely remove empty ZOL

konstantinschwarz · 2024-07-08T17:35:07Z

llvm/test/CodeGen/AIE/aie2/hardware-loops/instruction-select-zol-end.mir

+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -O2 -mtriple=aie2 -run-pass=instruction-select %s -verify-machineinstrs -o - | FileCheck %s
+
+---


A couple of tests here are missing the license & copyright header. Could you please add those in a fixup PR @martien-de-jong?

martien-de-jong marked this pull request as draft May 21, 2024 15:39

martien-de-jong commented May 21, 2024
llvm/lib/Target/AIE/AIELegalizerInfo.cpp (outdated)

martien-de-jong commented May 21, 2024
llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp (outdated)

martien-de-jong force-pushed the hwl-public branch from fa781f4 to 0fd51cd May 21, 2024 16:17

gbossu reviewed May 27, 2024
llvm/lib/Target/AIE/AIE2AsmPrinter.cpp (outdated)

gbossu reviewed May 27, 2024
llvm/lib/Target/AIE/AIE2AsmPrinter.cpp (outdated)

gbossu reviewed May 27, 2024
llvm/lib/Target/AIE/AIE2InstrInfo.cpp (outdated)

gbossu reviewed May 27, 2024
llvm/lib/Target/AIE/AIE2InstructionSelector.cpp (outdated)

gbossu reviewed May 27, 2024
llvm/lib/Target/AIE/AIE2InstructionSelector.cpp (outdated)

martien-de-jong commented May 27, 2024
llvm/lib/Target/AIE/AIE2InstructionSelector.cpp (outdated)

martien-de-jong force-pushed the hwl-public branch from 0fd51cd to 3ba34e1 May 31, 2024 09:32

martien-de-jong commented May 31, 2024
llvm/lib/Target/AIE/AIE2InstructionSelector.cpp (outdated)

martien-de-jong marked this pull request as ready for review May 31, 2024 13:28

gbossu reviewed Jun 3, 2024
llvm/test/CodeGen/AIE/aie2/hardware-loops/irtranslator-zol.ll

gbossu reviewed Jun 3, 2024
llvm/lib/Target/AIE/AIELegalizerInfo.cpp

gbossu reviewed Jun 3, 2024
llvm/lib/Target/AIE/AIE2AsmPrinter.cpp (outdated)

gbossu reviewed Jun 4, 2024
llvm/lib/Target/AIE/AIE2AsmPrinter.cpp (outdated)

gbossu reviewed Jun 4, 2024
llvm/lib/Target/AIE/AIE2RegisterInfo.cpp

gbossu reviewed Jun 4, 2024
llvm/lib/Target/AIE/AIEBaseHardwareLoops.cpp

andcarminati reviewed Jun 4, 2024
llvm/lib/Target/AIE/AIE2InstructionSelector.cpp (outdated)

martien-de-jong force-pushed the hwl-public branch from 3ba34e1 to 148cd00 June 5, 2024 15:55

martien-de-jong requested review from abhinay-anubola, abnikant, khallouh and konstantinschwarz as code owners June 5, 2024 15:55

gbossu reviewed Jun 24, 2024
llvm/test/CodeGen/AIE/aie2/hardware-loops/complex-flow.mir

gbossu reviewed Jun 24, 2024
llvm/lib/Target/AIE/AIEBaseInstrInfo.cpp

gbossu reviewed Jun 24, 2024
llvm/test/CodeGen/AIE/aie2/hardware-loops/zol-loop.mir

gbossu reviewed Jun 24, 2024

martien-de-jong force-pushed the hwl-public branch from f67f04f to 8bbddd2 June 27, 2024 12:28

gbossu previously approved these changes Jun 27, 2024

andcarminati previously approved these changes Jun 28, 2024

martien-de-jong dismissed stale reviews from andcarminati and gbossu via 2c74d2a June 28, 2024 12:29

martien-de-jong force-pushed the hwl-public branch from 8bbddd2 to 2c74d2a June 28, 2024 12:29

Martien de Jong and others added 10 commits June 28, 2024 14:36

[AIE] Update HW-Loop Profitability

c5d7c4f

[AIE2] Legalize loop_decrement

0febff8

[AIE2] Instruction-select ZOL end

f9324ec

[AIE] emit last-bundle symbol for ZOL

8385f2b

lower symbol in MC lowering

[AIE] Several unrelated clang-tidies

1e3296e

[AIE2] PseudoLoopEnd is a scheduler barrier

ee4a562

Also make PseudoLoopEnd a meta instruction to simplify emit logic. Make sure LoopStart/LoopEnd don't get duplicated in e.g. TailDuplication.

[AIE2] Mark loop registers as reserved

d2f1822

[AIE] Late expansion of ZOL constructs

53e3b0c

Completely remove empty ZOL

[AIE] Pad to 112 bytes after loop start

12fc40d

martien-de-jong force-pushed the hwl-public branch from 2c74d2a to 12fc40d June 28, 2024 13:06

gbossu approved these changes Jun 28, 2024

andcarminati approved these changes Jun 28, 2024

martien-de-jong enabled auto-merge (rebase) June 28, 2024 13:56

martien-de-jong merged commit 8185e31 into aie-public Jun 28, 2024
8 checks passed

konstantinschwarz reviewed Jul 8, 2024
