Add InterBlock loop/epilogue analysis #51

Merged: andcarminati merged 4 commits into aie-public from andreu.me.loop.epilogue on Jun 11, 2024

andcarminati
Collaborator

Implement loop/epilogue analysis to reduce pessimism in how loop latencies and resource conflicts are propagated to the epilogue.
This PR adds:

  • MemoryEdges update to ignore bundles.
  • Latency analysis using a DDG.
  • Conflict analysis using two scoreboards.
  • Code refactoring.
  • Test additions/updates.

@andcarminati
Collaborator Author

Benchmark evaluation:

With option --aie-loop-min-tripcount=4

In the table below, the first pair of count columns is without the change and the second pair is with it.

+----------------+-------------------+------------------+--+-------------------+------------------+--+--------------------+-------------------+
|     Design     | Total Cycle Count | Total Insn Count |  | Total Cycle Count | Total Insn Count |  | % Dec. Cycle Count | % Dec. Insn Count |
+----------------+-------------------+------------------+--+-------------------+------------------+--+--------------------+-------------------+
| GEMM_bf16_0    |             10994 |             4670 |  |             10898 |             4574 |  | 0,87%              | 2,06%             |
| GEMV_0         |              3163 |              689 |  |              3163 |              689 |  | 0,00%              | 0,00%             |
| Mul2D_0        |              4807 |             1643 |  |              4807 |             1643 |  | 0,00%              | 0,00%             |
| AvgPool2D_0    |              5153 |             3299 |  |              5153 |             3299 |  | 0,00%              | 0,00%             |
| Pad2D_0        |              4522 |             2190 |  |              4522 |             2190 |  | 0,00%              | 0,00%             |
| MaxPool2D_0    |              3832 |             2483 |  |              3832 |             2483 |  | 0,00%              | 0,00%             |
| GEMV_1         |              1969 |              587 |  |              1969 |              587 |  | 0,00%              | 0,00%             |
| Add2D_0        |              8157 |             3478 |  |              8157 |             3478 |  | 0,00%              | 0,00%             |
| Conv2D_0       |             20649 |            10943 |  |             20239 |            10533 |  | 1,99%              | 3,75%             |
| Conv2D_ReLU_0  |              3391 |             1716 |  |              3331 |             1656 |  | 1,77%              | 3,50%             |
| Conv2D_LReLU_0 |              4768 |             3094 |  |              4708 |             3034 |  | 1,26%              | 1,94%             |
+----------------+-------------------+------------------+--+-------------------+------------------+--+--------------------+-------------------+
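
For reference, the percentage columns appear to be the relative decrease with respect to the baseline, i.e. (before - after) / before. A minimal sketch of that arithmetic follows; the helper name `percentDecrease` is made up for illustration and is not part of the PR.

```cpp
// Hypothetical helper, not part of the PR: relative decrease in percent.
// Before must be non-zero.
double percentDecrease(double Before, double After) {
  return (Before - After) / Before * 100.0;
}
```

For GEMM_bf16_0 this gives (10994 - 10898) / 10994, about 0.87%, for cycles and (4670 - 4574) / 4670, about 2.06%, for instructions, which matches the table.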


Collaborator

martien-de-jong left a comment

Nothing wrong. Just a few nits you may want to consider.

@andcarminati andcarminati force-pushed the andreu.me.loop.epilogue branch from 62899cc to d74b615 Compare May 31, 2024 16:19
AddRegionToEdges(LoopBS.getBottom());
Edges.markBoundary();
// Second part is the epilogue itself
AddRegionToEdges(EpilogueBS.getTop());
Collaborator

Nit: Maybe also add the ExitSU node of EpilogueBS to the map of distances so we can avoid the if (Succ->isBoundaryNode()) corner case in the loop below.

Collaborator Author

In the current implementation, this map associates instructions with depths, so we would need to change the mapping logic as well.

Collaborator

What I meant is something like: DistancesFromLoopEntry[EpilogueBS.getTop().getExitSU()] = DistFromLoopEntry; that way there is no if (Succ->isBoundaryNode()) special-casing in the loop below.
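
To illustrate the suggestion, here is a toy sketch (not the PR's code) of how seeding the distance map with an entry for the exit node lets the loop treat every successor uniformly. Everything in it is hypothetical: `ToyEdge`, `ToyLoopNode`, `ExitNodeId`, `extraCyclesForTiming` and the plain integer ids merely stand in for the real SUnit-based structures, and the cycle accounting is an assumption.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical stand-ins for DDG nodes and edges; not LLVM types.
struct ToyEdge {
  int SuccId;  // id of the successor node in the epilogue
  int Latency; // cycles that must separate the two nodes
};
struct ToyLoopNode {
  int CyclesToBoundary; // cycles from this node to the first epilogue cycle
  std::vector<ToyEdge> Succs;
};

// Assumed id for the artificial exit node of the epilogue region.
constexpr int ExitNodeId = -1;

// Returns the extra cycles needed at the boundary so that every
// loop-to-epilogue latency is respected. Because the caller also stores a
// distance for ExitNodeId, the loop needs no boundary-node special case.
int extraCyclesForTiming(const std::vector<ToyLoopNode> &LoopNodes,
                         const std::map<int, int> &DistFromBoundary) {
  int Extra = 0;
  for (const ToyLoopNode &N : LoopNodes) {
    for (const ToyEdge &E : N.Succs) {
      auto It = DistFromBoundary.find(E.SuccId);
      if (It == DistFromBoundary.end())
        continue; // successor is not part of the epilogue region
      int Available = N.CyclesToBoundary + It->second;
      Extra = std::max(Extra, E.Latency - Available);
    }
  }
  return Extra;
}
```

With distances {0: 0, 1: 1, ExitNodeId: 2}, a loop node one cycle before the boundary with a latency-3 edge to node 0 needs 2 extra cycles, and an edge to the exit node is handled by the same lookup.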

Collaborator Author

Hi, I simplified the code a bit more.

Collaborator

gbossu left a comment

This looks good, I think the last remaining thing would be to fix the assertion that now triggers in the RegionEndEdges mutator for some benchmarks, and add a small reproducer test for it.

On top of that, please look at the Nit comments and address them to taste :) Generally, I'd be happy if we can get rid of the if (Succ->isBoundaryNode()) special casing.

@andcarminati andcarminati force-pushed the andreu.me.loop.epilogue branch 2 times, most recently from 0f3a798 to 7c6d248 Compare June 4, 2024 09:32
@andcarminati
Collaborator Author

Hi @martien-de-jong and @gbossu, I addressed all the comments and suggestions. Benchmarks are running again with no problems. If you could take another look, that would be nice.

This change is necessary to run mutations correctly after bundling.
@andcarminati
Collaborator Author

Here are the most recent results:


+----------------+---+------------+---------------+-------------+---+---+------------+---------------+-------------+---+---+-------------------+----------------+--------------------+
|                |   | Without EA |               |             |   |   |  With EA   |               |             |   |   |    Comparison     |                |                    |
| Design         |   | Tot. PM S. | Tot. Cycle C. | Tot. Ins C. |   |   | Tot. PM S. | Tot. Cycle C. | Tot. Ins C. |   |   | % Diff Tot. PM S. | % Diff Tot. C. | % Diff Tot. Ins C. |
+----------------+---+------------+---------------+-------------+---+---+------------+---------------+-------------+---+---+-------------------+----------------+--------------------+
| GEMM_bf16_0    |   | 4160       | 10461         | 4160        |   |   | 4144       | 10365         | 4064        |   |   | -0,38%            | -0,92%         | -2,31%             |
| GEMV_0         |   | 3488       | 3126          | 650         |   |   | 3488       | 3126          | 650         |   |   | 0,00%             | 0,00%          | 0,00%              |
| Mul2D_0        |   | 2672       | 4857          | 1686        |   |   | 2672       | 4857          | 1686        |   |   | 0,00%             | 0,00%          | 0,00%              |
| AvgPool2D_0    |   | 3552       | 4159          | 2811        |   |   | 3552       | 4159          | 2811        |   |   | 0,00%             | 0,00%          | 0,00%              |
| Pad2D_0        |   | 3456       | 4512          | 2184        |   |   | 3456       | 4512          | 2184        |   |   | 0,00%             | 0,00%          | 0,00%              |
| MaxPool2D_0    |   | 3344       | 3798          | 2449        |   |   | 3344       | 3798          | 2449        |   |   | 0,00%             | 0,00%          | 0,00%              |
| GEMV_1         |   | 3488       | 1948          | 560         |   |   | 3488       | 1948          | 560         |   |   | 0,00%             | 0,00%          | 0,00%              |
| Add2D_0        |   | 3600       | 8157          | 3478        |   |   | 3600       | 8157          | 3478        |   |   | 0,00%             | 0,00%          | 0,00%              |
| Conv2D_0       |   | 5952       | 19018         | 10543       |   |   | 5920       | 18538         | 10063       |   |   | -0,54%            | -2,52%         | -4,55%             |
| Conv2D_ReLU_0  |   | 5328       | 3377          | 1704        |   |   | 5296       | 3317          | 1644        |   |   | -0,60%            | -1,78%         | -3,52%             |
| Conv2D_LReLU_0 |   | 6000       | 4767          | 3095        |   |   | 5968       | 4707          | 3035        |   |   | -0,53%            | -1,26%         | -1,94%             |
+----------------+---+------------+---------------+-------------+---+---+------------+---------------+-------------+---+---+-------------------+----------------+--------------------+


while (Bottom.conflict(Top, Depth)) {
Bottom.advance();
NopCounter++;
}
Collaborator

Nit: That looks a lot like AIEPostRASchedStrategy::handleRegionConflicts, maybe there's code we can share.

Collaborator Author

Hi @gbossu, I think it could be a bit complicated, because handleRegionConflicts uses two hazard recognizers, while we use just one plus a scoreboard comparison. I am afraid that a refactor could make the code more confusing.

Collaborator

The implementation is different, but ultimately the goal is the same: we want to insert NOPs until there is no resource hazard. We can do that in a follow-up PR.
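
As a rough illustration of that shared goal, the sketch below counts how many NOP cycles are needed before two resource scoreboards stop overlapping. It is a toy model under stated assumptions, not the AIE scoreboard or hazard recognizer: `ToyScoreboard`, its bitmask-per-cycle layout and `countNopsToAvoidConflicts` are all hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified scoreboard: one resource bitmask per cycle,
// with Cycles[0] being the current cycle. Not the LLVM/AIE class.
struct ToyScoreboard {
  std::deque<std::uint32_t> Cycles;

  // True if, within the first Depth cycles, both scoreboards reserve
  // the same resource in the same cycle.
  bool conflict(const ToyScoreboard &Other, std::size_t Depth) const {
    for (std::size_t C = 0; C < Depth; ++C) {
      std::uint32_t Mine = C < Cycles.size() ? Cycles[C] : 0;
      std::uint32_t Theirs = C < Other.Cycles.size() ? Other.Cycles[C] : 0;
      if (Mine & Theirs)
        return true;
    }
    return false;
  }

  // Move one cycle forward: the current cycle's reservations expire.
  void advance() {
    if (!Cycles.empty())
      Cycles.pop_front();
  }
};

// Counts how many cycles Bottom must advance (one NOP each) before it no
// longer conflicts with Top. Bottom is taken by value so the caller's
// scoreboard is left untouched. The loop terminates because an empty
// scoreboard cannot conflict with anything.
unsigned countNopsToAvoidConflicts(ToyScoreboard Bottom,
                                   const ToyScoreboard &Top,
                                   std::size_t Depth) {
  unsigned NopCounter = 0;
  while (Bottom.conflict(Top, Depth)) {
    Bottom.advance();
    ++NopCounter;
  }
  return NopCounter;
}
```

For instance, with Bottom reserving resource masks {1, 1, 2} over three cycles and Top reserving {1, 2}, two advances are needed before the masks no longer collide, so the function returns 2.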

gbossu previously approved these changes Jun 6, 2024
Collaborator

gbossu left a comment

LGTM, I think it's great work!

Nit: I'd be even happier if we could remove the if (Succ->isBoundaryNode()) special case in getCyclesToRespectTiming(), maybe by extending the map of depths, maybe by using a conditional assignment instead.

@andcarminati
Collaborator Author

> LGTM, I think it's great work!
>
> Nit: I'd be even happier if we could remove the if (Succ->isBoundaryNode()) special case in getCyclesToRespectTiming(), maybe by extending the map of depths, maybe by using a conditional assignment instead.

Thank you for this suggestion, I addressed this special case.

gbossu previously approved these changes Jun 10, 2024
Collaborator

gbossu left a comment

LGTM

continue;
const MachineInstr *PostBoundaryMI = Succ->getInstr();

int PostBondOrExitDist =
Collaborator

Nit: PostBoundOrExitDist


Implement loop/epilogue analysis to reduce pessimism related to the
propagation of loop latencies and resource conflicts to the epilogue.

This commit adds:
  * Latency analysis using a DDG.
  * Conflict analysis using two scoreboards.
  * Code refactoring.
  * Test additions/updates.

Now, this mutation can be applied to already bundled instructions
without misclassification. It means that it is safe to run this mutation
with Epilogue scheduling, for example.

If a cast is necessary, MaxLatencyFinder will do it anyway.
@andcarminati andcarminati force-pushed the andreu.me.loop.epilogue branch from ab33106 to 831ac27 Compare June 11, 2024 09:09
Collaborator

martien-de-jong left a comment

LGTM

@andcarminati andcarminati merged commit ebbc1af into aie-public Jun 11, 2024
8 checks passed
@gbossu gbossu deleted the andreu.me.loop.epilogue branch June 14, 2024 08:30