Fix performance regression for device compileAndRun #63

Merged (1 commit) on Dec 12, 2024

ggeorgakoudis
Collaborator

  • Extract runtime constant types in the pass
  • Avoid bitcode parsing unless compiling
  • Link binaries for CUDA RDC in object codegen
  • Add kernel repeat test

Closes #59
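
The "avoid bitcode parsing unless compiling" item can be pictured with a small sketch. It is not taken from the Proteus sources: `LazyJit`, `getOrCompile`, `CompiledKernel`, and the hash-keyed cache are hypothetical names, and the "compile" step is a caller-supplied function. The point is only that repeated launches of an already specialized kernel should hit a cache lookup and never reach the parser.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled kernel; not a Proteus type.
struct CompiledKernel {
  std::string Binary;
};

// Hypothetical JIT front end that parses bitcode only on a cache miss.
class LazyJit {
public:
  using ParseFnTy = std::function<std::string(const std::string &)>;

  explicit LazyJit(ParseFnTy ParseFn) : Parse(std::move(ParseFn)) {}

  // Fast path: a known specialization hash returns the cached kernel
  // without touching the bitcode. Slow path: parse and cache the result.
  const CompiledKernel &getOrCompile(uint64_t Hash,
                                     const std::string &Bitcode) {
    auto It = Cache.find(Hash);
    if (It != Cache.end())
      return It->second;
    ++NumParses; // counts how often the expensive step actually runs
    CompiledKernel Kernel{Parse(Bitcode)};
    return Cache.emplace(Hash, std::move(Kernel)).first->second;
  }

  int numParses() const { return NumParses; }

private:
  ParseFnTy Parse;
  std::unordered_map<uint64_t, CompiledKernel> Cache;
  int NumParses = 0;
};
```

Calling `getOrCompile` in a loop with the same hash, in the spirit of the kernel repeat test, should leave `numParses()` at 1 under these assumptions.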

Collaborator

@johnbowen42 johnbowen42 left a comment

LGTM. This improves not only performance but also readability and simplifies the code IMO.

namespace proteus {

enum RuntimeConstantTypes : int32_t {
  BOOL = 1,
Collaborator

This is a great addition; I should've done this in the first place.

@@ -14,6 +14,19 @@
#include <cstring>
#include <stdint.h>

namespace proteus {

enum RuntimeConstantTypes : int32_t {
Collaborator

Why use int32_t as the underlying type, just to enforce size?

Collaborator Author

Yes, to match the expected data type from the pass
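
A minimal sketch of what the fixed underlying type buys, assuming the pass emits each type tag as a raw `int32_t`. Only `BOOL = 1` is visible in this review; the `INT32` enumerator, the `sketch` namespace, and `decodeType` are hypothetical and exist only to make the example self-contained.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Only BOOL = 1 appears in the reviewed diff; INT32 is a hypothetical
// second enumerator added for illustration.
enum RuntimeConstantTypes : int32_t {
  BOOL = 1,
  INT32 = 2, // hypothetical
};

// The fixed underlying type pins both the size and the representation,
// so an array of int32_t tags can be read back without guessing widths.
static_assert(sizeof(RuntimeConstantTypes) == sizeof(int32_t),
              "enum must be exactly as wide as the emitted tags");
static_assert(
    std::is_same<std::underlying_type<RuntimeConstantTypes>::type,
                 int32_t>::value,
    "underlying type must be int32_t");

// Hypothetical helper: converts a raw tag into the enum.
inline RuntimeConstantTypes decodeType(int32_t Raw) {
  return static_cast<RuntimeConstantTypes>(Raw);
}

} // namespace sketch
```

Without the `: int32_t` part, the underlying type of an unscoped enum is implementation-defined, so the size assertion would not be guaranteed to hold across compilers.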

  auto getNumRCs() const { return NumRCs; }
  JITKernelInfo(char const *Name, int32_t *RCIndices, int32_t *RCTypes,
                int32_t NumRCs)
      : Name(Name), RCIndices{ArrayRef{RCIndices, static_cast<size_t>(NumRCs)}},
Collaborator

nice use of the constructors here
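
The constructor pattern praised here, wrapping a raw pointer and a count in a non-owning view inside the member initializer list, can be sketched without LLVM. In the sketch below `View` is a hypothetical stand-in for `ArrayRef`, and `KernelInfo` with its accessors is an illustrative class, not the actual `JITKernelInfo`.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Minimal non-owning view over contiguous data; a hypothetical stand-in
// for ArrayRef so the example compiles without LLVM headers.
template <typename T> class View {
public:
  View(const T *Data, std::size_t Size) : Data(Data), Size(Size) {}
  const T *begin() const { return Data; }
  const T *end() const { return Data + Size; }
  std::size_t size() const { return Size; }
  const T &operator[](std::size_t I) const { return Data[I]; }

private:
  const T *Data;
  std::size_t Size;
};

// Illustrative class: the pointer/count pairs are converted to views once,
// in the constructor, so the rest of the code never handles raw pointers.
class KernelInfo {
public:
  KernelInfo(const char *Name, const int32_t *RCIndices,
             const int32_t *RCTypes, int32_t NumRCs)
      : Name(Name), RCIndices(RCIndices, static_cast<std::size_t>(NumRCs)),
        RCTypes(RCTypes, static_cast<std::size_t>(NumRCs)) {}

  const char *getName() const { return Name; }
  const View<int32_t> &getRCIndices() const { return RCIndices; }
  const View<int32_t> &getRCTypes() const { return RCTypes; }
  std::size_t getNumRCs() const { return RCIndices.size(); }

private:
  const char *Name;
  View<int32_t> RCIndices;
  View<int32_t> RCTypes;
};

} // namespace sketch
```

The views do not own the arrays, so the caller has to keep the underlying storage alive for as long as the `KernelInfo` object is used.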

Collaborator

@tbennun tbennun left a comment

LGTM, solves my performance issue as well.

@ggeorgakoudis ggeorgakoudis merged commit 7141043 into main Dec 12, 2024
12 checks passed
@ggeorgakoudis ggeorgakoudis deleted the fix-performance-regression branch December 12, 2024 23:21
Linked issue closed by this pull request: Performance regression for overhead in compileAndRun