In-Place Dense Matrix Transposition #2199
Conversation
Very cool, thanks for the PR, and for finding bugs in my in-place transpose.
I have only a few comments.
Resolved review threads (now outdated):
- src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixReorg.java
- src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixReorg.java
- .../java/org/apache/sysds/test/component/matrix/libMatrixReorg/TransposeInPlaceBrennerTest.java
Codecov Report
Attention: Patch coverage is …

@@             Coverage Diff              @@
##               main    #2199      +/-   ##
============================================
+ Coverage     71.88%   72.30%    +0.42%
- Complexity    44701    45023      +322
============================================
  Files          1449     1452        +3
  Lines        169182   169417      +235
  Branches      32980    33059       +79
============================================
+ Hits         121617   122498      +881
+ Misses        38237    37602      -635
+ Partials       9328     9317       -11
LGTM - thanks for the new kernel @jessicapriebe. I'll merge it in.
DIA WiSe 24/25 project.
Closes apache#2199.
Added a new kernel for In-Place Dense Matrix Transposition, based on Algorithm 467 by Brenner (DOI: 10.1145/355611.362542).
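The PR's kernel itself is not reproduced here; as a point of reference, below is a minimal, self-contained sketch of the cycle-following idea that underlies in-place transposition of a row-major m x n matrix: the value at linear index i moves to (i*m) mod (m*n-1). Brenner's Algorithm 467 refines this by exploiting number-theoretic structure (the divisors mentioned under Future Work below) to find cycle leaders without the auxiliary visited bitset used in this sketch. Class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.BitSet;

public class InPlaceTransposeSketch {

    /** Transposes a row-major m x n matrix held in 'a' (length m*n) in place. */
    public static void transpose(double[] a, int m, int n) {
        final long size = (long) m * n - 1;
        if (size <= 0)
            return; // 1x1 or empty: nothing to do
        BitSet visited = new BitSet(m * n);
        // indices 0 and m*n-1 are fixed points of the transposition permutation
        for (int start = 1; start < size; start++) {
            if (visited.get(start))
                continue; // already moved as part of an earlier cycle
            int i = start;
            double carried = a[start];
            do {
                // destination of index i = r*n+c is c*m+r = (i*m) mod (m*n-1)
                int next = (int) (((long) i * m) % size);
                double tmp = a[next];
                a[next] = carried;
                carried = tmp;
                visited.set(next);
                i = next;
            } while (i != start);
        }
    }

    public static void main(String[] args) {
        // 2x3 matrix [[1,2,3],[4,5,6]] becomes 3x2 [[1,4],[2,5],[3,6]]
        double[] a = {1, 2, 3, 4, 5, 6};
        transpose(a, 2, 3);
        System.out.println(Arrays.toString(a)); // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```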
Performance:
Compared to the existing kernel, the added method provides significant performance benefits in a single-threaded context.
Note:
Performance measurements were restricted to cases where the existing kernel yields correct results. Similar or even better performance was observed across all cases.
Future Work:
The divisors operate on disjoint sets of array indices, so the corresponding cycles can be processed in parallel, offering additional performance improvements in multi-threaded scenarios (see the sketch below).
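One possible shape for that parallelization, purely as an illustration and not the PR's actual scheme: if the cycle-leader indices are pre-partitioned into groups that touch disjoint array positions (the role the divisors play in the note above), each group can be followed on its own thread without synchronization. The leader enumeration is assumed to be given; followCycles reuses the index mapping from the earlier sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTransposeSketch {

    /** Follows every cycle whose leader is in 'leaders' for a row-major m x n matrix in 'a'. */
    static void followCycles(double[] a, int m, int n, int[] leaders) {
        final long size = (long) m * n - 1;
        for (int start : leaders) {
            int i = start;
            double carried = a[start];
            do {
                int next = (int) (((long) i * m) % size); // destination of index i
                double tmp = a[next];
                a[next] = carried;
                carried = tmp;
                i = next;
            } while (i != start);
        }
    }

    /** Submits one task per leader group; groups must cover disjoint cycles exactly once. */
    static void transposeParallel(double[] a, int m, int n, List<int[]> leaderGroups, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int[] group : leaderGroups)
                pending.add(pool.submit(() -> followCycles(a, m, n, group)));
            for (Future<?> f : pending)
                f.get(); // wait for all groups and propagate failures
        } finally {
            pool.shutdown();
        }
    }
}
```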
@mboehm7