performance improvement for CPU implementations #97

avik-pal · 2024-07-21T16:54:13Z

Main Improvements / Changes

Default DispatchDoctor mode is now set to disable.
Bias activation: LoopVectorization (LV) and Strided don't seem to help, so this uses a manual broadcast.
Removes @fastmath.
Removes the AMDGPU patch for broadcasting.
Changes the loop ordering.
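
As a rough illustration of the manual-broadcast and loop-ordering items above, a fused bias + activation loop over a matrix might look like the sketch below. This is not the code from this PR: `bias_activation_loop!` is a hypothetical name, and applying the bias along the first dimension is an assumption made only for the example.

```julia
# Hypothetical helper for illustration only; not the implementation in this PR.
# Computes y[i, j] = σ(x[i, j] + bias[i]) with explicit loops.
function bias_activation_loop!(y::AbstractMatrix, σ::F, x::AbstractMatrix,
        bias::AbstractVector) where {F}
    axes(y) == axes(x) ||
        throw(DimensionMismatch("x and y must have the same axes"))
    axes(bias, 1) == axes(x, 1) ||
        throw(DimensionMismatch("bias must match the first dimension of x"))
    # Julia arrays are column-major, so the first index is contiguous in memory.
    # Keeping the loop over `i` innermost walks memory sequentially.
    for j in axes(x, 2)
        for i in axes(x, 1)
            y[i, j] = σ(x[i, j] + bias[i])
        end
    end
    return y
end

x = randn(Float32, 4, 3)
b = randn(Float32, 4)
y = bias_activation_loop!(similar(x), tanh, x, b)
# Should agree with the plain broadcast version.
@assert all(isapprox.(y, tanh.(x .+ b)))
```

Swapping the two loops would give the same result but stride through memory with a step of `size(x, 1)`, which is the kind of access pattern a loop-ordering change typically avoids.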

Rolled back LoopVectorization for now, since Enzyme seems really unhappy with it. Will add it back in a later PR.

Before merge

remove unwanted deps
bump NNlib version

src/impl/bias_activation.jl

avik-pal force-pushed the ap/loopvec branch from 3a626eb to dbfe52a Compare July 21, 2024 19:34

avik-pal mentioned this pull request Nov 3, 2024

Remove patches for working around NNlib issues LuxDL/Lux.jl#1010

Open

6 tasks

avik-pal linked an issue Jul 21, 2024 that may be closed by this pull request

Remove special handling for swish and sigmoid_fast #92

Closed

avik-pal force-pushed the ap/loopvec branch from dbfe52a to f1d71b8 Compare July 21, 2024 20:15

avik-pal force-pushed the ap/patches branch from 0ec4ecd to eb04237 Compare July 21, 2024 20:31

avik-pal added 5 commits July 21, 2024 14:21

test: more enzyme testing

73bd02b

refactor: set default dispatch doctor mode as disable

3418bb8

perf: optimize the performance of bias activation

d87eeb2

fix: remove @fastmath

9606f7b

refactor: remove AMDGPU patch for broadcasting

14cdfbc

Base automatically changed from ap/patches to main July 21, 2024 21:21

avik-pal force-pushed the ap/loopvec branch 2 times, most recently from a517068 to 7a58bb0 Compare July 21, 2024 21:31

fix: reorder loop iterations

8d68102

avik-pal force-pushed the ap/loopvec branch from 7a58bb0 to 8d68102 Compare July 21, 2024 21:40

feat: use sleefpirates for activation functions on CPU

a041e53

avik-pal force-pushed the ap/loopvec branch from d82377d to a041e53 Compare July 21, 2024 22:46

perf: reorder operations in GN loop

f2df920

avik-pal force-pushed the ap/loopvec branch 3 times, most recently from 667110d to ec1d3cf Compare July 22, 2024 01:45

revert: activations from SLEEFPirates

232b48e

avik-pal force-pushed the ap/loopvec branch from ec1d3cf to 232b48e Compare July 22, 2024 02:32

avik-pal added 4 commits July 21, 2024 19:50

feat: use loop vectorization for faster groupnorm

2471b61

feat: use loop vectorization for faster dropout

f868265

fix: dropout enzyme gradients

c958ead

refactor: move turbo into single function

ac05170

avik-pal force-pushed the ap/loopvec branch from c6aafd7 to ac05170 Compare July 22, 2024 06:02

fix: rollback loop vectorization for now

367d449

avik-pal mentioned this pull request Jul 22, 2024

Use LoopVectorization for faster Loops LuxDL/Lux.jl#1011

Open

16 tasks

chore: mark version for release on merge

c3052eb

avik-pal commented Jul 22, 2024


src/impl/bias_activation.jl (outdated, resolved)

fix: incorrect activation usage

8a49f3a

avik-pal force-pushed the ap/loopvec branch from 575cfa4 to 8a49f3a Compare July 23, 2024 00:36

avik-pal merged commit d179933 into main Jul 23, 2024
20 of 27 checks passed

avik-pal deleted the ap/loopvec branch July 23, 2024 02:03
