This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

performance improvement for CPU implementations #97

Merged: 16 commits merged on Jul 23, 2024


@avik-pal (Member) commented Jul 21, 2024

Main Improvements / Changes

  • The default DispatchDoctor mode is set to `disable`.
  • Bias activation: LoopVectorization (LV) and Strided don't seem to help, so it now uses a manual broadcast.
  • Removes `@fastmath`.
  • Adds an AMDGPU patch for broadcasting.
  • Changes the loop ordering.
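A minimal sketch of what a "manual broadcast" for bias activation with memory-friendly loop ordering looks like. The PR does not show its kernel, so this is a hypothetical illustration in Python, not the LuxLib.jl code: it computes `y[i, j] = act(x[i, j] + bias[i])` with explicit loops and assumes a column-major layout like Julia's `Array`, so the inner loop runs over rows, which are contiguous in memory. The names `bias_activation_colmajor` and `relu` are made up for this example.

```python
def bias_activation_colmajor(act, x_flat, bias, m, n):
    """Hypothetical sketch: y[i, j] = act(x[i, j] + bias[i]) for an m-by-n matrix.

    x_flat stores the matrix in column-major order (element (i, j) lives at
    index i + j * m), as Julia's Array does. The outer loop runs over columns
    and the inner loop over rows, so consecutive iterations touch consecutive
    memory locations.
    """
    assert len(x_flat) == m * n and len(bias) == m
    y = [0.0] * (m * n)
    for j in range(n):          # outer loop: columns
        base = j * m            # start of column j in the flat buffer
        for i in range(m):      # inner loop: contiguous rows
            y[base + i] = act(x_flat[base + i] + bias[i])
    return y


def relu(v):
    """Example activation used only for this sketch."""
    return v if v > 0.0 else 0.0


# 2x2 matrix stored column-major: column 0 = [1, -2], column 1 = [3, -4]
print(bias_activation_colmajor(relu, [1.0, -2.0, 3.0, -4.0], [0.5, 1.0], 2, 2))
```

In Julia the equivalent would keep the row index in the innermost loop; whether this matches the exact ordering chosen in the PR is an assumption.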

Rolled back LoopVectorization for now, since Enzyme seems really unhappy with it. It will be added back in a later PR.

Before merge

  • Remove unwanted dependencies
  • Bump the NNlib version

Base automatically changed from ap/patches to main July 21, 2024 21:21
@avik-pal force-pushed the ap/loopvec branch 2 times, most recently from a517068 to 7a58bb0, July 21, 2024 21:31
@avik-pal force-pushed the ap/loopvec branch 3 times, most recently from 667110d to ec1d3cf, July 22, 2024 01:45
Resolved review thread on src/impl/bias_activation.jl (outdated)
@avik-pal merged commit d179933 into main Jul 23, 2024
20 of 27 checks passed
@avik-pal deleted the ap/loopvec branch July 23, 2024 02:03
Development

Successfully merging this pull request may close these issues.

Remove special handling for swish and sigmoid_fast
1 participant