b4617

github-actions released this 02 Feb 19:12
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <[email protected]>