[flash attn v2] Why V uses no-swizzle layout for registers? #1429

phantaurus · 2025-01-08T19:20:43Z

Hello!

Could you help me understand why, for V, we define two different layouts, sVt and sVtNoSwizzle? sVt is used for shared memory (tOsVt), and sVtNoSwizzle is used for the register fragment (tOrVt). For Q and K, by contrast, we use the same swizzled layout for both shared memory and registers.

Is it because the QxK output P, held in registers, is not (and cannot be) swizzled, so the register fragment for V must also be unswizzled to match the indices along the dimension that is contracted in the P x V product?

Thank you so much!

tridao · 2025-01-09T09:46:20Z

sVt is fine; we no longer need sVtNoSwizzle. It used to be needed with some older CUTLASS versions.
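A swizzle only changes *where* each element lives in shared memory, not its logical (row, column) coordinate or the tile shape. That is consistent with the answer above that a register fragment can be partitioned from sVt directly. The Python sketch below models an XOR swizzle with the parameters flash-attn appears to use for an 8 x 64 fp16 shared-memory atom (Swizzle<3, 3, 3>). The bit semantics are an assumption based on my reading of CuTe's `Swizzle<B, M, S>`, and the helper names (`swizzle`, `logical_offset`) are hypothetical, not CuTe API.

```python
# Sketch of an XOR swizzle in the style of CuTe's Swizzle<B, M, S>.
# Assumed semantics: XOR the B bits starting at bit M+S into the B bits starting at bit M.

def swizzle(offset: int, b: int = 3, m: int = 3, s: int = 3) -> int:
    """Map a logical element offset to a physical shared-memory offset (hypothetical helper)."""
    bit_mask = (1 << b) - 1
    yyy_mask = bit_mask << (m + s)  # source bits: here, the row index within the atom
    return offset ^ ((offset & yyy_mask) >> s)

ROWS, COLS = 8, 64  # assumed atom: 8 rows x 64 fp16 elements, row-major (stride 64, 1)
CHUNK = 8           # 8 fp16 values = one 16-byte vector

def logical_offset(row: int, col: int) -> int:
    return row * COLS + col

physical = [swizzle(logical_offset(r, c)) for r in range(ROWS) for c in range(COLS)]

# 1. The swizzle permutes the tile's offsets: nothing is lost or duplicated,
#    so the tile shape (and a fragment shaped from it) is unchanged.
is_permutation = sorted(physical) == list(range(ROWS * COLS))

# 2. Each element stays in its row and each 16-byte chunk stays contiguous;
#    only whole chunks move within a row.
rows_preserved = all(
    swizzle(logical_offset(r, c)) // COLS == r for r in range(ROWS) for c in range(COLS)
)
chunks_contiguous = all(
    swizzle(logical_offset(r, c)) % CHUNK == c % CHUNK for r in range(ROWS) for c in range(COLS)
)

# 3. Logical chunk 0 of each row lands in a different physical chunk, so a
#    column-wise (transposed) read of 8 rows touches 8 distinct 16-byte slots.
col_chunk0 = [swizzle(logical_offset(r, 0)) % COLS // CHUNK for r in range(ROWS)]

print(is_permutation, rows_preserved, chunks_contiguous)  # True True True
print(col_chunk0)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because each 64-element fp16 row is 128 bytes, those 8 distinct 16-byte slots fall in disjoint groups of shared-memory banks. That bank-conflict avoidance is the point of the swizzle; the logical indexing that the P x V product relies on is untouched.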
