[flash attn v2] Why V uses no-swizzle layout for registers? #1429

Open
phantaurus opened this issue Jan 8, 2025 · 1 comment

Comments

@phantaurus

Hello!

Could you help me understand why we define two different layouts for V, sVt and sVtNoSwizzle?
sVt is used for the shared-memory tensor (tOsVt), and sVtNoSwizzle is used for the register tensor (tOrVt),
whereas for Q and K we use the same swizzled layout for both shared memory and registers.

Is it because the register fragment P holding the QxK output is not (and cannot be) swizzled, so the registers for V have to be no-swizzle as well, to match the indices along the dimension that is multiplied between P and V?

Thank you so much!

@tridao
Contributor

tridao commented Jan 9, 2025

sVt is fine; we no longer need sVtNoSwizzle. It used to be needed in some old CUTLASS version.
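
For readers unfamiliar with the term, a swizzle in this context is an XOR-based permutation of shared-memory offsets. The following is a minimal Python sketch of that idea, assuming the CuTe-style `Swizzle<B, M, S>` form with B = M = S = 3 and a row-major tile 64 elements wide. The parameter values and the helper names (`swizzle`, `row_major`, `chunk_in_row`) are illustrative assumptions, not taken from the flash-attention source.

```python
def swizzle(offset: int, b: int = 3, m: int = 3, s: int = 3) -> int:
    """XOR swizzle modeled on the Swizzle<B, M, S> idea (assumes s > 0).

    Reads the b bits starting at bit (m + s) and XORs them into the
    b bits starting at bit m. The lowest m bits are left unchanged.
    """
    mask = ((1 << b) - 1) << (m + s)
    return offset ^ ((offset & mask) >> s)


def row_major(row: int, col: int, cols: int = 64) -> int:
    """Plain (no-swizzle) offset of a logical (row, col) coordinate."""
    return row * cols + col


ROWS, COLS = 8, 64  # hypothetical tile shape, chosen for illustration
coords = [(r, c) for r in range(ROWS) for c in range(COLS)]
plain = {rc: row_major(*rc) for rc in coords}
swizzled = {rc: swizzle(off) for rc, off in plain.items()}


def chunk_in_row(offset: int) -> int:
    """Index of the 8-element chunk within a 64-element row."""
    return (offset // 8) % 8


# Column 0 of every row: same chunk without swizzle, distinct chunks with it.
plain_chunks = [chunk_in_row(plain[(r, 0)]) for r in range(ROWS)]
swizzled_chunks = [chunk_in_row(swizzled[(r, 0)]) for r in range(ROWS)]
print(plain_chunks)     # [0, 0, 0, 0, 0, 0, 0, 0]
print(swizzled_chunks)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In this sketch the swizzle only changes where each element is stored; the logical (row, col) coordinate of every element is the same under both layouts. The thread does not say which CUTLASS behavior made the separate sVtNoSwizzle layout necessary, so the sketch should not be read as an explanation of that.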
