Could you help me understand why, for V, we define two different layouts, `sVt` and `sVtNoSwizzle`?
`sVt` is used for the shared-memory partition (`tOsVt`), while `sVtNoSwizzle` is used for the register fragment (`tOrVt`).
For Q and K, by contrast, we use the same swizzled layout for both shared memory and registers.
Is it because the output of Q×K, held in the register tensor P, is not (and cannot be) swizzled, so the register layout for V has to be unswizzled to match the indices along the dimension over which P and V are multiplied?
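For context on the term, a swizzled layout permutes *where* each logical element is stored in shared memory. The logical (row, col) coordinates stay the same. The sketch below is a simplified Python model, not the CuTe or FlashAttention implementation. It assumes a CuTe-style XOR swizzle with B=3, M=3, S=3 applied to a row-major 8×64 tile of 16-bit elements. The names `xor_swizzle` and `logical_to_physical` are hypothetical.

```python
ROWS, COLS = 8, 64  # hypothetical 8x64 tile of 16-bit elements, row-major


def xor_swizzle(offset: int, b: int = 3, m: int = 3, s: int = 3) -> int:
    """Simplified XOR swizzle on a linear element offset (assumed convention).

    The b bits starting at bit (m + s) are XORed into the b bits starting at bit m.
    With b=m=s=3, the row index (bits 6..8) permutes which 8-element chunk
    (bits 3..5) a column chunk lands in.
    """
    bit_msk = (1 << b) - 1
    src_msk = bit_msk << (m + s)  # bits that select the permutation
    return offset ^ ((offset & src_msk) >> s)


def logical_to_physical(row: int, col: int) -> int:
    """Map a logical (row, col) coordinate to a swizzled storage offset."""
    return xor_swizzle(row * COLS + col)


# The swizzle is a bijection: every logical element gets a unique slot.
phys = {logical_to_physical(r, c) for r in range(ROWS) for c in range(COLS)}
assert phys == set(range(ROWS * COLS))

# Logical column chunk 0 (cols 0..7) of each row lands in a different
# physical 8-element chunk, instead of all rows hitting chunk 0.
chunks = [(logical_to_physical(r, 0) % COLS) // 8 for r in range(ROWS)]
print(chunks)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Without the XOR, every row's chunk 0 would sit at the same position within its 128-byte row. In this model, that means the same shared-memory banks. The swizzle only changes storage offsets, while code that indexes by logical coordinates sees the same (row, col) grid.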
Thank you so much!