FA3 regression on H100 80GB? #1432
Comments
Are you installing FA3 from |
Yes, I run |
Can you try this commit? |
Then I'm seeing this:
|
you'd need |
That took forever but it built successfully, thanks :) but the problem remains, performance is not great and I see |
Very strange, we don't ever use HGMMA.64x16x16. I just dumped the SASS and it's using |
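For anyone who wants to check which HGMMA shapes their own build ends up with, here is a minimal sketch that tallies the shapes found in a SASS listing. It assumes the listing was saved as text beforehand (for example from `cuobjdump -sass` on the built extension) and that the mnemonics look like `HGMMA.64x16x16...` as quoted in this thread. The function name `count_hgmma_shapes` is made up for illustration and is not part of this repo.

```python
import re
from collections import Counter

# Matches the tile shape in SASS mnemonics such as "HGMMA.64x16x16.F32".
# Assumption: the shape is always written as MxNxK right after "HGMMA.".
HGMMA_SHAPE = re.compile(r"\bHGMMA\.(\d+)x(\d+)x(\d+)")


def count_hgmma_shapes(sass_text):
    """Return a Counter mapping (M, N, K) to the number of HGMMA instructions."""
    counts = Counter()
    for m, n, k in HGMMA_SHAPE.findall(sass_text):
        counts[(int(m), int(n), int(k))] += 1
    return counts
```

Reading the saved dump and passing its contents to `count_hgmma_shapes` gives one count per shape, which makes it easy to compare two builds and see whether something like m64n16k16 really shows up in one of them.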
Thanks for looking into this, I'll have a closer look at everything again and will get back to you at some point next week 👍 |
Hi, I just updated my local installation of this repo to v2.7.2 and saw a significant drop in performance when running non-causal FA3 on H100 80GB. Most likely this is because the MMA instruction shape for the first HGMMAs changed to m64n16k16; I think it was m64n176k16 before. Was there some change in the heuristics, or might the problem be on my side?
I can provide more details (specific environment, CUDA version, problem sizes, etc.) as needed. I just wanted to raise awareness for now.
Thanks
Bastian