
Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] #2688

Conversation

michaelfeil (Contributor)

This is a series of PRs to pull in the latest flash-attn-2 kernels and add support for:

  • pull in the latest FA2 kernels:
    • launch with a different grid size on sm86 / sm89
  • softcap in FA2
  • unpadded_lse

Most likely I will implement this as a series of 3 stacked PRs. This is PR 1.

Closes #2687
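The softcap feature caps attention logits by squashing them through tanh before the softmax. A minimal sketch of the assumed semantics (not the actual candle-flash-attn CUDA kernel):

```rust
// Illustrative sketch of logit soft-capping (assumed semantics, not the
// candle-flash-attn kernel): each score is rescaled through tanh so that
// every logit lies strictly within (-cap, cap) before softmax.
fn softcap(scores: &mut [f32], cap: f32) {
    for s in scores.iter_mut() {
        *s = cap * (*s / cap).tanh();
    }
}

fn main() {
    let mut scores = vec![-100.0_f32, -1.0, 0.0, 1.0, 100.0];
    softcap(&mut scores, 30.0);
    // small logits pass through almost unchanged; large ones saturate near ±30
    assert!(scores.iter().all(|s| s.abs() < 30.0));
    println!("{scores:?}");
}
```

Small logits are nearly unchanged (tanh is linear near zero), so the cap only affects outliers.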

@michaelfeil (Contributor, Author)

   Compiling candle-flash-attn v0.8.1 (/workspace/model-performance/michaelfeil/candle/candle-flash-attn)
warning: non-local `impl` definition, `impl` blocks should be written at the same level as their item
  --> /workspace/model-performance/michaelfeil/candle/candle-core/src/sort.rs:90:9
   |
79 | /     fn cuda_fwd(
80 | |         &self,
81 | |         storage: &crate::CudaStorage,
82 | |         layout: &crate::Layout,
83 | |     ) -> Result<(crate::CudaStorage, crate::Shape)> {
   | |___________________________________________________- move the `impl` block outside of this method `cuda_fwd`
...
90 |           impl Map1Any for ArgSort {
   |           ^^^^^-------^^^^^-------
   |                |           |
   |                |           `ArgSort` is not local
   |                `Map1Any` is not local
   |
   = note: an `impl` is never scoped, even when it is nested inside an item, as it may impact type checking outside of that item, which can be the case if neither the trait or the self type are at the same nesting level as the `impl`
   = note: `#[warn(non_local_definitions)]` on by default

warning: `candle-core` (lib) generated 1 warning
    Finished `test` profile [unoptimized + debuginfo] target(s) in 8m 52s
     Running unittests src/lib.rs (target/debug/deps/candle_flash_attn-58541d909d490f26)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/flash_attn_tests.rs (target/debug/deps/flash_attn_tests-a64401a027b99c85)

running 2 tests
test flash_attn_varlen ... ok
test flash_attn_acausal ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 33.07s

   Doc-tests candle_flash_attn

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
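The `non_local_definitions` warning in the log above fires because an `impl` block written inside a function body still applies crate-wide. A minimal reproduction of the pattern (illustrative, not candle's actual `sort.rs`):

```rust
// Minimal reproduction of the non_local_definitions lint (illustrative,
// not candle's code): the impl lives inside `setup`, yet it is visible
// everywhere, which is why rustc suggests moving it out of the method.
trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;

fn setup() {
    // warning: non-local `impl` definition — neither `Speak` nor `Dog`
    // is local to this function.
    impl Speak for Dog {
        fn speak(&self) -> &'static str {
            "woof"
        }
    }
}

fn main() {
    // the impl takes effect even though `setup` is never called
    assert_eq!(Dog.speak(), "woof");
}
```

Moving the `impl` to module level, as the compiler suggests, silences the lint without changing behavior.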

@michaelfeil (Contributor, Author)

Tests are passing locally; this is ready for review.

   Compiling candle-flash-attn v0.8.1 (/workspace/model-performance/michaelfeil/candle/candle-flash-attn)
warning: non-local `impl` definition, `impl` blocks should be written at the same level as their item
  --> /workspace/model-performance/michaelfeil/candle/candle-core/src/sort.rs:90:9
   |
79 | /     fn cuda_fwd(
80 | |         &self,
81 | |         storage: &crate::CudaStorage,
82 | |         layout: &crate::Layout,
83 | |     ) -> Result<(crate::CudaStorage, crate::Shape)> {
   | |___________________________________________________- move the `impl` block outside of this method `cuda_fwd`
...
90 |           impl Map1Any for ArgSort {
   |           ^^^^^-------^^^^^-------
   |                |           |
   |                |           `ArgSort` is not local
   |                `Map1Any` is not local
   |
   = note: an `impl` is never scoped, even when it is nested inside an item, as it may impact type checking outside of that item, which can be the case if neither the trait or the self type are at the same nesting level as the `impl`
   = note: `#[warn(non_local_definitions)]` on by default

warning: `candle-core` (lib) generated 1 warning
    Finished `test` profile [unoptimized + debuginfo] target(s) in 9m 21s
     Running unittests src/lib.rs (target/debug/deps/candle_flash_attn-58541d909d490f26)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/flash_attn_tests.rs (target/debug/deps/flash_attn_tests-a64401a027b99c85)

running 2 tests
test flash_attn_varlen ... ok
test flash_attn_acausal ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 26.66s

   Doc-tests candle_flash_attn

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
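The `flash_attn_varlen` test exercised above packs sequences of different lengths into one contiguous buffer and indexes them with cumulative sequence lengths. A hedged sketch of that bookkeeping (the `cu_seqlens` convention is standard in flash-attention; this helper is illustrative, not candle's API):

```rust
// Illustrative helper (not candle's API): build the cumulative
// sequence-length offsets used by varlen flash-attention. Sequence i
// occupies rows cu[i]..cu[i+1] of the packed (total_tokens, ...) tensor.
fn cu_seqlens(seqlens: &[usize]) -> Vec<usize> {
    let mut cu = Vec::with_capacity(seqlens.len() + 1);
    let mut total = 0;
    cu.push(0);
    for &len in seqlens {
        total += len;
        cu.push(total);
    }
    cu
}

fn main() {
    // three sequences of lengths 3, 5, 2 packed into a 10-token buffer
    let cu = cu_seqlens(&[3, 5, 2]);
    assert_eq!(cu, vec![0, 3, 8, 10]);
    println!("{cu:?}");
}
```

Packing this way avoids padding entirely, which is also why the LSE output can be returned "unpadded" in the same layout.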

@michaelfeil changed the title from "SoftCap Candle-FlashAttn [1/n]" to "Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n]" on Dec 30, 2024
@LaurentMazare merged commit 71cd6d5 into huggingface:main on Dec 31, 2024
10 checks passed
@LaurentMazare (Collaborator)

Thanks!

Merging this pull request closes the linked issue: SoftCap in Flash-Attention 2 (#2687)