Issues: Dao-AILab/flash-attention
IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
#1439 opened Jan 11, 2025 by wuyouliaoxi
ERROR: No matching distribution found for flash-attn==2.6.3+cu123torch2.4cxx11abifalse
#1423 opened Jan 6, 2025 by carolynsoo
Unable to install flash-attn even if I first install torch alone
#1421 opened Jan 3, 2025 by ytxmobile98
Is there a plan to support flash_attn_varlen_backward with fp8?
#1420 opened Jan 3, 2025 by gaodaheng
flash_attn_with_kvcache: discrepancy when slicing kv_cache / cache_seqlens
#1417 opened Jan 1, 2025 by jeromeku
Looking for a test to verify cache correctness in flash_attn_with_kvcache
#1414 opened Dec 26, 2024 by chakpongchung
Performance Impact of Using Three Warps per Group (WG) in FA3 Compared to Two WGs
#1413 opened Dec 24, 2024 by ziyuhuang123
UnboundLocalError: local variable 'out' referenced before assignment
#1412 opened Dec 24, 2024 by chuangzhidan
Why Does FA3 Use Registers Instead of Directly Accessing SMEM with WGMMA on SM90?
#1407 opened Dec 23, 2024 by ziyuhuang123
Understanding the Role of arrive in NamedBarrier Synchronization
#1400 opened Dec 19, 2024 by ziyuhuang123