Conversation

@MatthewBonanni (Contributor) commented Nov 15, 2025

Purpose

This PR is paired with vllm-project/flash-attention#109 (merge that first after CI passes, then I'll update the git tag), which enables FA to support the head sizes required for vision transformers (40, 72, and 80). This PR also updates the selector to make FlashAttention the default backend over xFormers.
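
To illustrate the selector change, here is a minimal sketch of how a ViT attention backend might be chosen, preferring FlashAttention over xFormers when the head size is supported. The function names and the "multiples of 8 up to 256" rule are assumptions for illustration, not vLLM's actual selector API.

```python
from enum import Enum


class ViTAttnBackend(Enum):
    FLASH_ATTN = "FLASH_ATTN"
    XFORMERS = "XFORMERS"


# Assumed rule: FA kernels handle head sizes that are multiples of 8 up to 256,
# which now includes the ViT head sizes 40, 72, and 80.
def _fa_supports_head_size(head_size: int) -> bool:
    return head_size % 8 == 0 and 0 < head_size <= 256


def select_vit_backend(head_size: int, fa_available: bool) -> ViTAttnBackend:
    """Prefer FlashAttention as the default; fall back to xFormers otherwise."""
    if fa_available and _fa_supports_head_size(head_size):
        return ViTAttnBackend.FLASH_ATTN
    return ViTAttnBackend.XFORMERS


# Example: ViT head size 72 now routes to FlashAttention when FA is available.
print(select_vit_backend(72, fa_available=True))  # ViTAttnBackend.FLASH_ATTN
```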

Test Plan

pytest tests/kernels/attention/test_flash_attn.py (updated with new head sizes)
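
As a rough illustration of what "updated with new head sizes" could mean, the head-size parametrization might be extended along these lines; the actual parameter list and names in tests/kernels/attention/test_flash_attn.py may differ, and the assertion here is only a placeholder for the real kernel comparison.

```python
import pytest

# Hypothetical parametrization: the ViT head sizes 40, 72, and 80 are added
# alongside sizes the test already covered.
HEAD_SIZES = [40, 72, 80, 96, 128, 192, 256]


@pytest.mark.parametrize("head_size", HEAD_SIZES)
def test_head_size_is_fa_compatible(head_size: int) -> None:
    # Placeholder assertion standing in for the real test, which would compare
    # FlashAttention output against a reference attention implementation.
    assert head_size % 8 == 0 and head_size <= 256
```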

Test Result

Passes


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates FlashAttention to support the head sizes required for vision transformers (40, 72, 80). This is achieved by updating the dependency to a fork of flash-attention, generalizing the head size check in the FlashAttention backend, and updating the tests. The logic for selecting the ViT attention backend is also refactored for clarity. My review identified two main points:
  • A critical issue in cmake/external_projects/vllm_flash_attn.cmake, where the dependency points to a personal fork; this must be reverted before merging.
  • A high-severity issue in tests/kernels/attention/test_flash_attn.py, where a test case for soft_cap has been removed, potentially hiding a feature regression.
The other changes look good.
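
For context, a generalized head-size check of the kind the review refers to might look roughly like the sketch below. This is an illustrative assumption, not the actual vLLM implementation; the function name, constant, and the exact supported-size rule are hypothetical.

```python
# Hypothetical sketch of a generalized head-size check: instead of matching
# against a fixed whitelist of sizes, accept any multiple of 8 up to an
# assumed maximum, which also covers the new ViT head sizes (40, 72, 80).
MAX_HEAD_SIZE = 256  # assumed upper bound for the FA2/FA3 kernels


def is_head_size_supported(head_size: int) -> bool:
    """Return True if FlashAttention is assumed to support this head size."""
    return head_size % 8 == 0 and 0 < head_size <= MAX_HEAD_SIZE


if __name__ == "__main__":
    for hs in (40, 72, 80, 96, 128, 73):
        print(hs, is_head_size_supported(hs))
```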

@mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 15, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni changed the title from "[DO NOT MERGE][Attention] FlashAttention ViT support" to "[DO NOT MERGE][Attention] FlashAttention ViT support, make default backend" Nov 17, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni changed the title from "[DO NOT MERGE][Attention] FlashAttention ViT support, make default backend" to "[Attention] FlashAttention ViT support, make default backend" Nov 17, 2025
@LucasWilkinson (Collaborator) commented

Do you know if FA2 is supported too? Do you mind testing this on Ampere? I think it should be OK.

@MatthewBonanni (Contributor, Author) commented Nov 19, 2025

@LucasWilkinson pytest tests/kernels/attention/test_flash_attn.py passes on A100 with FA2 👍

@github-project-automation bot moved this to In review in NVIDIA Nov 19, 2025
@vllm-bot merged commit 4c23690 into vllm-project:main Nov 19, 2025
84 of 88 checks passed
@github-project-automation bot moved this from In review to Done in NVIDIA Nov 19, 2025
inkcherry pushed a commit to inkcherry/vllm that referenced this pull request Nov 19, 2025
@youkaichao changed the title from "[Attention] FlashAttention ViT support, make default backend" to "[Attention] FA2&FA3 support more head sizes, ViT support, make default backend" Nov 19, 2025
@MatthewBonanni deleted the fa_uneven_k branch November 19, 2025 15:11
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Nov 20, 2025
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025
@ywang96 mentioned this pull request Nov 24, 2025
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
@Victor49152 mentioned this pull request Nov 26, 2025
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025

Labels

ci/build, nvidia, ready, v1

Projects

Status: Done

4 participants