Add support of FP32 softmax to unified attention #577
base: main
Conversation
Pull Request Overview
This PR adds support for FP32 precision softmax calculations in unified attention operations to maintain accuracy for certain Qwen2/Qwen2.5 family models. The change introduces conditional logic to perform QK matmul operations in FP32 when both use_output_tensor_in_matmulqk and fp32_softmax configuration flags are enabled.
- Adds FP32 precision support for attention score calculation
- Implements output tensor optimization for matmul operations when FP32 softmax is used
- Updates three attention functions: partial_attn_causal, partial_attn_shared, and partial_attn_unique (see the sketch below)
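For context only, here is a minimal PyTorch-style sketch of an FP32 QK/softmax path. The function name `qk_softmax` and the explicit `.float()` upcast are illustrative assumptions, not the PR's code; the PR itself gates the behavior on the `fp32_softmax` and `use_output_tensor_in_matmulqk` flags and, per the overview above, routes the QK matmul result through an output tensor rather than upcasting the inputs.

```python
import torch

def qk_softmax(query: torch.Tensor, key: torch.Tensor, scale: float,
               fp32_softmax: bool = False) -> torch.Tensor:
    """Compute softmax(Q @ K^T * scale); illustrative sketch only."""
    if fp32_softmax:
        # Run the QK matmul and the softmax in full FP32 so small score
        # differences are not lost to bf16/fp16 rounding, then cast back
        # to the compute dtype for the following PV matmul.
        scores = torch.matmul(query.float(), key.transpose(-2, -1).float())
        return torch.softmax(scores * scale, dim=-1).to(query.dtype)
    # Default path: scores stay in the model's compute dtype.
    scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    return torch.softmax(scores, dim=-1)
```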
Some models from the Qwen2 and Qwen2.5 families require computing attention with full FP32 precision to keep results accurate.
This PR adds support for FP32 QK matmuls in the unified attention operations.
The change depends on #571 and can be merged after it.
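As a rough illustration of the accuracy concern (toy numbers, not taken from the PR): closely spaced attention scores can become indistinguishable once rounded to bf16, which flattens the softmax output, while FP32 preserves the separation.

```python
import torch

# Toy example: two nearly equal attention scores. bf16 has 8 bits of
# significand precision, so its spacing near 10.0 is 0.0625 and 10.008
# rounds to 10.0, making the two largest softmax weights identical;
# fp32 keeps them distinct.
scores = torch.tensor([10.000, 10.008, 9.0])

p_fp32 = torch.softmax(scores, dim=-1)
p_bf16 = torch.softmax(scores.to(torch.bfloat16), dim=-1).float()

print("fp32:", p_fp32.tolist())
print("bf16:", p_bf16.tolist())
```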