
Conversation

@afierka-intel
Collaborator

Some models from the Qwen2/Qwen2.5 family require computing attention with full FP32 precision to keep results accurate.
This PR adds support for FP32 QK matmuls in the unified attention operations.

This change depends on #571 and can be merged after it.
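
To illustrate the numerical intent, here is a minimal sketch of running the QK matmul and softmax in FP32. The names (`attn_scores_fp32`, `query`, `key`, `scale`, `fp32_softmax`) are hypothetical and are not the actual unified-attention kernel code; on Gaudi the cast presumably happens inside the fused op.

```python
import torch

def attn_scores_fp32(query: torch.Tensor,
                     key: torch.Tensor,
                     scale: float,
                     fp32_softmax: bool = True) -> torch.Tensor:
    """Scaled QK^T attention weights, optionally computed in full FP32."""
    if fp32_softmax:
        # Upcast so both the matmul and the softmax run in FP32; some
        # Qwen2/Qwen2.5 checkpoints lose accuracy when this stays in bf16.
        scores = torch.matmul(query.float(), key.float().transpose(-2, -1))
    else:
        scores = torch.matmul(query, key.transpose(-2, -1))
    weights = torch.softmax(scores * scale, dim=-1)
    # Cast back so downstream ops keep the model's native dtype.
    return weights.to(query.dtype)
```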

Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Copilot AI review requested due to automatic review settings November 17, 2025 13:33
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for FP32-precision softmax calculation in the unified attention operations to maintain accuracy for certain Qwen2/Qwen2.5 family models. The change introduces conditional logic that performs the QK matmul in FP32 when both the use_output_tensor_in_matmulqk and fp32_softmax configuration flags are enabled (a sketch of this gating follows the list below).

  • Adds FP32 precision support for attention score calculation
  • Implements output tensor optimization for matmul operations when FP32 softmax is used
  • Updates three attention functions: partial_attn_causal, partial_attn_shared, and partial_attn_unique
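
A minimal sketch of the flag-gated path described above: the flag names mirror the overview (use_output_tensor_in_matmulqk, fp32_softmax), but the helper, shapes, and use of torch.matmul's out= argument are assumptions, not the actual partial_attn_* implementation.

```python
import torch

def qk_matmul(query: torch.Tensor,
              key: torch.Tensor,
              use_output_tensor_in_matmulqk: bool,
              fp32_softmax: bool) -> torch.Tensor:
    """QK^T attention scores, optionally written into a pre-allocated FP32 tensor."""
    if use_output_tensor_in_matmulqk and fp32_softmax:
        # Pre-allocate an FP32 destination and write the matmul result into it,
        # so the following softmax sees full-precision scores.
        out_shape = (*query.shape[:-1], key.shape[-2])
        scores = torch.empty(out_shape, dtype=torch.float32, device=query.device)
        torch.matmul(query.float(), key.float().transpose(-2, -1), out=scores)
        return scores
    # Default path: keep the matmul in the model's native dtype (e.g. bf16).
    return torch.matmul(query, key.transpose(-2, -1))
```

This only shows the numerical effect; in the PR the logic is presumably fused into the partial_attn_causal, partial_attn_shared, and partial_attn_unique paths.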


@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

pawel-olejniczak and others added 5 commits November 18, 2025 13:35
…ge (vllm-project#575)

sampled_token_ids was changed from list[list[int]] to list[list[int]]:
vllm-project/vllm#26368

Signed-off-by: Paweł Olejniczak <polejniczakx@habana.ai>
This change fixes the NIXL deployment procedure.

---------

Signed-off-by: PatrykWo <patryk.wolsza@intel.com>
Signed-off-by: Patryk Wolsza <patryk.wolsza@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
…oject#571)

From: vllm-project#188

---------

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>

Signed-off-by: Artur Fierka <artur.fierka@intel.com>
