Implementing softmax_fa2 in partial_attn shared and causal #566
Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
Pull Request Overview
This PR replaces manual softmax computation with the optimized torch.ops.hpu.softmax_fa2 operation in two attention functions (partial_attn_causal and partial_attn_shared) for improved performance on HPU devices.
Key Changes:
- Replaced manual softmax implementation (max, exp, sum operations) with torch.ops.hpu.softmax_fa2 in both functions
- Added initialization of inputM_hpu and inputL_hpu tensors required by the new operation
- Consolidated bias addition into the initial matmul operation in partial_attn_shared
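For readers unfamiliar with the pattern being replaced, the manual softmax steps named above (max, exp, sum) can be sketched in plain Python as follows. This is an illustration only, not the PR's code: the function name `manual_softmax_row` is hypothetical, it works on a single row of scores held in a Python list rather than on HPU tensors, and it makes no claim about the signature or internals of torch.ops.hpu.softmax_fa2, which this page does not show.

```python
import math


def manual_softmax_row(scores):
    """Numerically stable softmax over one row of attention scores.

    Hypothetical helper for illustration; mirrors the max / exp / sum
    sequence described in the overview, not the actual PR code.
    """
    # max: subtracting the row maximum keeps exp() from overflowing
    row_max = max(scores)
    # exp: exponentiate the shifted scores
    exps = [math.exp(s - row_max) for s in scores]
    # sum: the normalizer for this row
    row_sum = sum(exps)
    probs = [e / row_sum for e in exps]
    return probs, row_max, row_sum


probs, row_max, row_sum = manual_softmax_row([1.0, 2.0, 3.0])
```

The sketch returns the row maximum and row sum alongside the probabilities because those are the intermediate quantities a hand-written softmax has to track; whether and how the fused operation exposes them is not stated here.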
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
/run-gaudi-tests
✅ CI Passed: All checks passed successfully against the following vllm commit:
No description provided.