Commit 9af3475
[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)
Fix a hang in _npu_flash_attention when it is called from _forward_prefill_no_cache during a model run; the hang was caused by the wrong attention mask dtype.
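
A minimal sketch of the failure class described above (not the actual patch; the helper name and call site are illustrative): the NPU flash-attention kernel can stall when the attention mask dtype does not match the query dtype, so the mask is cast before the kernel call.

```python
import torch


def prepare_attn_mask(attn_mask: torch.Tensor,
                      query: torch.Tensor) -> torch.Tensor:
    """Cast the attention mask to the query's dtype if they differ.

    Illustrative helper only; the real fix lives in the ascend attention
    backend around _forward_prefill_no_cache.
    """
    if attn_mask is not None and attn_mask.dtype != query.dtype:
        attn_mask = attn_mask.to(query.dtype)
    return attn_mask
```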
### How was this patch tested?
Tested on Qwen2.5-VL and Qwen2.5-Omni.
- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b
Signed-off-by: Ting FU <futing10@huawei.com>

1 parent 048d350, commit 9af3475
File tree (3 files changed, +6 -7 lines):
- tests/ut/attention
- vllm_ascend/attention
- vllm_ascend/worker