Commit 51c8f60
[Bugfix] Resolve MTP > 1 issue when lm head tp > 1 (#4254)
### What this PR does / why we need it?
Previously, the dummy run executed `compute_logits` only once, regardless
of `num_speculative_tokens`. As a result, `execute_model` hung in
`compute_logits` when LM head tensor parallelism was greater than 1. The
fix makes the dummy run execute `compute_logits` in step with
`num_speculative_tokens`.
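
To illustrate the mismatch, here is a minimal sketch in plain Python. It assumes (the description does not say so explicitly) that the hang comes from `compute_logits` being a collective call under LM head tensor parallelism, so a rank doing a dummy run has to issue as many calls as a rank doing real work. `FakeLMHead`, `real_step`, `dummy_run_old`, and `dummy_run_fixed` are hypothetical names for illustration only, not vllm-ascend APIs.

```python
class FakeLMHead:
    """Hypothetical stand-in for a tensor-parallel LM head.

    Each compute_logits call represents one collective operation that every
    participating rank must also enter.
    """

    def __init__(self):
        self.calls = 0

    def compute_logits(self, hidden_states):
        self.calls += 1
        return hidden_states


def real_step(head, hidden_states, num_speculative_tokens):
    # Assumed shape of the real path: one logits computation per speculative token.
    for _ in range(num_speculative_tokens):
        hidden_states = head.compute_logits(hidden_states)
    return hidden_states


def dummy_run_old(head, hidden_states, num_speculative_tokens):
    # Behavior described as the bug: a single call, whatever the token count.
    return head.compute_logits(hidden_states)


def dummy_run_fixed(head, hidden_states, num_speculative_tokens):
    # Behavior described as the fix: the call count follows num_speculative_tokens.
    for _ in range(num_speculative_tokens):
        hidden_states = head.compute_logits(hidden_states)
    return hidden_states


def collective_calls(step_fn, num_speculative_tokens):
    """Count how many compute_logits calls a step function issues."""
    head = FakeLMHead()
    step_fn(head, [0.0], num_speculative_tokens)
    return head.calls
```

With `num_speculative_tokens = 3`, the old dummy run issues one call while the real step issues three, so under the stated assumption the real rank would wait on calls its peer never makes.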
I also set the `non_blocking` argument to False when moving
`exceeds_max_model_len` to the CPU. As I understand it, copying with
`non_blocking=True` and then immediately reading the tensor on the CPU can
yield incorrect values, because the copy may not have finished yet. The
issue does not arise when transferring data to a device. Ref:
https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/18
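
The device-to-host pattern described above can be sketched as follows. This is an illustration under assumptions, not the code from this commit: the helper name `any_exceeds_max_len`, its arguments, and the `>=` comparison are hypothetical; only the tensor name `exceeds_max_model_len` and the `non_blocking=False` choice come from the description.

```python
import torch


def any_exceeds_max_len(positions: torch.Tensor, max_model_len: int) -> bool:
    """Return True if any position reaches max_model_len (hypothetical helper)."""
    # Computed on whatever device `positions` lives on.
    exceeds_max_model_len = positions >= max_model_len

    # Blocking copy: the host tensor is fully populated when .to() returns,
    # so it is safe to read it on the CPU right away.
    exceeds_cpu = exceeds_max_model_len.to("cpu", non_blocking=False)
    return bool(exceeds_cpu.any())


# If non_blocking=True were kept, the host read would need an explicit
# synchronization first, for example on CUDA:
#   exceeds_cpu = exceeds_max_model_len.to("cpu", non_blocking=True)
#   torch.cuda.synchronize()
#   result = bool(exceeds_cpu.any())
```

The blocking copy trades a small stall for a guarantee that the value read on the host is final, which matters here because the result is consumed immediately.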
- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b
---------
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
1 parent e8e20c0, commit 51c8f60
File tree
5 files changed (+31, -19 lines)
- vllm_ascend
  - spec_decode
  - torchair
  - worker
Diff summary (file names and line contents were not captured; only the line numbers of each changed hunk survive):

| File | Removed (original lines) | Added (new lines) |
|---|---|---|
| 1 | 126 | 126-127 |
| 1 | none | 138 |
| 2 | 216 | 216-217 |
| 2 | none | 300 |
| 2 | none | 761 |
| 2 | 824 | 827 |
| 3 | 30 | 30-31 |
| 4 | 84 | 84-85 |
| 4 | none | 147 |
| 5 | 3006-3013 | 3006-3020 |
| 5 | 3035-3036 | 3042 |
| 5 | 3045-3048 | 3051-3052 |