Commit 23ba198
Spec decode warmup support (vllm-project#624)
GAUDISW-242931
Because spec decode currently flattens the spec decode tokens into
[batch_size * num_tokens, 1], the decode shapes can be warmed up as before.
What changes is the maximum batch size to warm up in the configuration:
the real batch size is batch_size * num_tokens, i.e. num_tokens
(= 1 + num_speculative_tokens) times the original batch size.
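
As a minimal sketch of the shape arithmetic above (plain Python; the helper names are hypothetical and not part of vllm_gaudi, and max_num_seqs is assumed to be the configured per-step request limit), flattening a [batch_size, num_tokens] block of decode tokens yields batch_size * num_tokens rows, so the largest decode batch to warm up scales by 1 + num_speculative_tokens:

```python
# Hypothetical helpers (not vllm_gaudi APIs) illustrating the decode
# shape arithmetic when speculative tokens are flattened into the batch.


def flatten_spec_decode(token_ids: list[list[int]]) -> list[list[int]]:
    """Flatten [batch_size, num_tokens] into [batch_size * num_tokens, 1]."""
    return [[tok] for row in token_ids for tok in row]


def max_decode_warmup_batch(max_num_seqs: int, num_speculative_tokens: int) -> int:
    """Largest flattened decode batch the warmup should cover."""
    num_tokens = 1 + num_speculative_tokens  # sampled token + draft tokens
    return max_num_seqs * num_tokens


if __name__ == "__main__":
    # 2 requests, each with 1 sampled token and 2 draft tokens.
    batch = [[11, 12, 13], [21, 22, 23]]
    flat = flatten_spec_decode(batch)
    print(len(flat), len(flat[0]))  # 6 1
    print(max_decode_warmup_batch(max_num_seqs=64, num_speculative_tokens=3))  # 256
```

The point of the sketch is only that the warmup range for decode batch sizes must be widened by the same factor the flattening introduces; how the bucketing code actually derives its limits is not shown here.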

During warmup, care must be taken to reserve draft-token (and block) space
for the EAGLE proposing process: num_speculative_tokens slots must be left
free for propose to use.
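
The reservation can be illustrated with a hypothetical helper, assuming each dummy decode request owns a fixed number of KV blocks (the function names, the block count and the block size of 128 are illustrative assumptions, not the PR's code). Capping the dummy context at the block capacity minus num_speculative_tokens keeps the drafter's tokens inside the blocks already allocated to the request:

```python
import math

# Hypothetical sketch: size a dummy warmup decode request so that
# num_speculative_tokens KV slots stay free for the drafter's propose step.


def dummy_decode_seq_len(num_blocks_per_seq: int, block_size: int,
                         num_speculative_tokens: int) -> int:
    """Longest context a dummy decode request may use during warmup."""
    capacity = num_blocks_per_seq * block_size
    seq_len = capacity - num_speculative_tokens  # leave room for draft tokens
    if seq_len <= 0:
        raise ValueError("not enough KV slots to reserve draft space")
    return seq_len


def blocks_for(seq_len: int, num_speculative_tokens: int, block_size: int) -> int:
    """Blocks required once the draft tokens are appended to the context."""
    return math.ceil((seq_len + num_speculative_tokens) / block_size)


if __name__ == "__main__":
    bs, k = 128, 3
    seq_len = dummy_decode_seq_len(num_blocks_per_seq=4, block_size=bs,
                                   num_speculative_tokens=k)
    print(seq_len)                     # 509
    print(blocks_for(seq_len, k, bs))  # 4: drafts fit in the allocated blocks
    print(blocks_for(4 * bs, k, bs))   # 5: without the reservation they spill over
```

Without the reservation, a dummy request that fills its blocks exactly would need one extra block as soon as the drafter appends its tokens, which is the situation the warmup has to avoid.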

Another point (already handled in the PR adding support for
num_speculative_tokens > 1) is that warmup runs in compile-only mode, without
any real computation taking place. Therefore the operations in the drafter's
prepare_attn_metadata that depend on the real position values must be done
on CPU.
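
A sketch of position-dependent metadata computed from host-side values, assuming plain Python lists stand in for CPU tensors and that KV slots follow the usual paged layout block_table[pos // block_size] * block_size + pos % block_size. Both functions are hypothetical and are not the drafter's actual prepare_attn_metadata; because they read only host values, their results stay valid even when device kernels produce no real data:

```python
# Hypothetical sketch: position-dependent attention metadata derived from
# host-side (CPU) values, so it stays correct when warmup runs in
# compile-only mode and device kernels produce no real data.


def next_draft_positions(last_pos: int, num_speculative_tokens: int) -> list[int]:
    """Positions the drafter will write after the last accepted token."""
    return [last_pos + i for i in range(1, num_speculative_tokens + 1)]


def slot_mapping_on_cpu(positions: list[int], block_table: list[int],
                        block_size: int) -> list[int]:
    """Map each token position to its KV-cache slot using only host values."""
    slots = []
    for pos in positions:
        block_id = block_table[pos // block_size]  # physical block for this position
        slots.append(block_id * block_size + pos % block_size)
    return slots


if __name__ == "__main__":
    positions = next_draft_positions(last_pos=3, num_speculative_tokens=2)
    print(positions)                                        # [4, 5]
    print(slot_mapping_on_cpu([3] + positions, [7, 2], 4))  # [31, 8, 9]
```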

The separate issue of handling decode steps that have no spec decode tokens
has already been addressed in vllm-project#593.
---------
Signed-off-by: Chen Haifeng <haifeng.chen@intel.com>

1 parent efa7c83
2 files changed (+101, -11) in vllm_gaudi:
- extension/bucketing
- v1/worker
Diff hunk ranges (line content not captured):

File 1 (+64, -2):
@@ -39,20 +39,30 @@
@@ -189,6 +199,12 @@
@@ -232,8 +248,14 @@
@@ -260,6 +282,46 @@

File 2 (+37, -9):
@@ -891,11 +891,13 @@
@@ -2031,15 +2033,18 @@
@@ -3513,7 +3518,9 @@
@@ -4098,7 +4105,16 @@
@@ -4177,14 +4193,20 @@
@@ -4324,6 +4346,12 @@