Commit 8dfa16f
[compile] Add fallback path to AOT compile when serialization fails.
Summary:
Fixing issue vllm-project#27348
For Dynamo caching, it's possible that the compilation succeeds but
the serialization step fails. In this case, the serialization failure
shouldn't block the user from getting correct compilation results.
Therefore we add handling for the serialization error and only
emit a warning when model saving fails. When saving fails, the vLLM model
runner should be able to just fall back to the old path, and in the
next process, it will fail to load the Dynamo cache but still fall back
to retracing with Dynamo + loading the Inductor cache, which is the same
behavior as with AOT compile turned off.
This is mostly a short-term fix; in the long term we should resolve
the serialization bugs by eliminating pickling of graph modules,
i.e., once vllm-project#25205 is merged,
we should be able to resolve the issue at a lower level.
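
The fallback described above can be illustrated with a minimal sketch. This is not the code changed in this commit: `save_artifact`, `compile_and_maybe_save`, and `load_or_recompile` are hypothetical names, and plain `pickle` stands in for whatever serialization the AOT compile path actually uses. The sketch only shows the control flow: a failed save produces a warning instead of an error, and the next run recompiles because no cache file was written.

```python
import logging
import pickle
from pathlib import Path
from typing import Any, Callable, Tuple

logger = logging.getLogger("aot_fallback_sketch")


def save_artifact(artifact: Any, path: Path) -> bool:
    """Try to serialize the artifact; warn and return False on failure."""
    try:
        # Serialize fully in memory first so a failure leaves no partial file.
        payload = pickle.dumps(artifact)
    except Exception as exc:  # serialization bugs can raise several exception types
        logger.warning("Cannot save compiled artifact to %s: %s", path, exc)
        return False
    path.write_bytes(payload)
    return True


def compile_and_maybe_save(compile_fn: Callable[[], Any], path: Path) -> Any:
    """Compile, attempt to cache, and return the result either way."""
    artifact = compile_fn()
    save_artifact(artifact, path)  # a failed save must not discard the artifact
    return artifact


def load_or_recompile(compile_fn: Callable[[], Any], path: Path) -> Tuple[Any, str]:
    """Load from the cache if possible; otherwise fall back to compiling again."""
    if path.exists():
        try:
            return pickle.loads(path.read_bytes()), "loaded"
        except Exception as exc:
            logger.warning("Cannot load cached artifact from %s: %s", path, exc)
    return compile_and_maybe_save(compile_fn, path), "recompiled"
```

With a `compile_fn` whose result cannot be pickled (for example, one returning a nested lambda), every call to `load_or_recompile` reports `"recompiled"` and still returns a usable artifact, which mirrors the "same behavior as with AOT compile turned off" outcome in the summary.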
Test Plan:
pytest tests/lora/test_quant_model.py
Reviewers:
Subscribers:
Tasks:
Tags:
Signed-off-by: zhxchen17 <zhxchen17@fb.com>

1 parent 3b96f85, commit 8dfa16f
1 file changed: +11 -2 lines changed