@zhxchen17 zhxchen17 commented Oct 22, 2025

Summary:
Fixes issue #27348

For dynamo caching, it's possible that compilation succeeds but the serialization step fails. In this case, a serialization failure shouldn't block the user from getting the compilation results.

Therefore we add handling for the serialization error and only emit a warning when model saving fails. When saving fails, the vLLM model runner should be able to fall back to the old path, and in the next process it will fail to load the dynamo cache but still fall back to retracing with dynamo plus loading the inductor cache, which is the same behavior as with AOT compile turned off.

This is mostly a short-term fix; in the long term we should resolve the serialization bugs by eliminating pickling of graph modules.

Once #25205 is merged, we should be able to resolve the issue at a lower level.
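The best-effort caching pattern described above can be sketched as follows. This is a minimal illustration, not the actual vLLM code: `compile_model` and `serialize_to_disk` are hypothetical stand-ins for the real compilation and serialization steps, and the stand-in serializer is hard-wired to fail so the fallback path is exercised.

```python
import logging

logger = logging.getLogger("aot_cache")


def compile_model(model):
    # Stand-in for the actual dynamo/inductor compilation step.
    return f"compiled({model})"


def serialize_to_disk(artifact, path):
    # Stand-in for the serialization step; hard-wired to fail here to
    # mimic e.g. an unpicklable graph module.
    raise TypeError("cannot pickle graph module")


def compile_and_maybe_cache(model, save_path):
    compiled = compile_model(model)  # compilation itself succeeded
    try:
        serialize_to_disk(compiled, save_path)
    except Exception:
        # Best-effort caching: warn (with the full traceback) and let
        # the caller keep the in-memory compilation result. A later
        # process that fails to load this cache simply retraces.
        logger.warning(
            "Cannot save aot compilation to path %s", save_path,
            exc_info=True,
        )
    return compiled
```

The key property is that the return value is the same whether or not serialization succeeded; only the on-disk cache is affected.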

Test Plan:

pytest tests/lora/test_quant_model.py

Reviewers:

Subscribers:

Tasks:

Tags:

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces a fallback mechanism for AOT compilation when serialization fails, which is a good step towards improving robustness. The implementation correctly uses a try-except block to catch serialization errors and logs a warning. My review includes one suggestion to enhance the logging by including the full exception traceback. This will provide better diagnostics for debugging the underlying serialization issues, aiding in the development of a long-term solution.

Comment on lines +411 to +415
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
str(e),
)
Severity: high

For better debuggability, it's recommended to log the full traceback of the exception. This can be achieved by passing exc_info=True to the logger. It will help in diagnosing the root cause of serialization failures for the long-term fix. Also, you can pass the exception object e directly to the logger instead of str(e).

Suggested change
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
str(e),
)
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
e,
exc_info=True,
)

Collaborator
@zhxchen17 A try/except on a general Exception is generally not good; we're going to want to remove this in the medium term.

@zou3519 zou3519 added ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. ready ONLY add when PR is ready to merge/full CI is needed and removed ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. labels Oct 27, 2025
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/serialization_fallback branch from b4c8be5 to 8dfa16f Compare October 27, 2025 15:26
@zou3519 zou3519 merged commit e3d8186 into vllm-project:main Oct 28, 2025
48 checks passed
bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Oct 29, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
