@zhxchen17 zhxchen17 commented Oct 22, 2025

Summary:
Fixes issue #27348

For dynamo caching, it's possible that compilation succeeds but the serialization step fails. In this case, a serialization failure shouldn't block the user from getting the compilation results.

Therefore we add handling for the serialization error and only emit a warning when model saving fails. When saving fails, the vLLM model runner should be able to fall back to the old path, and in the next process it will fail to load the dynamo cache but still fall back to retracing with dynamo plus loading the inductor cache, which is the same behavior as with AOT compile turned off.

This is mostly a short-term fix; in the long term we should resolve the serialization bugs by eliminating pickling of graph modules.

Once #25205 is merged, we should be able to resolve the issue at a lower level.
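The best-effort caching pattern described above can be sketched as follows. This is a minimal illustration, not the actual vLLM code: `compile_model` and `serialize_to_disk` are hypothetical stand-ins for the real compilation and serialization steps, and the stand-in serializer is hard-wired to fail so the fallback path is exercised.

```python
import logging

logger = logging.getLogger("aot_cache")


def compile_model(model):
    # Stand-in for the actual dynamo/inductor compilation step.
    return f"compiled({model})"


def serialize_to_disk(artifact, path):
    # Stand-in for the serialization step; hard-wired to fail here to
    # mimic e.g. an unpicklable graph module.
    raise TypeError("cannot pickle graph module")


def compile_and_maybe_cache(model, save_path):
    compiled = compile_model(model)  # compilation itself succeeded
    try:
        serialize_to_disk(compiled, save_path)
    except Exception:
        # Best-effort caching: warn (with the full traceback) and let
        # the caller keep the in-memory compilation result. A later
        # process that fails to load this cache simply retraces.
        logger.warning(
            "Cannot save aot compilation to path %s", save_path,
            exc_info=True,
        )
    return compiled
```

The key property is that the return value is the same whether or not serialization succeeded; only the on-disk cache is affected.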

Test Plan:

pytest tests/lora/test_quant_model.py

Reviewers:

Subscribers:

Tasks:

Tags:

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces a fallback mechanism for AOT compilation when serialization fails, which is a good step towards improving robustness. The implementation correctly uses a try-except block to catch serialization errors and logs a warning. My review includes one suggestion to enhance the logging by including the full exception traceback. This will provide better diagnostics for debugging the underlying serialization issues, aiding in the development of a long-term solution.

Comment on lines +411 to +415
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
str(e),
)
Severity: high

For better debuggability, it's recommended to log the full traceback of the exception. This can be achieved by passing exc_info=True to the logger. It will help in diagnosing the root cause of serialization failures for the long-term fix. Also, you can pass the exception object e directly to the logger instead of str(e).

Suggested change
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
str(e),
)
logger.warning(
"Cannot save aot compilation to path %s, error: %s",
aot_compilation_path,
e,
exc_info=True,
)

Collaborator
@zhxchen17 A try/except on a general Exception is generally not good; we're going to want to remove this in the medium term.

@zou3519 zou3519 added ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. ready ONLY add when PR is ready to merge/full CI is needed and removed ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. labels Oct 27, 2025
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
@zhxchen17 zhxchen17 force-pushed the zhxchen17/precompile/serialization_fallback branch from b4c8be5 to 8dfa16f Compare October 27, 2025 15:26
@zou3519 zou3519 merged commit e3d8186 into vllm-project:main Oct 28, 2025
48 checks passed
bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Oct 29, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…llm-project#27350)

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
