-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
[compile] Add fallback path to AOT compile when serialization fails. #27350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[compile] Add fallback path to AOT compile when serialization fails. #27350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a fallback mechanism for AOT compilation when serialization fails, which is a good step towards improving robustness. The implementation correctly uses a try-except block to catch serialization errors and logs a warning. My review includes one suggestion to enhance the logging by including the full exception traceback. This will provide better diagnostics for debugging the underlying serialization issues, aiding in the development of a long-term solution.
| logger.warning( | ||
| "Cannot save aot compilation to path %s, error: %s", | ||
| aot_compilation_path, | ||
| str(e), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better debuggability, it's recommended to log the full traceback of the exception. This can be achieved by passing exc_info=True to the logger. It will help in diagnosing the root cause of serialization failures for the long-term fix. Also, you can pass the exception object e directly to the logger instead of str(e).
| logger.warning( | |
| "Cannot save aot compilation to path %s, error: %s", | |
| aot_compilation_path, | |
| str(e), | |
| ) | |
| logger.warning( | |
| "Cannot save aot compilation to path %s, error: %s", | |
| aot_compilation_path, | |
| e, | |
| exc_info=True, | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhxchen17 try-catch on general exception generally not good, we're gonna want to remove this in the medium term.
Summary: Fixing issue vllm-project#27348 For dynamo caching, it's possible that the compilation succeeds but the serialization step fails. In this case, the failure of serialization step shouldn't block user from getting compilation results correctly. Therefore we add a handling of the serialization error and only give warning when model saving fails. When saving fails, VLLM model runner should be able to just fallback to the old path, and in the next process, it will fail to load dynamo cache but still fallback to retracing with dynamo + loading inductor cache, which is the same behavior to AOT compile turned of off. This is mostly a short term fix and in the long term we should resolve the serialization bugs by eliminating pickling of graph modules. i.e. Once vllm-project#25205 is merged, we should be able to resolve the issue at a lower level. Test Plan: pytest tests/lora/test_quant_model.py Reviewers: Subscribers: Tasks: Tags: Signed-off-by: zhxchen17 <zhxchen17@fb.com>
b4c8be5 to
8dfa16f
Compare
…llm-project#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
…llm-project#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…llm-project#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…llm-project#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…llm-project#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Summary:
Fixing issue #27348
For dynamo caching, it's possible that the compilation succeeds but the serialization step fails. In this case, the failure of serialization step shouldn't block user from getting compilation results correctly.
Therefore we add a handling of the serialization error and only give warning when model saving fails. When saving fails, VLLM model runner should be able to just fallback to the old path, and in the next process, it will fail to load dynamo cache but still fallback to retracing with dynamo + loading inductor cache, which is the same behavior to AOT compile turned of off.
This is mostly a short term fix and in the long term we should resolve the serialization bugs by eliminating pickling of graph modules.
i.e. Once #25205 is merged, we should be able to resolve the issue at a lower level.
Test Plan:
pytest tests/lora/test_quant_model.py
Reviewers:
Subscribers:
Tasks:
Tags:
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.