[compile] Add fallback path to AOT compile when serialization fails.

zhxchen17 · zhxchen17 · commit b4c8be5b259f · 2025-10-22T07:49:20.000-07:00
Summary:

Fixes issue #27348.

With Dynamo caching, compilation can succeed while the subsequent serialization step fails. A serialization failure should not prevent users from getting correct compilation results, so this change catches errors raised while creating the cache directory or saving the AOT-compiled function, and only logs a warning when saving fails.

When saving fails, the vLLM model runner should be able to simply fall back to the old path. In the next process, loading the Dynamo cache will fail, but vLLM still falls back to retracing with Dynamo and loading the Inductor cache, which is the same behavior as running with AOT compile turned off.

This is mostly a short-term fix. In the long term, the serialization bugs should be resolved by eliminating the pickling of graph modules; once #25205 is merged, we should be able to resolve the issue at a lower level.

Test Plan:

pytest tests/lora/test_quant_model.py

Signed-off-by: zhxchen17 <zhxchen17@fb.com>
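The load-side fallback described above (a missing or unloadable cache is treated as a cache miss, not an error) can be sketched as follows. This is a minimal, hypothetical illustration that assumes the cache is a plain pickle file; `load_or_compile` and `compile_fn` are not vLLM or PyTorch APIs, and the real fallback retraces with Dynamo and reloads the Inductor cache rather than calling a single callback.

```python
import logging
import os
import pickle
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_or_compile(cache_path: str, compile_fn: Callable[[], Any]) -> Any:
    """Hypothetical helper: load a cached artifact, else recompile.

    A missing or corrupt cache entry is not fatal; it only means we
    take the slower path of compiling from scratch.
    """
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "rb") as f:
                return pickle.load(f)
        except Exception as e:  # corrupt, truncated, or incompatible entry
            logger.warning(
                "Cannot load cache from path %s, error: %s", cache_path, str(e)
            )
    # Cache missing or unloadable: fall back to compiling from scratch.
    return compile_fn()
```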
diff --git a/vllm/compilation/decorators.py b/vllm/compilation/decorators.py
@@ -402,8 +402,17 @@ def patched_inline_call(self_):
                     output = self.aot_compiled_fn(self, *args, **kwargs)
                     assert aot_compilation_path is not None
                     assert cache_dir is not None
-                    os.makedirs(cache_dir, exist_ok=True)
-                    self.aot_compiled_fn.save_compiled_function(aot_compilation_path)
+                    try:
+                        os.makedirs(cache_dir, exist_ok=True)
+                        self.aot_compiled_fn.save_compiled_function(
+                            aot_compilation_path
+                        )
+                    except Exception as e:
+                        logger.warning(
+                            "Cannot save aot compilation to path %s, error: %s",
+                            aot_compilation_path,
+                            str(e),
+                        )
                 else:
                     output = self.compiled_callable(*args, **kwargs)
             return output
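The save-side change in the diff follows a best-effort pattern: compute the output first, then try to persist the compiled artifact, and downgrade any failure to a warning so the caller still gets its result. The sketch below illustrates that pattern under stated assumptions: `save_best_effort` and `run_and_cache` are hypothetical names, and `pickle.dumps` stands in for `save_compiled_function`, whose internals are not shown here.

```python
import logging
import os
import pickle
from typing import Any, Callable, Sequence

logger = logging.getLogger(__name__)


def save_best_effort(obj: Any, cache_dir: str, filename: str) -> bool:
    """Hypothetical helper: persist obj, never letting a save failure propagate."""
    path = os.path.join(cache_dir, filename)
    try:
        # Serialize before touching the filesystem, so an unpicklable
        # object never leaves a half-written cache file behind.
        payload = pickle.dumps(obj)
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(payload)
        return True
    except Exception as e:  # broad on purpose, mirroring the diff
        logger.warning(
            "Cannot save aot compilation to path %s, error: %s", path, str(e)
        )
        return False


def run_and_cache(
    fn: Callable[..., Any],
    args: Sequence[Any],
    artifact: Any,
    cache_dir: str,
    filename: str,
) -> Any:
    """Hypothetical wrapper: the output is returned whether or not saving works."""
    output = fn(*args)
    save_best_effort(artifact, cache_dir, filename)
    return output
```

Unlike the diff, which calls `os.makedirs` before saving, this sketch serializes first; the design choice only matters if a failed save could otherwise leave a partial file that a later load would have to reject.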