fix: Enable detection of unresponsive or crashed Python backend stub process #423
Base branch: `main`
Pull request overview
This PR enhances the detection mechanism for unresponsive or crashed Python backend stub processes by implementing proper process health checks on Unix-like systems.
Key Changes:
- Replaces a simple PID check with `waitpid()` to actively detect terminated stub processes on non-Windows platforms
- Adds a new `TRITONBACKEND_ModelInstanceReady` function to verify stub process health before execution
- Ensures proper cleanup by guarding the kill operation with a PID check
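The `waitpid()`-based liveness check and the guarded kill described above can be sketched roughly as follows, assuming a POSIX host. This is not the PR's code: the class `StubProcessSketch` and its methods are hypothetical. Only the member name `stub_pid_`, which appears in the reviewed `waitpid` snippet, and the use of `waitpid()` with `WNOHANG` come from this page.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the launcher's bookkeeping (not the PR's class).
class StubProcessSketch {
 public:
  explicit StubProcessSketch(pid_t pid) : stub_pid_(pid) {}

  // Non-blocking liveness check. waitpid() with WNOHANG returns 0 while the
  // child is still running, the child's pid once it has terminated (reaping
  // it), or -1 on error (e.g. ECHILD if it was already reaped).
  bool IsAlive() {
    if (stub_pid_ == 0) {
      return false;  // nothing to check: already known to be gone
    }
    int status = 0;
    pid_t return_pid = waitpid(stub_pid_, &status, WNOHANG);
    if (return_pid == 0) {
      return true;  // still running
    }
    // Terminated or unknown: forget the pid so it is never signalled again.
    stub_pid_ = 0;
    return false;
  }

  // Guarded termination: only signal and reap a pid we still own.
  void Kill() {
    if (stub_pid_ != 0) {
      kill(stub_pid_, SIGKILL);
      int status = 0;
      waitpid(stub_pid_, &status, 0);  // block until the child is reaped
      stub_pid_ = 0;
    }
  }

  pid_t pid() const { return stub_pid_; }

 private:
  pid_t stub_pid_;
};
```

Resetting `stub_pid_` to 0 once the child has been reaped is what makes the guard in `Kill()` meaningful: after reaping, the OS may hand the same pid to an unrelated process.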
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/stub_launcher.cc` | Implements `waitpid()` with `WNOHANG` to detect process termination and adds a guard for kill operations |
| `src/python_be.cc` | Adds a health check function that validates that the stub process is both active and responsive |
```cpp
if (!instance_state->Stub()->StubActive()) {
  return TRITONSERVER_ErrorNew(
      TRITONSERVER_ERROR_INTERNAL, (std::string("Stub process '") +
                                    instance_state->Name() + "' is not alive")
                                       .c_str());
}

if (!instance_state->IsStubProcessAlive()) {
  return TRITONSERVER_ErrorNew(
      TRITONSERVER_ERROR_INTERNAL,
      (std::string("Stub process '") + instance_state->Name() +
       "' is not healthy (unresponsive).")
          .c_str());
}
```
Copilot (AI), Dec 2, 2025:
The two checks appear redundant: both `StubActive()` and `IsStubProcessAlive()` seem to verify process health. If they check different conditions, consider consolidating them into a single health check method or adding comments to clarify their distinct purposes.
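One possible shape for that consolidation, offered only as a hedged sketch: a single function that reports *which* condition failed, so the caller can still emit the two distinct error messages. `StubHealth`, `CheckStubHealth`, and `Describe` are hypothetical names, and the responsiveness probe is a caller-supplied callable because this thread does not show how the backend actually checks that the stub responds.

```cpp
#include <functional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical result type covering both failure modes in one check.
enum class StubHealth { kHealthy, kExited, kUnresponsive };

// First ask the OS whether the process still exists; only then ask the
// process itself (via the assumed probe) whether it is responsive.
StubHealth CheckStubHealth(pid_t pid, const std::function<bool()>& probe) {
  if (pid <= 0) {
    return StubHealth::kExited;
  }
  int status = 0;
  if (waitpid(pid, &status, WNOHANG) != 0) {
    // Terminated (and now reaped) or waitpid() failed: either way the
    // process cannot serve requests.
    return StubHealth::kExited;
  }
  return probe() ? StubHealth::kHealthy : StubHealth::kUnresponsive;
}

// Maps each state to the wording used by the two original error messages.
std::string Describe(StubHealth health) {
  switch (health) {
    case StubHealth::kHealthy:
      return "healthy";
    case StubHealth::kExited:
      return "not alive";
    case StubHealth::kUnresponsive:
      return "not healthy (unresponsive)";
  }
  return "unknown";
}
```

Ordering the OS check first avoids sending a probe to a process that is already gone, and keeping the two outcomes distinct preserves the diagnostic value of the separate messages.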
```cpp
if (return_pid == -1) {
  // If waitpid fails, it likely means the process no longer exists (ECHILD)
  stub_pid_ = 0;
  return false;
}
```
Copilot (AI), Dec 2, 2025:
When waitpid fails, consider logging the specific errno value (e.g., ECHILD, EINVAL) to aid debugging. The current comment mentions ECHILD but doesn't verify it, which could mask other failure modes.
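A minimal sketch of that suggestion, assuming POSIX `waitpid()` semantics: capture `errno` immediately after the failing call, log it with `strerror()`, and treat only `ECHILD` as "already gone". `ProbeChild` and `ProbeResult` are hypothetical names, and `std::cerr` stands in for whatever logging facility the backend actually uses.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical outcome of a single non-blocking waitpid() probe.
enum class ProbeResult { kRunning, kExited, kAlreadyReaped, kError };

// Like the reviewed snippet, but records errno instead of assuming ECHILD.
ProbeResult ProbeChild(pid_t pid, int* saved_errno) {
  *saved_errno = 0;
  int status = 0;
  pid_t return_pid = waitpid(pid, &status, WNOHANG);
  if (return_pid == 0) {
    return ProbeResult::kRunning;
  }
  if (return_pid == pid) {
    return ProbeResult::kExited;  // child terminated and has now been reaped
  }
  // return_pid == -1: save errno before any other call (including the
  // logging below) can overwrite it.
  *saved_errno = errno;
  std::cerr << "waitpid(" << pid << ") failed: errno=" << *saved_errno
            << " (" << std::strerror(*saved_errno) << ")\n";
  return *saved_errno == ECHILD ? ProbeResult::kAlreadyReaped
                                : ProbeResult::kError;
}
```

Whether an unexpected errno (for example `EINVAL`) should still mark the stub as dead is a policy decision; the sketch only keeps the cases distinguishable so that decision can be made explicitly.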
No description provided.