Conversation

@pskiran1
Member

@pskiran1 pskiran1 commented Dec 1, 2025

No description provided.

@pskiran1 pskiran1 requested a review from Copilot December 2, 2025 16:24
Copilot AI left a comment

Pull request overview

This PR enhances the detection mechanism for unresponsive or crashed Python backend stub processes by implementing proper process health checks on Unix-like systems.

Key Changes:

  • Replaces a simple PID check with waitpid() to actively detect terminated stub processes on non-Windows platforms
  • Adds a new TRITONBACKEND_ModelInstanceReady function to verify stub process health before execution
  • Ensures proper cleanup by guarding the kill operation with a PID check
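The first and third changes can be sketched outside the backend as follows. This is a minimal illustration, not the PR's code: `ChildIsRunning` and `KillIfRunning` are hypothetical helper names, and resetting the PID to 0 on any non-running result is an assumption modeled on the `stub_pid_ = 0` line in the diff.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical helper: returns true while child `pid` is still running.
// When the child has exited or waitpid() fails, `pid` is reset to 0 so that
// later calls (and the kill guard below) treat the process as gone.
bool ChildIsRunning(pid_t& pid) {
  if (pid <= 0) {
    return false;  // nothing to check
  }
  int status = 0;
  pid_t result;
  do {
    // WNOHANG makes waitpid() return immediately instead of blocking.
    result = waitpid(pid, &status, WNOHANG);
  } while (result == -1 && errno == EINTR);

  if (result == 0) {
    return true;  // child exists and has not terminated
  }
  // result == pid: the child terminated and has now been reaped.
  // result == -1:  waitpid() failed, e.g. ECHILD (not our child, or already reaped).
  pid = 0;
  return false;
}

// Hypothetical helper: only signal a PID that is still considered valid,
// so a stale or zero PID is never passed to kill().
void KillIfRunning(pid_t& pid) {
  if (pid > 0) {
    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);  // reap the child to avoid leaving a zombie
    pid = 0;
  }
}
```

The guard matters because `kill(0, sig)` signals every process in the caller's process group, so a PID that has been reset to 0 must never reach `kill()`.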

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/stub_launcher.cc | Implements waitpid() with WNOHANG to detect process termination and adds guard for kill operations |
| src/python_be.cc | Adds health check function that validates stub process is both active and responsive |

Comment on lines +2427 to +2440
if (!instance_state->Stub()->StubActive()) {
  return TRITONSERVER_ErrorNew(
      TRITONSERVER_ERROR_INTERNAL, (std::string("Stub process '") +
                                    instance_state->Name() + "' is not alive")
                                       .c_str());
}

if (!instance_state->IsStubProcessAlive()) {
  return TRITONSERVER_ErrorNew(
      TRITONSERVER_ERROR_INTERNAL,
      (std::string("Stub process '") + instance_state->Name() +
       "' is not healthy (unresponsive).")
          .c_str());
}
Copilot AI Dec 2, 2025

The two checks appear redundant: both StubActive() and IsStubProcessAlive() seem to verify process health. If they check different conditions, consider consolidating them into a single health check method or adding comments that clarify their distinct purposes.
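One way to read the consolidation suggestion is a single helper that runs both checks in order and documents what each one covers. The sketch below is hypothetical and does not use the Triton API: `StubProbe` and `CheckStubHealth` are invented names, and the split into "process exists" versus "process responds" is an assumption inferred from the two error messages in the diff.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the two checks in the diff.
struct StubProbe {
  std::function<bool()> process_exists;  // assumed role of StubActive()
  std::function<bool()> responds;        // assumed role of IsStubProcessAlive()
};

// Returns an empty string when the stub is healthy, otherwise an error
// message that says which of the two conditions failed.
std::string CheckStubHealth(const std::string& name, const StubProbe& probe) {
  // Check 1: the OS process must still exist (it has not crashed or exited).
  if (!probe.process_exists()) {
    return "Stub process '" + name + "' is not alive";
  }
  // Check 2: the process exists, but it must also respond (it is not hung).
  if (!probe.responds()) {
    return "Stub process '" + name + "' is not healthy (unresponsive).";
  }
  return "";
}
```

Keeping the two conditions separate inside one function preserves the distinct error messages while giving callers a single entry point.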

Comment on lines +752 to +755
if (return_pid == -1) {
  // If waitpid fails, it likely means the process no longer exists (ECHILD)
  stub_pid_ = 0;
  return false;
Copilot AI Dec 2, 2025

When waitpid fails, consider logging the specific errno value (e.g., ECHILD, EINVAL) to aid debugging. The current comment mentions ECHILD but doesn't verify it, which could mask other failure modes.
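A rough sketch of what distinguishing the failure modes could look like is shown below. `ChildState` and `ProbeChild` are hypothetical names rather than code from the PR, and the detail string stands in for whatever logging macro the backend would actually use.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical classification of a non-blocking waitpid() probe.
enum class ChildState { kRunning, kExited, kNotOurChild, kError };

// Probes `pid` without blocking and writes a human-readable reason into
// `detail` whenever the child is not running.
ChildState ProbeChild(pid_t pid, std::string* detail) {
  int status = 0;
  const pid_t result = waitpid(pid, &status, WNOHANG);
  if (result == 0) {
    return ChildState::kRunning;
  }
  if (result == pid) {
    if (WIFEXITED(status)) {
      *detail = "exited with code " + std::to_string(WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
      *detail = "terminated by signal " + std::to_string(WTERMSIG(status));
    } else {
      *detail = "changed state";
    }
    return ChildState::kExited;
  }
  // result == -1: save errno before any other call can overwrite it.
  const int err = errno;
  *detail = std::string("waitpid failed: ") + std::strerror(err) +
            " (errno " + std::to_string(err) + ")";
  // ECHILD means the PID is not a waitable child of this process (for
  // example, it was already reaped). Anything else is an unexpected error.
  return err == ECHILD ? ChildState::kNotOurChild : ChildState::kError;
}
```

With a probe like this, the caller can still reset the PID in every non-running case, but the log line shows whether the cause was ECHILD or something unexpected such as EINVAL.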
