Skip to content

Conversation

@GuanLuo
Copy link
Contributor

@GuanLuo GuanLuo commented Dec 2, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added gRPC health check endpoints for system monitoring: ServerLive, ServerReady, and ModelReady
    • Enables verification of server liveness and overall system readiness state
    • Supports per-model readiness checks to verify specific model availability
    • Includes comprehensive test coverage and health check utilities for validation

✏️ Tip: You can customize this high-level summary in your review settings.

… / ModelReady

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: Guan Luo <gluo@nvidia.com>
@GuanLuo GuanLuo requested review from a team as code owners December 2, 2025 23:01
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

This PR introduces health check functionality to the gRPC inference service by adding three new RPC endpoints (ServerLive, ServerReady, ModelReady) at the protocol buffer level, implementing corresponding handlers in the Rust service, and providing test coverage with client utilities.

Changes

Cohort / File(s) Summary
Proto Definitions
lib/llm/src/grpc/protos/kserve.proto
Added three RPC methods to GRPCInferenceService: ServerLive, ServerReady, and ModelReady. Introduced six new message types (ServerLiveRequest, ServerLiveResponse, ServerReadyRequest, ServerReadyResponse, ModelReadyRequest, ModelReadyResponse) with documentation.
Service Implementation
lib/llm/src/grpc/service/kserve.rs
Implemented three new async handler methods for the health check endpoints: server_live() returns live status, server_ready() checks if model manager has registered models, and model_ready() validates if a requested model is available.
Service Tests
lib/llm/tests/kserve_service.rs
Added TestPort enum variant and comprehensive test cases (test_live_ready) validating server liveness, readiness states with and without registered models.
Client Utilities
tests/frontend/grpc/triton_echo_client.py
Added check_health() function that creates a gRPC client, asserts server liveness, server readiness, and model readiness for the "echo" model.
Integration Tests
tests/frontend/grpc/test_tensor_mocker_engine.py
Modified test_echo to invoke health check before inference operations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Consistent patterns across three similar handler implementations with straightforward logic
  • Proto changes are declarative and follow existing conventions
  • Test additions use established patterns with clear intent
  • No complex state management or intricate control flow

Poem

🐰 Three checks now guard the server's gate,
Is it alive? Is it ready? Don't be late!
Models peek through with names so bright,
Health checks gleam through protocol light! ✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is completely empty except for template placeholders; all required sections lack substantive content. Fill in the Overview, Details, Where should the reviewer start, and Related Issues sections with specific information about the implementation changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main changes: adding Kserve readiness endpoint support with three new endpoints (ServerLive, ServerReady, ModelReady).

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/llm/src/grpc/service/kserve.rs (1)

697-711: Consider validating the version parameter.

The model_ready implementation checks only the model name from the ModelReadyRequest but ignores the version field (Line 701). If version-specific readiness checks are not needed, this is fine. However, if multiple versions of a model could be registered, consider whether the version should be validated or documented as unused.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca264be and a8f3b41.

📒 Files selected for processing (5)
  • lib/llm/src/grpc/protos/kserve.proto (2 hunks)
  • lib/llm/src/grpc/service/kserve.rs (1 hunks)
  • lib/llm/tests/kserve_service.rs (3 hunks)
  • tests/frontend/grpc/test_tensor_mocker_engine.py (1 hunks)
  • tests/frontend/grpc/triton_echo_client.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/grpc/service/kserve.rs (2)
lib/llm/src/discovery/model_manager.rs (1)
  • new (85-97)
lib/llm/src/local_model.rs (1)
  • card (351-353)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)
tests/frontend/grpc/triton_echo_client.py (1)
  • check_health (10-19)
🪛 Ruff (0.14.7)
tests/frontend/grpc/triton_echo_client.py

14-14: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (.)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (.)
🔇 Additional comments (7)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)

128-128: LGTM! Good addition of health check before inference.

The health check invocation before running inference ensures the service is ready, which aligns with the PR's goal of adding readiness endpoint support.

lib/llm/tests/kserve_service.rs (2)

8-22: LGTM! Well-organized port management for parallel tests.

The TestPort enum with the LiveReady variant at port 8998 maintains the existing pattern and prevents port conflicts during parallel test execution.


1977-2057: LGTM! Comprehensive test coverage for health check endpoints.

The test thoroughly validates:

  • Server liveness (always true)
  • Server readiness transitions (false → true when model is registered)
  • Model-specific readiness (false for unregistered, true for registered)

The test structure follows existing patterns and includes clear assertion messages.

lib/llm/src/grpc/service/kserve.rs (2)

679-685: LGTM! Appropriate liveness check implementation.

Returning true unconditionally is correct: if the server can respond to the RPC, it is by definition live. This aligns with the KServe specification where liveness indicates the process is running.


687-695: Server readiness relies on model registration only, not operational state.

The implementation returns ready: true when model cards exist, but does not verify whether the corresponding model engines are actually loaded or operational. This is by design—readiness indicates model registration rather than model availability for inference. Consider whether this semantics aligns with deployment expectations: if a model card is registered but the engine fails to load or crashes, the server will still report ready.

Alternatively, server_ready() could cross-reference get_model_cards() against loaded engines to ensure models are genuinely available for requests.

lib/llm/src/grpc/protos/kserve.proto (2)

19-38: LGTM! Well-documented RPC definitions.

The three new health check endpoints (ServerLive, ServerReady, ModelReady) are properly defined with clear documentation and follow the existing protobuf conventions in this file.


69-148: LGTM! Well-structured message definitions with comprehensive documentation.

The request and response message definitions follow protobuf best practices:

  • Empty request messages for ServerLive/ServerReady (common pattern)
  • ModelReadyRequest includes both name and optional version fields
  • Response messages use clear boolean fields with descriptive documentation
  • All messages include proper documentation comments

Signed-off-by: Guan Luo <gluo@nvidia.com>
@GuanLuo
Copy link
Contributor Author

GuanLuo commented Dec 3, 2025

/ok to test f4d3354

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
@GuanLuo GuanLuo enabled auto-merge (squash) December 3, 2025 09:37
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
@GuanLuo
Copy link
Contributor Author

GuanLuo commented Dec 3, 2025

/ok to test d2e5980

@GuanLuo GuanLuo disabled auto-merge December 3, 2025 18:30
@GuanLuo GuanLuo merged commit d90dc39 into main Dec 3, 2025
36 of 39 checks passed
@GuanLuo GuanLuo deleted the gluo/dyn-1476-feat-add-kserve-serverready-live-endpoint branch December 3, 2025 19:17
GuanLuo added a commit that referenced this pull request Dec 3, 2025
… / ModelReady (#4708)

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: Guan Luo <gluo@nvidia.com>
nv-tusharma pushed a commit that referenced this pull request Dec 3, 2025
…re commits (#4730)

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants