feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708

GuanLuo · 2025-12-02T23:01:49Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added gRPC health check endpoints for system monitoring: ServerLive, ServerReady, and ModelReady
- Enables verification of server liveness and overall system readiness state
- Supports per-model readiness checks to verify specific model availability
- Includes comprehensive test coverage and health check utilities for validation

_{✏️ Tip: You can customize this high-level summary in your review settings.}

… / ModelReady Signed-off-by: Guan Luo <gluo@nvidia.com>

Signed-off-by: Guan Luo <gluo@nvidia.com>

copy-pr-bot · 2025-12-02T23:01:53Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-12-02T23:05:31Z

Walkthrough

This PR introduces health check functionality to the gRPC inference service by adding three new RPC endpoints (ServerLive, ServerReady, ModelReady) at the protocol buffer level, implementing corresponding handlers in the Rust service, and providing test coverage with client utilities.

Changes

Cohort / File(s)	Summary
Proto Definitions `lib/llm/src/grpc/protos/kserve.proto`	Added three RPC methods to GRPCInferenceService: ServerLive, ServerReady, and ModelReady. Introduced six new message types (ServerLiveRequest, ServerLiveResponse, ServerReadyRequest, ServerReadyResponse, ModelReadyRequest, ModelReadyResponse) with documentation.
Service Implementation `lib/llm/src/grpc/service/kserve.rs`	Implemented three new async handler methods for the health check endpoints: `server_live()` returns live status, `server_ready()` checks if model manager has registered models, and `model_ready()` validates if a requested model is available.
Service Tests `lib/llm/tests/kserve_service.rs`	Added TestPort enum variant and comprehensive test cases (test_live_ready) validating server liveness, readiness states with and without registered models.
Client Utilities `tests/frontend/grpc/triton_echo_client.py`	Added `check_health()` function that creates a gRPC client, asserts server liveness, server readiness, and model readiness for the "echo" model.
Integration Tests `tests/frontend/grpc/test_tensor_mocker_engine.py`	Modified test_echo to invoke health check before inference operations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Consistent patterns across three similar handler implementations with straightforward logic
Proto changes are declarative and follow existing conventions
Test additions use established patterns with clear intent
No complex state management or intricate control flow

Poem

🐰 Three checks now guard the server's gate,
Is it alive? Is it ready? Don't be late!
Models peek through with names so bright,
Health checks gleam through protocol light! ✨

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is completely empty except for template placeholders; all required sections lack substantive content.	Fill in the Overview, Details, Where should the reviewer start, and Related Issues sections with specific information about the implementation changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main changes: adding Kserve readiness endpoint support with three new endpoints (ServerLive, ServerReady, ModelReady).

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/llm/src/grpc/service/kserve.rs (1)

697-711: Consider validating the version parameter.

The model_ready implementation checks only the model name from the ModelReadyRequest but ignores the version field (Line 701). If version-specific readiness checks are not needed, this is fine. However, if multiple versions of a model could be registered, consider whether the version should be validated or documented as unused.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca264be and a8f3b41.

📒 Files selected for processing (5)

lib/llm/src/grpc/protos/kserve.proto (2 hunks)
lib/llm/src/grpc/service/kserve.rs (1 hunks)
lib/llm/tests/kserve_service.rs (3 hunks)
tests/frontend/grpc/test_tensor_mocker_engine.py (1 hunks)
tests/frontend/grpc/triton_echo_client.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

lib/llm/src/grpc/service/kserve.rs (2)

lib/llm/src/discovery/model_manager.rs (1)

new (85-97)

lib/llm/src/local_model.rs (1)

card (351-353)

tests/frontend/grpc/test_tensor_mocker_engine.py (1)

tests/frontend/grpc/triton_echo_client.py (1)

check_health (10-19)

🪛 Ruff (0.14.7)

tests/frontend/grpc/triton_echo_client.py

14-14: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: clippy (launch/dynamo-run)
GitHub Check: clippy (lib/bindings/python)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: clippy (.)
GitHub Check: Build and Test - dynamo
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (lib/bindings/python)
GitHub Check: tests (.)

🔇 Additional comments (7)

tests/frontend/grpc/test_tensor_mocker_engine.py (1)

128-128: LGTM! Good addition of health check before inference.

The health check invocation before running inference ensures the service is ready, which aligns with the PR's goal of adding readiness endpoint support.

lib/llm/tests/kserve_service.rs (2)

8-22: LGTM! Well-organized port management for parallel tests.

The TestPort enum with the LiveReady variant at port 8998 maintains the existing pattern and prevents port conflicts during parallel test execution.

1977-2057: LGTM! Comprehensive test coverage for health check endpoints.

The test thoroughly validates:

Server liveness (always true)

Server readiness transitions (false → true when model is registered)

Model-specific readiness (false for unregistered, true for registered)

The test structure follows existing patterns and includes clear assertion messages.

lib/llm/src/grpc/service/kserve.rs (2)

679-685: LGTM! Appropriate liveness check implementation.

Returning true unconditionally is correct: if the server can respond to the RPC, it is by definition live. This aligns with the KServe specification where liveness indicates the process is running.

687-695: Server readiness relies on model registration only, not operational state.

The implementation returns ready: true when model cards exist, but does not verify whether the corresponding model engines are actually loaded or operational. This is by design—readiness indicates model registration rather than model availability for inference. Consider whether this semantics aligns with deployment expectations: if a model card is registered but the engine fails to load or crashes, the server will still report ready.

Alternatively, server_ready() could cross-reference get_model_cards() against loaded engines to ensure models are genuinely available for requests.

lib/llm/src/grpc/protos/kserve.proto (2)

19-38: LGTM! Well-documented RPC definitions.

The three new health check endpoints (ServerLive, ServerReady, ModelReady) are properly defined with clear documentation and follow the existing protobuf conventions in this file.

69-148: LGTM! Well-structured message definitions with comprehensive documentation.

The request and response message definitions follow protobuf best practices:

Empty request messages for ServerLive/ServerReady (common pattern)

ModelReadyRequest includes both name and optional version fields

Response messages use clear boolean fields with descriptive documentation

All messages include proper documentation comments

tests/frontend/grpc/triton_echo_client.py

Signed-off-by: Guan Luo <gluo@nvidia.com>

GuanLuo · 2025-12-03T01:00:01Z

/ok to test f4d3354

lib/llm/src/grpc/protos/kserve.proto

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

lib/llm/src/grpc/protos/kserve.proto

Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

GuanLuo · 2025-12-03T18:16:11Z

/ok to test d2e5980

… / ModelReady (#4708) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: Guan Luo <gluo@nvidia.com>

…re commits (#4730) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>

GuanLuo added 2 commits December 2, 2025 14:35

feat: add Kserve readiness endpoint support: ServerLive / ServerReady…

bc02d5b

… / ModelReady Signed-off-by: Guan Luo <gluo@nvidia.com>

test: add readiness endpoint call in e2e testing

a8f3b41

Signed-off-by: Guan Luo <gluo@nvidia.com>

GuanLuo requested review from a team as code owners December 2, 2025 23:01

pull-request-size bot added the size/L label Dec 2, 2025

github-actions bot added the feat label Dec 2, 2025

coderabbitai bot reviewed Dec 2, 2025

View reviewed changes

tests/frontend/grpc/triton_echo_client.py Show resolved Hide resolved

chore: address comment

f4d3354

Signed-off-by: Guan Luo <gluo@nvidia.com>

rmccorm4 approved these changes Dec 3, 2025

View reviewed changes

lib/llm/src/grpc/protos/kserve.proto Outdated Show resolved Hide resolved

lib/llm/src/grpc/protos/kserve.proto Outdated Show resolved Hide resolved

lib/llm/src/grpc/protos/kserve.proto Outdated Show resolved Hide resolved

Apply suggestions from code review

df61c78

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

GuanLuo enabled auto-merge (squash) December 3, 2025 09:37

GuanLuo commented Dec 3, 2025

View reviewed changes

lib/llm/src/grpc/protos/kserve.proto Outdated Show resolved Hide resolved

Apply suggestion from @GuanLuo

d2e5980

Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

copy-pr-bot bot had a problem deploying to GITLAB December 3, 2025 18:16 Failure

GuanLuo disabled auto-merge December 3, 2025 18:30

GuanLuo merged commit d90dc39 into main Dec 3, 2025
36 of 39 checks passed

GuanLuo deleted the gluo/dyn-1476-feat-add-kserve-serverready-live-endpoint branch December 3, 2025 19:17

GuanLuo mentioned this pull request Dec 3, 2025

chore: cherry-pick KServe readiness (#4708) and metrics (#4400) feature commits #4730

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708

feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708

Uh oh!

GuanLuo commented Dec 2, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Dec 2, 2025

Uh oh!

coderabbitai bot commented Dec 2, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

GuanLuo commented Dec 3, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

GuanLuo commented Dec 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708

feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708

Uh oh!

Conversation

GuanLuo commented Dec 2, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Dec 2, 2025

Uh oh!

coderabbitai bot commented Dec 2, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

GuanLuo commented Dec 3, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

GuanLuo commented Dec 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

GuanLuo commented Dec 2, 2025 •

edited by coderabbitai bot

Loading