-
Notifications
You must be signed in to change notification settings - Fork 721
feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Kserve readiness endpoint support: ServerLive / ServerReady / ModelReady #4708
Conversation
… / ModelReady Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: Guan Luo <gluo@nvidia.com>
WalkthroughThis PR introduces health check functionality to the gRPC inference service by adding three new RPC endpoints (ServerLive, ServerReady, ModelReady) at the protocol buffer level, implementing corresponding handlers in the Rust service, and providing test coverage with client utilities. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/llm/src/grpc/service/kserve.rs (1)
697-711: Consider validating the version parameter.The
model_readyimplementation checks only the model name from theModelReadyRequestbut ignores theversionfield (Line 701). If version-specific readiness checks are not needed, this is fine. However, if multiple versions of a model could be registered, consider whether the version should be validated or documented as unused.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/llm/src/grpc/protos/kserve.proto(2 hunks)lib/llm/src/grpc/service/kserve.rs(1 hunks)lib/llm/tests/kserve_service.rs(3 hunks)tests/frontend/grpc/test_tensor_mocker_engine.py(1 hunks)tests/frontend/grpc/triton_echo_client.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/grpc/service/kserve.rs (2)
lib/llm/src/discovery/model_manager.rs (1)
new(85-97)lib/llm/src/local_model.rs (1)
card(351-353)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)
tests/frontend/grpc/triton_echo_client.py (1)
check_health(10-19)
🪛 Ruff (0.14.7)
tests/frontend/grpc/triton_echo_client.py
14-14: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (.)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (.)
🔇 Additional comments (7)
tests/frontend/grpc/test_tensor_mocker_engine.py (1)
128-128: LGTM! Good addition of health check before inference.The health check invocation before running inference ensures the service is ready, which aligns with the PR's goal of adding readiness endpoint support.
lib/llm/tests/kserve_service.rs (2)
8-22: LGTM! Well-organized port management for parallel tests.The TestPort enum with the LiveReady variant at port 8998 maintains the existing pattern and prevents port conflicts during parallel test execution.
1977-2057: LGTM! Comprehensive test coverage for health check endpoints.The test thoroughly validates:
- Server liveness (always true)
- Server readiness transitions (false → true when model is registered)
- Model-specific readiness (false for unregistered, true for registered)
The test structure follows existing patterns and includes clear assertion messages.
lib/llm/src/grpc/service/kserve.rs (2)
679-685: LGTM! Appropriate liveness check implementation.Returning
trueunconditionally is correct: if the server can respond to the RPC, it is by definition live. This aligns with the KServe specification where liveness indicates the process is running.
687-695: Server readiness relies on model registration only, not operational state.The implementation returns
ready: truewhen model cards exist, but does not verify whether the corresponding model engines are actually loaded or operational. This is by design—readiness indicates model registration rather than model availability for inference. Consider whether this semantics aligns with deployment expectations: if a model card is registered but the engine fails to load or crashes, the server will still report ready.Alternatively,
server_ready()could cross-referenceget_model_cards()against loaded engines to ensure models are genuinely available for requests.lib/llm/src/grpc/protos/kserve.proto (2)
19-38: LGTM! Well-documented RPC definitions.The three new health check endpoints (ServerLive, ServerReady, ModelReady) are properly defined with clear documentation and follow the existing protobuf conventions in this file.
69-148: LGTM! Well-structured message definitions with comprehensive documentation.The request and response message definitions follow protobuf best practices:
- Empty request messages for ServerLive/ServerReady (common pattern)
- ModelReadyRequest includes both name and optional version fields
- Response messages use clear boolean fields with descriptive documentation
- All messages include proper documentation comments
Signed-off-by: Guan Luo <gluo@nvidia.com>
|
/ok to test f4d3354 |
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
|
/ok to test d2e5980 |
… / ModelReady (#4708) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: Guan Luo <gluo@nvidia.com>
…re commits (#4730) Signed-off-by: Guan Luo <gluo@nvidia.com> Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.