-
Notifications
You must be signed in to change notification settings - Fork 660
[PD Disaggregation] Add timestamp for analyzing splitwise deployment #5317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the request timing metrics infrastructure to support comprehensive performance analysis of splitwise (prefill/decode disaggregation) deployments. The changes centralize timing attributes from the Request class into a dedicated RequestMetrics dataclass, adding numerous timestamp fields to track the full lifecycle of requests across prefill and decode nodes.
Key Changes
- Introduced a comprehensive
RequestMetricsdataclass with 20+ timestamp fields tracking request flow through the system - Migrated timing attributes from
Requestclass to the newmetricsobject - Added timestamp recording at critical points: scheduler receipt, resource allocation, prefill/decode handoff, and token generation
- Changed default value of
FD_ENABLE_CACHE_TASKfrom enabled to disabled
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| fastdeploy/engine/request.py | Added RequestMetrics dataclass with comprehensive timestamp fields and helper methods; removed timing attributes from Request class |
| fastdeploy/output/token_processor.py | Refactored to use metrics object instead of direct timing attributes; updated token processing to record timing via metrics methods |
| fastdeploy/engine/engine.py | Updated to record preprocess timing in metrics object |
| fastdeploy/engine/async_llm.py | Updated request preprocessing to use metrics.preprocess_start_time |
| fastdeploy/engine/common_engine.py | Added timestamp recording at key lifecycle points (engine_get_req_time, ask_decode_resource times, inference_start_time); updated comment to correctly indicate v0 scheduler usage |
| fastdeploy/engine/sched/resource_manager_v1.py | Removed direct inference_start_time assignments; added metrics propagation for decode scenarios |
| fastdeploy/engine/resource_manager.py | Removed direct inference_start_time assignment during resource allocation |
| fastdeploy/entrypoints/openai/protocol.py | Added metrics field to ChatCompletionStreamResponse |
| fastdeploy/entrypoints/openai/serving_chat.py | Updated to include metrics in streaming response chunks |
| fastdeploy/envs.py | Changed FD_ENABLE_CACHE_TASK default from "1" to "0" |
|
Thanks for your contribution! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #5317 +/- ##
==========================================
Coverage ? 58.90%
==========================================
Files ? 324
Lines ? 40144
Branches ? 6062
==========================================
Hits ? 23648
Misses ? 14623
Partials ? 1873
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Motivation
增加时间戳,用于分析系统耗时。
Modifications
全流程打点。
Usage or Command
请求加上 collect_metrics=True 字段
Accuracy Tests
单侧覆盖
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.