[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. #5320
Motivation
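In the current code, when a request to the online inference service is disconnected, the resources it occupies in FastDeploy are not released and its inference is not stopped immediately. Logic is therefore needed to detect the disconnection, stop inference for that request, and release its resources.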
Modifications
Unified introduction at the API layer
Exception capture and notification
Exception capture:
zmq communication:
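The capture-and-notify flow can be sketched as follows. This is a minimal illustration and not the PR's code: it assumes the server surfaces a client disconnect as cancellation of the handler task, and it uses an `asyncio.Queue` as a stand-in for the zmq channel. The names `generate_tokens`, `handle_request`, `simulate_disconnect`, and the message layout are all hypothetical.

```python
import asyncio


async def generate_tokens(request_id, n=100):
    """Hypothetical stand-in for a streaming inference loop."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield f"{request_id}-tok{i}"


async def handle_request(request_id, abort_channel):
    """Serve one request; if the task is cancelled, notify the engine side."""
    received = []
    try:
        async for tok in generate_tokens(request_id):
            received.append(tok)
    except asyncio.CancelledError:
        # Assumption: a dropped connection reaches the handler as cancellation.
        # The queue stands in for the zmq socket used to reach the engine.
        abort_channel.put_nowait({"type": "abort", "request_id": request_id})
        raise  # let the cancellation propagate as usual
    return received


async def simulate_disconnect():
    """Start a request, cancel it mid-stream, and return the abort notice."""
    abort_channel = asyncio.Queue()
    task = asyncio.create_task(handle_request("req-1", abort_channel))
    await asyncio.sleep(0.05)  # let a few tokens stream first
    task.cancel()              # simulate the client going away
    try:
        await task
    except asyncio.CancelledError:
        pass
    return abort_channel.get_nowait()
```

Re-raising `CancelledError` after sending the notice keeps the server's own cleanup path intact, so the notification is a side effect and not a replacement for normal cancellation handling.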
Interrupt handling in the engine layer (EngineService)
Request retrieval:
Core abort function:
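Resource cleanup and scheduling (abort_requests details)
Inside the abort_requests function, the following key steps ensure that the request is fully stopped and its resources are reclaimed:
Build the result: construct the final RequestOutput and put it into the scheduler's output queue.
Clear tasks: remove the tasks and flags that correspond to the request from tasks_list and stop_flags.
Resource release: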
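The three steps above can be sketched with a toy engine. This is an illustration under stated assumptions, not the actual EngineService code: `ToyEngine`, the simplified `RequestOutput` fields, and the `block_tables` / `free_blocks` bookkeeping are hypothetical, and the description does not say which resources are released, so cache blocks are used here only as a plausible example.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class RequestOutput:
    """Simplified stand-in for the real RequestOutput (fields are assumed)."""
    request_id: str
    finished: bool = True
    error_msg: str = "aborted"


@dataclass
class ToyEngine:
    tasks_list: list                                    # slot -> request_id or None
    stop_flags: list                                    # slot -> True when the slot is idle
    block_tables: dict = field(default_factory=dict)    # request_id -> held block ids
    free_blocks: list = field(default_factory=list)     # pool of reusable block ids
    output_queue: Queue = field(default_factory=Queue)  # stands in for the scheduler queue

    def abort_requests(self, request_ids):
        for rid in request_ids:
            if rid not in self.tasks_list:
                continue  # unknown or already finished request
            # 1. Build the final result and hand it to the output queue.
            self.output_queue.put(RequestOutput(request_id=rid))
            # 2. Clear the task slot and mark it as stopped.
            for slot, task in enumerate(self.tasks_list):
                if task == rid:
                    self.tasks_list[slot] = None
                    self.stop_flags[slot] = True
            # 3. Release resources (assumed here to be cache blocks).
            self.free_blocks.extend(self.block_tables.pop(rid, []))
```

For example, aborting request "a" on an engine that holds "a" and "b" frees only the slot and blocks that belong to "a":

```python
engine = ToyEngine(tasks_list=["a", "b"], stop_flags=[False, False],
                   block_tables={"a": [0, 1], "b": [2]})
engine.abort_requests(["a"])
```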
Usage or Command
Accuracy Tests
Checklist
PR title tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.