@qwes5s5 qwes5s5 commented Dec 1, 2025

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


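In the current code, when a request to the online inference service disconnects, the resources it occupies in FastDeploy are not released immediately and its inference keeps running. This PR adds logic to detect the disconnect, stop the inference, and release the associated resources.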

Modifications

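  • Unified handling at the API layer

    • Decorator definition and use:
      • In the utils module, add or extend the with_cancellation decorator so that request interruption (for example, an HTTP disconnect) is handled in one place for all requests.
      • Apply @with_cancellation to every public endpoint in the API layer (chat/completions/score...).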
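  • Exception capture and notification

    • Exception capture:

      • In the handler logic, catch the asyncio.CancelledError raised through @with_cancellation.
    • zmq communication:

      • After catching the exception, send the details of the aborted request over ZeroMQ (zmq) to the backend Engine Service, notifying it to stop the inference task.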
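  • Abort handling in the engine layer (EngineService)

    • Receiving requests:

      • EngineService changes the receive logic in its existing request thread so that abort messages arriving over ZMQ are picked up there.
    • Core abort function:

      • Add a core abort_requests function dedicated to handling received abort requests: it stops inference and reclaims resources.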
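  • Resource cleanup and scheduling (abort_requests in detail)

    • Inside abort_requests, the following steps ensure that the request is fully stopped and its resources are reclaimed:

      1. Build the result: construct the final RequestOutput and put it into the scheduler's output queue.

      2. Clear the task: remove the task and flag belonging to the request from tasks_list and stop_flags.

      3. Release resources:

        • Use the resource manager to free the KV Cache blocks held by the request.
        • Delete the request's record from the request dict and other data structures.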
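
For reviewers, the API-side flow described above (decorator, CancelledError capture, abort notification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeRequest`, its `wait_disconnected()` method, `send_abort`, and the `abort_messages` list are all hypothetical stand-ins. In particular, `abort_messages` replaces the real ZMQ channel, and the way the real decorator detects an HTTP disconnect may differ.

```python
import asyncio
import functools

# Hypothetical stand-in for the ZMQ channel to the Engine Service.
abort_messages = []


def send_abort(request_id):
    """Record an abort notification (the PR sends this over ZMQ instead)."""
    abort_messages.append({"type": "abort", "request_id": request_id})


def with_cancellation(handler):
    """Cancel `handler` as soon as the client disconnects (sketch)."""

    @functools.wraps(handler)
    async def wrapper(request, *args, **kwargs):
        handler_task = asyncio.create_task(handler(request, *args, **kwargs))
        # Assumption: wait_disconnected() returns once the client has gone away.
        watcher_task = asyncio.create_task(request.wait_disconnected())
        done, pending = await asyncio.wait(
            {handler_task, watcher_task}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        # Give cancelled tasks a chance to run their cleanup code.
        await asyncio.gather(*pending, return_exceptions=True)
        if handler_task in done:
            return handler_task.result()
        return None  # client disconnected; there is nobody to answer

    return wrapper


class FakeRequest:
    """Hypothetical request object used only for this illustration."""

    def __init__(self, request_id, work_seconds, disconnect_after):
        self.request_id = request_id
        self.work_seconds = work_seconds
        self._disconnect_after = disconnect_after

    async def wait_disconnected(self):
        await asyncio.sleep(self._disconnect_after)


@with_cancellation
async def chat_completions(request):
    try:
        await asyncio.sleep(request.work_seconds)  # placeholder for inference
        return f"completed {request.request_id}"
    except asyncio.CancelledError:
        # Tell the engine to stop working on this request, then propagate.
        send_abort(request.request_id)
        raise
```

Running `asyncio.run(chat_completions(FakeRequest("req-1", work_seconds=5.0, disconnect_after=0.01)))` in this sketch cancels the handler early and leaves one abort message in `abort_messages`. Re-raising `CancelledError` after the notification keeps normal asyncio cancellation semantics intact.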

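On the engine side, the three cleanup steps listed for abort_requests can be sketched as below. This is an illustrative model only: `EngineState` and its fields are hypothetical simplifications of the state that EngineService, the scheduler, and the resource manager own, the `RequestOutput` here is a reduced stand-in for the real class, and the convention that `stop_flags[slot] = True` marks an idle slot is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RequestOutput:
    """Reduced stand-in for the real RequestOutput."""
    request_id: str
    finished: bool
    error_msg: str = ""


@dataclass
class EngineState:
    """Hypothetical container for the state touched by an abort."""
    tasks_list: list    # slot index -> request_id, or None when empty
    stop_flags: list    # slot index -> True when the slot is idle (assumed)
    requests: dict      # request_id -> slot index
    block_tables: dict  # request_id -> KV cache block ids held by the request
    free_blocks: list   # KV cache block ids available for reuse
    output_queue: list = field(default_factory=list)  # scheduler output queue


def abort_requests(state, request_ids):
    """Stop the given requests and reclaim their resources (sketch)."""
    aborted = []
    for request_id in request_ids:
        slot = state.requests.get(request_id)
        if slot is None:
            continue  # unknown or already finished: aborting is a no-op

        # 1. Build the final result and hand it to the scheduler's output queue.
        state.output_queue.append(
            RequestOutput(request_id, finished=True, error_msg="aborted")
        )

        # 2. Clear the task and its flag.
        state.tasks_list[slot] = None
        state.stop_flags[slot] = True

        # 3. Release KV cache blocks and drop the request record.
        state.free_blocks.extend(state.block_tables.pop(request_id, []))
        del state.requests[request_id]
        aborted.append(request_id)
    return aborted
```

Skipping unknown request ids makes the sketch tolerant of an abort message that arrives after the request has already finished, which can happen when the disconnect and the final token race each other.
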
Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Dec 1, 2025

Thanks for your contribution!
