Conversation


@njhill njhill commented Nov 15, 2025

This fixes two pipeline-parallel related issues introduced by #26866:

  1. Performance regression reported in [Bug]: Pipeline parallel doesn't really do the "parallel" among GPUs. #28270
  2. KV connector output propagation/aggregation from workers

The first commit just reverts the granular tracing context managers added in #28329, which made the code very difficult to read/update. Please just look at the changes in the second commit, which are smaller/cleaner.

Fixes #28270

Thanks to @weireweire for reporting.

@mergify mergify bot added the v1 label Nov 15, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two main changes: a performance improvement by removing granular tracing context managers, and a fix for KV connector output propagation in pipeline parallel setups. The removal of record_function_or_nullcontext should improve performance and code readability. The refactoring of the pipeline parallelism logic in step_with_batch_queue to be more asynchronous is also a good improvement. However, I have a critical concern regarding the new mechanism for propagating kv_connector_output from non-final pipeline parallel ranks. The logic has been moved from execute_model to sample_tokens, but it seems sample_tokens is only executed on the final rank. This could lead to the loss of kv_connector_output from other ranks, breaking the feature. Please see my detailed comment.
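The reviewer's concern can be reduced to a toy model: if per-rank outputs are gathered only inside `sample_tokens()`, and only the final pipeline stage runs `sample_tokens()`, then `kv_connector_output` produced by earlier stages is silently dropped. The names below mirror the review comment but the code is purely illustrative, not vLLM's actual implementation.

```python
def collect_outputs(num_ranks: int, aggregate_only_in_sample_tokens: bool) -> list[str]:
    """Simulate output collection across pipeline-parallel ranks."""
    collected = []
    for rank in range(num_ranks):
        # Every rank's execute_model produces a kv_connector_output.
        kv_connector_output = f"kv-output-rank-{rank}"
        is_final_rank = rank == num_ranks - 1
        if aggregate_only_in_sample_tokens:
            # sample_tokens runs only on the final stage, so earlier
            # ranks' outputs never reach the aggregated result.
            if is_final_rank:
                collected.append(kv_connector_output)
        else:
            # Propagating from execute_model keeps every rank's output.
            collected.append(kv_connector_output)
    return collected

print(collect_outputs(4, aggregate_only_in_sample_tokens=True))   # ['kv-output-rank-3']
print(collect_outputs(4, aggregate_only_in_sample_tokens=False))  # all four outputs
```

Whether this failure mode applies to the actual PR depends on how the second commit forwards non-final-rank outputs, which is what the follow-up discussion addresses.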

@njhill njhill added the bug Something isn't working label Nov 15, 2025
@njhill njhill added this to the v0.11.1 milestone Nov 15, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 15, 2025
njhill and others added 2 commits November 15, 2025 10:22
Signed-off-by: Nick Hill <nhill@redhat.com>
@nvpohanh
Contributor

@weireweire please check if this works. Thanks!

@weireweire
Contributor

weireweire commented Nov 17, 2025

Have you tested lm_eval? I also found PP has an accuracy issue; not sure if it's related to #26866.

@weireweire
Contributor

weireweire commented Nov 17, 2025

From my test, this fixed the overlap issue but the accuracy is still bad.

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |  0.2|±  |0.1333|
|     |       |strict-match    |     5|exact_match|↑  |  0.2|±  |0.1333|

My command:

```shell
vllm serve nvidia/DeepSeek-R1-0528-FP4-v2 --trust-remote-code --host 0.0.0.0 --port 8000 \
  --pipeline-parallel-size 8 --tensor-parallel-size 1 --max-num-seqs 32 \
  --max-cudagraph-capture-size 32 --max-model-len 4010 --max-num-batched-tokens 16000 \
  --enable-chunked-prefill --kv-cache-dtype auto --gpu-memory-utilization 0.85 \
  --no-enable-prefix-caching
```

```shell
lm_eval \
  --model local-completions \
  --tasks gsm8k \
  --model_args base_url=http://0.0.0.0:8000/v1/completions,model=$MODEL,num_concurrent=$CONCURRENCY,timeout=6000,max_retries=1 \
  --output_path "$LOG_DIR" \
  --log_samples \
  --limit 10
```

This result is the same as with my draft fix, so the accuracy issue must be caused somewhere else.

@weireweire
Contributor

Accuracy issue tracked in #28839.

@njhill
Member Author

njhill commented Nov 17, 2025

Thanks @weireweire, I think the accuracy issue is probably separate from this; we'll investigate that too.

```python
logger.info("Batch queue is enabled with size %d", self.batch_queue_size)
self.batch_queue = deque(maxlen=self.batch_queue_size)

self.ec_producer = (
```
Collaborator

nit:

Suggested change:

```diff
- self.ec_producer = (
+ self.is_ec_producer = (
```

Member Author

I'll open a follow-on PR so that this doesn't hold up the release.

Member Author

@WoosukKwon opened follow-on #28884 for this.

@njhill njhill merged commit 7765e5b into vllm-project:main Nov 17, 2025
44 checks passed
@njhill njhill deleted the fix-broken-pp branch November 17, 2025 22:08
khluu pushed a commit that referenced this pull request Nov 17, 2025
…8768)

Signed-off-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit 7765e5b)
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
…8768)

Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

- bug: Something isn't working
- ready: ONLY add when PR is ready to merge/full CI is needed
- v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Pipeline parallel doesn't really do the "parallel" among GPUs.

5 participants