@tianmu-li
Contributor

With async scheduling, tokens from the previous batch cannot be copied using :num_decodes when decode tokens are not strictly before prompt tokens in the batch.
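
To make the failure mode concrete, here is a minimal sketch using plain Python lists in place of device tensors. The helper names (`copy_with_slice`, `copy_with_index`), the `PLACEHOLDER` value, and the one-position-per-request layout are assumptions made for illustration; they are not taken from the vllm-gaudi code.

```python
# Toy model: each request occupies one position in the batch.
# PLACEHOLDER marks a decode position whose token is still pending
# from the previous (asynchronously scheduled) step.
PLACEHOLDER = -1


def copy_with_slice(input_ids, prev_sampled, num_decodes):
    """Assumes decode requests occupy positions 0..num_decodes-1."""
    out = list(input_ids)
    out[:num_decodes] = prev_sampled[:num_decodes]
    return out


def copy_with_index(input_ids, prev_sampled, decode_positions):
    """Writes each previous token to the position of its decode request."""
    out = list(input_ids)
    for pos, tok in zip(decode_positions, prev_sampled):
        out[pos] = tok
    return out


# Mixed batch: [prompt, decode, prompt, decode]
input_ids = [101, PLACEHOLDER, 202, PLACEHOLDER]
prev_sampled = [7, 9]        # tokens sampled in the previous batch
decode_positions = [1, 3]    # where the decode requests actually sit

sliced = copy_with_slice(input_ids, prev_sampled, num_decodes=2)
indexed = copy_with_index(input_ids, prev_sampled, decode_positions)

print(sliced)   # prompt token 101 is overwritten, one placeholder remains
print(indexed)  # both decode positions are filled, prompts untouched
```

In this toy layout the slice copy clobbers a prompt token and leaves a decode position unfilled, while the index-based copy places both tokens correctly.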

… mixed

Signed-off-by: Tianmu Li <tianmu.li@intel.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR resolves an issue with async scheduling when decode and prompt tokens are mixed in a batch. The fix ensures that tokens from the previous batch can be correctly copied to their target positions when decode tokens are not strictly positioned before prompt tokens.

Key Changes:

  • Modified _prepare_input_ids to optionally return an index tensor for reordered batches
  • Updated create_unified_batch to accept and use decode indices for correct token placement
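
The two changes above can be pictured with the following sketch. It is a hedged illustration only: `prepare_input_ids_sketch` and `place_prev_tokens` are hypothetical stand-ins, not the real `_prepare_input_ids` or `create_unified_batch` signatures, and plain lists replace tensors. The assumed behaviour is that an index is returned only when requested and only when decode requests are not already at the front, and that the consumer falls back to the sequential slice when no index is given.

```python
from typing import List, Optional, Tuple


def prepare_input_ids_sketch(
    is_decode: List[bool], return_index: bool = False
) -> Tuple[int, Optional[List[int]]]:
    """Hypothetical producer: returns num_decodes and, optionally, decode positions.

    The index is only returned when the batch is reordered, i.e. when the
    decode requests do not already occupy positions 0..num_decodes-1.
    """
    positions = [i for i, d in enumerate(is_decode) if d]
    num_decodes = len(positions)
    reordered = positions != list(range(num_decodes))
    index = positions if (return_index and reordered) else None
    return num_decodes, index


def place_prev_tokens(
    token_ids: List[int],
    prev_sampled: List[int],
    num_decodes: int,
    decode_index: Optional[List[int]] = None,
) -> List[int]:
    """Hypothetical consumer: use the index if present, else the old slice path."""
    out = list(token_ids)
    if decode_index is None:
        # Sequential placement: valid only when decodes come first.
        out[:num_decodes] = prev_sampled[:num_decodes]
    else:
        # Indexed placement: valid for any ordering.
        for pos, tok in zip(decode_index, prev_sampled):
            out[pos] = tok
    return out


# Decodes first: no index is produced and the slice path is used.
n_a, idx_a = prepare_input_ids_sketch([True, True, False], return_index=True)
batch_a = place_prev_tokens([0, 0, 55], [7, 9], n_a, idx_a)

# Prompt first: an index is produced and used for placement.
n_b, idx_b = prepare_input_ids_sketch([False, True, True], return_index=True)
batch_b = place_prev_tokens([55, 0, 0], [7, 9], n_b, idx_b)
```

Keeping the index optional means the common decode-first layout keeps its existing slice path, and the extra indexing work is only done when the batch is actually reordered.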

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Added return_index parameter to _prepare_input_ids and logic to return index tensor when batch is reordered; updated batch preparation to pass decode index |
| vllm_gaudi/extension/unified_batch.py | Added decode_index parameter and conditional logic to copy tokens using indices instead of assuming sequential placement |

Signed-off-by: Tianmu Li <tianmu.li@intel.com>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

@adobrzyn adobrzyn merged commit 927dafa into vllm-project:main Dec 3, 2025
43 checks passed
tianmu-li added a commit to tianmu-li/vllm-gaudi that referenced this pull request Dec 3, 2025
… mixed (vllm-project#642)

When decode tokens are not strictly before prompt tokens, tokens from
the previous batch cannot be copied using :num_decodes when using async
scheduling.

---------

Signed-off-by: Tianmu Li <tianmu.li@intel.com>
mgawarkiewicz-intel pushed a commit that referenced this pull request Dec 4, 2025
… mixed (#678)

Cherrypick of #642

Signed-off-by: Tianmu Li <tianmu.li@intel.com>