@wucong25 wucong25 commented Dec 1, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request attempts to prevent a crash when initialize_kv_cache is called multiple times by adding a guard around the initialize_attn_backend call. While this fixes the immediate crash, the fix is incomplete and introduces a risk of state corruption because other parts of the method are not idempotent. My review includes a suggestion for a more robust fix that makes the entire method idempotent, ensuring correct behavior on subsequent calls.

Comment on lines 2697 to +2701

```diff
         self.kv_cache_config = kv_cache_config
         self.may_add_encoder_only_layers_to_kv_cache_config()
         # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-        self.initialize_attn_backend(kv_cache_config)
+        if not self.attn_groups:
+            self.initialize_attn_backend(kv_cache_config)
```

critical

This change prevents a crash if initialize_kv_cache is called multiple times, but it's an incomplete fix and can lead to subtle bugs. The function may_add_encoder_only_layers_to_kv_cache_config() is still called unconditionally on each invocation. This function appends to self.kv_cache_config.kv_cache_groups, which will result in duplicate entries if initialize_kv_cache is called more than once. Subsequent parts of this method, like may_reinitialize_input_batch and initialize_kv_cache_tensors, will then operate with an inconsistent state (a modified kv_cache_config but stale attn_groups), which is likely to cause issues.

A more robust solution is to make the entire method idempotent by checking for initialization at the beginning of the function. This ensures that if the method is called multiple times, it has no effect after the first successful initialization, preventing state corruption.

Suggested change

```diff
-        self.kv_cache_config = kv_cache_config
-        self.may_add_encoder_only_layers_to_kv_cache_config()
-        # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-        self.initialize_attn_backend(kv_cache_config)
-        if not self.attn_groups:
-            self.initialize_attn_backend(kv_cache_config)
+        if self.attn_groups:
+            return
+        self.kv_cache_config = kv_cache_config
+        self.may_add_encoder_only_layers_to_kv_cache_config()
+        # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
+        self.initialize_attn_backend(kv_cache_config)
```
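The guard-at-the-top pattern the review recommends can be sketched with a minimal stand-in class. The method names mirror the PR, but the vLLM internals are replaced with toy lists, so this is illustrative only, not the real model runner:

```python
# Minimal sketch of the idempotency issue discussed above. All names and
# bodies are stand-ins for the real vLLM model-runner code.

class ModelRunnerSketch:
    def __init__(self):
        self.kv_cache_config = None
        self.kv_cache_groups = []  # stands in for kv_cache_config.kv_cache_groups
        self.attn_groups = []

    def may_add_encoder_only_layers_to_kv_cache_config(self):
        # Appends on every call: without a guard, a second call to
        # initialize_kv_cache would create a duplicate entry here.
        self.kv_cache_groups.append("encoder_only")

    def initialize_attn_backend(self, kv_cache_config):
        self.attn_groups.append("attn_group")

    def initialize_kv_cache(self, kv_cache_config):
        # Guard at the top: after the first successful initialization,
        # the entire method becomes a no-op, so no partial re-runs can
        # leave kv_cache_config and attn_groups inconsistent.
        if self.attn_groups:
            return
        self.kv_cache_config = kv_cache_config
        self.may_add_encoder_only_layers_to_kv_cache_config()
        self.initialize_attn_backend(kv_cache_config)


runner = ModelRunnerSketch()
runner.initialize_kv_cache({"num_blocks": 8})
runner.initialize_kv_cache({"num_blocks": 8})  # second call returns early
print(len(runner.kv_cache_groups))  # -> 1 (no duplicate entries)
print(len(runner.attn_groups))      # -> 1
```

With the guard placed only around `initialize_attn_backend` (as in the PR diff), the second call would still append a duplicate to `kv_cache_groups`, which is the inconsistency the reviewer points out.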

@github-actions github-actions bot commented Dec 1, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: wucong25 <wucong25@huawei.com>
@wangxiyuan wangxiyuan added the hold-on The PR should be hold-on but no need to release label Dec 2, 2025
@weijinqian0
Collaborator

The scenario this PR addresses is not common in RL workloads.

@weijinqian0 weijinqian0 closed this Dec 2, 2025