Closed
3 changes: 2 additions & 1 deletion vllm_ascend/worker/model_runner_v1.py
@@ -2697,7 +2697,8 @@ def initialize_kv_cache(self, kv_cache_config: KVCacheConfig) -> None:
         self.kv_cache_config = kv_cache_config
         self.may_add_encoder_only_layers_to_kv_cache_config()
         # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-        self.initialize_attn_backend(kv_cache_config)
+        if not self.attn_groups:
+            self.initialize_attn_backend(kv_cache_config)
Comment on lines 2697 to +2701
Contributor

critical

This change prevents a crash if initialize_kv_cache is called multiple times, but it's an incomplete fix and can lead to subtle bugs. The function may_add_encoder_only_layers_to_kv_cache_config() is still called unconditionally on each invocation. This function appends to self.kv_cache_config.kv_cache_groups, which will result in duplicate entries if initialize_kv_cache is called more than once. Subsequent parts of this method, like may_reinitialize_input_batch and initialize_kv_cache_tensors, will then operate with an inconsistent state (a modified kv_cache_config but stale attn_groups), which is likely to cause issues.
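
To make the failure mode concrete, here is a minimal toy sketch, not the vllm-ascend code. `ToyConfig`, `ToyRunner`, the underscore-prefixed helpers, and the string placeholders are all hypothetical stand-ins. The only behaviour assumed is what this comment states: the encoder-only helper appends on every call, while the backend initialization is guarded.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for KVCacheConfig (only the field used here)."""
    kv_cache_groups: list = field(default_factory=list)


class ToyRunner:
    """Hypothetical stand-in for the model runner, reduced to the guarded logic."""

    def __init__(self):
        self.attn_groups = []
        self.kv_cache_config = None

    def _add_encoder_only_groups(self):
        # Assumed behaviour: appends a group on every call.
        self.kv_cache_config.kv_cache_groups.append("encoder_only")

    def _initialize_attn_backend(self):
        # Toy rule: one attention group per KV cache group.
        self.attn_groups = list(self.kv_cache_config.kv_cache_groups)

    def initialize_kv_cache(self, config):
        self.kv_cache_config = config
        self._add_encoder_only_groups()   # still unconditional
        if not self.attn_groups:          # partial guard, as in the PR
            self._initialize_attn_backend()


config = ToyConfig(kv_cache_groups=["full_attention"])
runner = ToyRunner()
runner.initialize_kv_cache(config)
runner.initialize_kv_cache(config)  # second call with the same config object

print(config.kv_cache_groups)  # ['full_attention', 'encoder_only', 'encoder_only']
print(runner.attn_groups)      # ['full_attention', 'encoder_only']
```

In this toy model the second call leaves three KV cache groups but only two attention groups, which is the kind of mismatch described above.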

A more robust solution is to make the entire method idempotent by checking for initialization at the beginning of the function. This ensures that if the method is called multiple times, it has no effect after the first successful initialization, preventing state corruption.

Suggested change
-        self.kv_cache_config = kv_cache_config
-        self.may_add_encoder_only_layers_to_kv_cache_config()
-        # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-        self.initialize_attn_backend(kv_cache_config)
-        if not self.attn_groups:
-            self.initialize_attn_backend(kv_cache_config)
+        if self.attn_groups:
+            return
+        self.kv_cache_config = kv_cache_config
+        self.may_add_encoder_only_layers_to_kv_cache_config()
+        # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
+        self.initialize_attn_backend(kv_cache_config)
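
The early-return guard suggested above can be exercised in a small toy model. This is a sketch under stated assumptions, not the real runner: `ToyConfig`, `ToyRunner`, and the string placeholders are hypothetical, and the two setup steps are collapsed into plain list operations.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for KVCacheConfig (only the field used here)."""
    kv_cache_groups: list = field(default_factory=list)


class ToyRunner:
    """Hypothetical stand-in for the model runner with the early-return guard."""

    def __init__(self):
        self.attn_groups = []
        self.kv_cache_config = None

    def initialize_kv_cache(self, config):
        if self.attn_groups:
            return  # already initialized: leave all state untouched
        self.kv_cache_config = config
        # Stand-in for the encoder-only helper (assumed to append).
        config.kv_cache_groups.append("encoder_only")
        # Stand-in for backend setup: one attention group per cache group.
        self.attn_groups = list(config.kv_cache_groups)


first = ToyConfig(kv_cache_groups=["full_attention"])
second = ToyConfig(kv_cache_groups=["sliding_window"])
runner = ToyRunner()

runner.initialize_kv_cache(first)
runner.initialize_kv_cache(first)   # repeated call: no duplicate append
runner.initialize_kv_cache(second)  # different config: silently ignored

print(first.kv_cache_groups)            # ['full_attention', 'encoder_only']
print(second.kv_cache_groups)           # ['sliding_window']
print(runner.kv_cache_config is first)  # True
```

One trade-off visible in the sketch: because the guard keys only on `attn_groups`, a later call with a different config is dropped without any signal. If re-initialization with a new config is a legitimate use case, a warning or an explicit reset path may be preferable to a silent return.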

         self.use_hybrid_blocks = (len(self.attn_groups) > 1)
         # NOTE: Currently, we determine whether we need `num_accepted_tokens` through `MambaSpec`.
         self.need_accepted_tokens = any([