【fix】: add attn_groups judge #4590
Conversation
Code Review
This pull request attempts to prevent a crash when initialize_kv_cache is called multiple times by adding a guard around the initialize_attn_backend call. While this fixes the immediate crash, the fix is incomplete and introduces a risk of state corruption because other parts of the method are not idempotent. My review includes a suggestion for a more robust fix that makes the entire method idempotent, ensuring correct behavior on subsequent calls.
```diff
 self.kv_cache_config = kv_cache_config
 self.may_add_encoder_only_layers_to_kv_cache_config()
 # NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-self.initialize_attn_backend(kv_cache_config)
+if not self.attn_groups:
+    self.initialize_attn_backend(kv_cache_config)
```
This change prevents a crash if initialize_kv_cache is called multiple times, but it is an incomplete fix and can lead to subtle bugs. The function may_add_encoder_only_layers_to_kv_cache_config() is still called unconditionally on each invocation. That function appends to self.kv_cache_config.kv_cache_groups, so calling initialize_kv_cache more than once produces duplicate entries. Subsequent parts of this method, such as may_reinitialize_input_batch and initialize_kv_cache_tensors, will then operate on an inconsistent state (a modified kv_cache_config paired with stale attn_groups), which is likely to cause subtle runtime bugs.
A more robust solution is to make the entire method idempotent by checking for initialization at the beginning of the function. This ensures that if the method is called multiple times, it has no effect after the first successful initialization, preventing state corruption.
```diff
-self.kv_cache_config = kv_cache_config
-self.may_add_encoder_only_layers_to_kv_cache_config()
-# NOTE(cmq): initialize_attn_backend must before using self.attn_groups
-if not self.attn_groups:
-    self.initialize_attn_backend(kv_cache_config)
+if self.attn_groups:
+    return
+self.kv_cache_config = kv_cache_config
+self.may_add_encoder_only_layers_to_kv_cache_config()
+# NOTE(cmq): initialize_attn_backend must before using self.attn_groups
+self.initialize_attn_backend(kv_cache_config)
```
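The difference between the two guards can be demonstrated in isolation. The sketch below is a minimal, hypothetical model of the worker: the method names mirror the review discussion, but the bodies are stand-ins, not the real vLLM Ascend implementation. It contrasts the PR's inner guard (which only protects initialize_attn_backend) with the suggested early return (which makes the whole method idempotent).

```python
class Worker:
    """Hypothetical minimal model of a worker whose KV-cache init is called twice."""

    def __init__(self, idempotent):
        self.idempotent = idempotent   # True: use the suggested early-return guard
        self.attn_groups = []
        self.kv_cache_config = None
        self.kv_cache_groups = []      # stand-in for kv_cache_config.kv_cache_groups

    def may_add_encoder_only_layers_to_kv_cache_config(self):
        # The real method appends to kv_cache_config.kv_cache_groups;
        # running it twice therefore produces duplicate entries.
        self.kv_cache_groups.append("encoder_only_group")

    def initialize_attn_backend(self, kv_cache_config):
        self.attn_groups = ["attn_group_0"]

    def initialize_kv_cache(self, kv_cache_config):
        if self.idempotent and self.attn_groups:
            return  # suggested fix: bail out before any side effects
        self.kv_cache_config = kv_cache_config
        self.may_add_encoder_only_layers_to_kv_cache_config()
        # PR's original fix: guard only the attn backend initialization
        if not self.attn_groups:
            self.initialize_attn_backend(kv_cache_config)


partial = Worker(idempotent=False)  # inner guard only
full = Worker(idempotent=True)      # early-return guard
for w in (partial, full):
    w.initialize_kv_cache({})
    w.initialize_kv_cache({})  # simulate a second initialization call

print(partial.kv_cache_groups)  # duplicate entries after the second call
print(full.kv_cache_groups)     # side effects happened exactly once
```

With only the inner guard, the second call avoids the crash but still duplicates the encoder-only group; with the early return, the second call is a complete no-op.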
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Signed-off-by: wucong25 <wucong25@huawei.com>
Force-pushed from 682865e to 381f0f4
This PR is not common in RL scenarios.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?