eagle reconstruct #4583
base: main
Conversation
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors NPUModelRunner to inherit from GPUModelRunner, which is a good step towards reducing code duplication and improving maintainability. However, the implementation has a couple of critical issues. The __init__ method of NPUModelRunner does not call the parent's __init__, leading to an improperly initialized object. Additionally, the load_model method has been removed, which causes the loss of crucial NPU-specific model loading logic. These issues need to be addressed to ensure the correctness and stability of the NPU backend.
```diff
-class NPUModelRunner(LoRAModelRunnerMixin):
+class NPUModelRunner(GPUModelRunner, LoRAModelRunnerMixin):

     def __init__(self, vllm_config: VllmConfig, device: torch.device):
```
The NPUModelRunner now inherits from GPUModelRunner, but its __init__ method never calls super().__init__(). This is a critical issue: it breaks the inheritance chain, so the GPUModelRunner state of the object is never initialized, which can lead to AttributeErrors and other subtle bugs.
The __init__ method of NPUModelRunner also duplicates a large amount of code from GPUModelRunner.__init__. The correct approach is to call super().__init__(vllm_config, device) at the start of NPUModelRunner.__init__ and then perform only the NPU-specific initializations or overrides. This keeps the code maintainable and less prone to breakage when the base GPUModelRunner changes.
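A minimal, self-contained sketch of the cooperative-__init__ pattern described above. The classes here are illustrative stand-ins, not the real vLLM classes, and SimpleNamespace plays the role of VllmConfig:

```python
from types import SimpleNamespace


class GPUModelRunner:
    def __init__(self, vllm_config, device):
        # Base-class state that NPUModelRunner must not skip.
        self.vllm_config = vllm_config
        self.device = device
        self.lora_config = getattr(vllm_config, "lora_config", None)


class NPUModelRunner(GPUModelRunner):
    def __init__(self, vllm_config, device):
        # Initialize the full GPUModelRunner state first...
        super().__init__(vllm_config, device)
        # ...then apply only NPU-specific additions or overrides.
        self.dynamic_eplb = getattr(vllm_config, "dynamic_eplb", False)


runner = NPUModelRunner(SimpleNamespace(dynamic_eplb=True), device="npu:0")
```

With this shape, any field the base class adds in a future release is picked up automatically instead of silently missing from the NPU runner.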
```diff
     self.eplb_updator.warm_up_eplb()

-    def load_model(self) -> None:
-        logger.info("Starting to load model %s...", self.model_config.model)
-
-        with DeviceMemoryProfiler() as m:  # noqa: SIM117
-            self.model = get_model(vllm_config=self.vllm_config)
-            if self.dynamic_eplb:
-                model_register(self.model, self.model_config)
-            if get_ascend_device_type() == AscendDeviceType._310P:
-                from vllm.model_executor.layers.linear import (
-                    MergedColumnParallelLinear, QKVParallelLinear,
-                    RowParallelLinear)
-                for module in self.model.modules():
-                    if isinstance(module,
-                                  (MergedColumnParallelLinear,
-                                   QKVParallelLinear, RowParallelLinear)):
-                        module.weight.data = self._convert_torch_format(
-                            module.weight.data)
-            if self.drafter:
-                logger.info("Loading drafter model...")
-                self.drafter.load_model(self.model)
-                if self.drafter.name == SpecDcodeType.EAGLE3:
-                    self.model.set_aux_hidden_state_layers(
-                        self.model.get_eagle3_aux_hidden_state_layers())
-
-            if self.lora_config:
-                self.model = self.load_lora_model(self.model, self.vllm_config,
-                                                  self.device)
-        logger.info("Loading model weights took %.4f GB",
-                    m.consumed_memory / float(2**30))
```
The load_model method has been completely removed from NPUModelRunner. Since NPUModelRunner now inherits from GPUModelRunner, GPUModelRunner.load_model will be called instead. However, the original NPUModelRunner.load_model contained crucial NPU-specific logic that is now lost. This includes:
- model_register for dynamic_eplb.
- Weight-format conversion for _310P devices.
- Setting auxiliary hidden state layers for EAGLE3 speculative decoding.

This will likely break these features. Instead of removing the method, you should override it, call super().load_model(), and then execute the NPU-specific logic.
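A self-contained sketch of the suggested fix: override load_model, run the shared GPUModelRunner path via super(), then append the NPU-specific steps. These classes are illustrative stand-ins for the real vLLM classes, with strings and a step list standing in for actual model loading:

```python
class GPUModelRunner:
    def load_model(self):
        # Stand-in for the common model-loading path in the base class.
        self.model = "base-model"
        self.steps = ["base-load"]


class NPUModelRunner(GPUModelRunner):
    def load_model(self):
        super().load_model()  # keep the shared loading logic
        # NPU-specific post-processing would go here in the real code:
        # model_register for dynamic_eplb, 310P weight-format conversion,
        # and EAGLE3 auxiliary hidden-state layers.
        self.steps.append("npu-post-load")


runner = NPUModelRunner()
runner.load_model()
```

The ordering matters: the NPU steps operate on self.model, so the base-class path must have run first.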
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?