Commit e52ebf8
[MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349)
### What this PR does / why we need it?

- [x] Patch `Qwen2_5_VisionAttention` with `AscendQwen2_5_VisionAttention`.
- [x] Replace `AscendQwen2_5_VisionTransformer` with `Qwen2_5_VisionTransformer` from vLLM.
- [x] Move the padding logic (q/k/v and cos/sin) applied before FA into `forward()` of `Qwen2_5_VisionAttention`.
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative form to intervals and move it to the CPU (compatible with npu FA); see the sketch after this description.
- [x] Remove the Qwen2.5-VL modeling files.
- [x] Remove the Qwen2.5-VL (without padding) modeling files.
- [x] Remove the related UT.
- [x] Make `set_forward_context` pluggable when getting MM embeddings. Find more details at vllm-project/vllm#29388.
- [x] Simplify the padding logic for FA.
- [x] Add a patch for vllm-project/vllm#28798.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark

- vLLM version: v0.11.2

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
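As a rough illustration of the `cu_seqlens` checklist item, the sketch below converts cumulative offsets into per-sequence interval lengths and moves them to host memory. This is a minimal sketch assuming `cu_seqlens` is a 1-D cumulative tensor; the helper name `cu_seqlens_to_intervals` is made up for illustration and is not the code added by this PR.

```python
# Minimal sketch (not the PR's code): convert cumulative cu_seqlens
# [0, s1, s1+s2, ...] into per-sequence lengths [s1, s2, ...] and move
# them to the CPU, as described for the npu FA path.
import torch


def cu_seqlens_to_intervals(cu_seqlens: torch.Tensor) -> torch.Tensor:
    # torch.diff of consecutive cumulative offsets yields interval lengths.
    intervals = torch.diff(cu_seqlens)
    # Assumption: the npu FA kernel consumes the lengths from host memory.
    return intervals.cpu()


# Example: three visual token sequences of lengths 4, 6 and 2.
cu = torch.tensor([0, 4, 10, 12], dtype=torch.int32)
print(cu_seqlens_to_intervals(cu))  # tensor([4, 6, 2], dtype=torch.int32)
```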
1 parent bdc6697 commit e52ebf8

9 files changed: +802, -2100 lines

tests/ut/models/test_qwen2_5_vl.py
This file was deleted (0 additions, 488 deletions).

tests/ut/models/test_qwen2_5_vl_without_padding.py
This file was deleted (0 additions, 422 deletions).

vllm_ascend/models/__init__.py
Lines changed: 2 additions, 17 deletions
@@ -1,7 +1,5 @@
 from vllm import ModelRegistry
 
-import vllm_ascend.envs as envs_ascend
-
 
 def register_model():
     ModelRegistry.register_model(
@@ -10,24 +8,11 @@ def register_model():
 
     ModelRegistry.register_model(
         "Qwen3VLMoeForConditionalGeneration",
-        "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration"
-    )
+        "vllm_ascend.models.qwen3_vl:AscendQwen3VLMoeForConditionalGeneration")
 
     ModelRegistry.register_model(
         "Qwen3VLForConditionalGeneration",
-        "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration"
-    )
-
-    if envs_ascend.USE_OPTIMIZED_MODEL:
-        ModelRegistry.register_model(
-            "Qwen2_5_VLForConditionalGeneration",
-            "vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration"
-        )
-    else:
-        ModelRegistry.register_model(
-            "Qwen2_5_VLForConditionalGeneration",
-            "vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen2_5_VLForConditionalGeneration_Without_Padding"
-        )
+        "vllm_ascend.models.qwen3_vl:AscendQwen3VLForConditionalGeneration")
 
     # There is no PanguProMoEForCausalLM in vLLM, so we should register it before vLLM config initialization
     # to make sure the model can be loaded correctly. This register step can be removed once vLLM support PanguProMoEForCausalLM.
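For reference, the `ModelRegistry.register_model` calls in the diff map an architecture name to a lazily imported `module:Class` path, so the class is only imported when that architecture is requested. A minimal standalone sketch of the same pattern, using a hypothetical out-of-tree class path rather than anything from this PR:

```python
# Sketch of the registration pattern shown in the diff above. The path
# "my_plugin.models.custom_vl:CustomVLForConditionalGeneration" is a
# hypothetical placeholder, not something this PR registers.
from vllm import ModelRegistry

ModelRegistry.register_model(
    "CustomVLForConditionalGeneration",
    "my_plugin.models.custom_vl:CustomVLForConditionalGeneration")
```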
