Commit e52ebf8
[MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349)
### What this PR does / why we need it?
- [x] Patch `Qwen2_5_VisionAttention` with
`AscendQwen2_5_VisionAttention` (see the first sketch after this list).
- [x] Replace `AscendQwen2_5_VisionTransformer` with vLLM's upstream
`Qwen2_5_VisionTransformer`.
- [x] Move the padding logic (q/k/v and cos/sin) applied before FA into the
`forward()` of `Qwen2_5_VisionAttention` (see the padding sketch after this list).
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative
form to per-sequence intervals and move it to the CPU, as required by the NPU FA
kernel (see the last sketch after this list).
- [x] Remove Qwen2.5-VL modeling files.
- [x] Remove Qwen2.5-VL (without padding) modeling files.
- [x] Remove the related unit tests.
- [x] Make `set_forward_context` pluggable when computing multimodal
embeddings. See vllm-project/vllm#29388 for details.
- [x] Simplify padding logic for FA.
- [x] Add patch for vllm-project/vllm#28798.
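
For context, a minimal sketch of the class-level patching approach named in the first item. The import paths and wiring are assumptions based on the file tree (the real patch lives under `vllm_ascend/patch/worker`), not the PR's actual code:

```python
# Hypothetical sketch: swap the upstream attention class so that
# vLLM's Qwen2_5_VisionTransformer picks up the Ascend implementation
# without keeping a forked copy of the whole modeling file.
import vllm.model_executor.models.qwen2_5_vl as qwen2_5_vl

# Import path assumed for illustration only.
from vllm_ascend.models.qwen2_5_vl import AscendQwen2_5_VisionAttention

qwen2_5_vl.Qwen2_5_VisionAttention = AscendQwen2_5_VisionAttention
```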
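A hedged sketch of the kind of pre-FA padding the `forward()` now performs. The helper name and the alignment target are illustrative assumptions (NPU fused attention typically requires an aligned head size), not values taken from the patch:

```python
import torch
import torch.nn.functional as F

def pad_head_dim(x: torch.Tensor, target: int = 128) -> torch.Tensor:
    """Zero-pad the last (head) dimension of q/k/v or cos/sin tensors
    up to `target` before calling the fused-attention kernel.

    The default of 128 is an assumed example of an alignment the NPU
    kernel accepts.
    """
    pad = target - x.size(-1)
    if pad <= 0:
        return x
    # F.pad pads the last dimension with (left, right) amounts.
    return F.pad(x, (0, pad))
```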
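Finally, a sketch of the `cu_seqlens` conversion: adjacent differences turn the cumulative offsets into per-sequence lengths, which are then moved to the CPU for the NPU FA interface. The function name is hypothetical:

```python
import torch

def cu_seqlens_to_intervals(cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Convert cumulative sequence lengths, e.g. [0, 3, 7, 12],
    into per-sequence lengths, e.g. [3, 4, 5], on the CPU."""
    seq_lens = cu_seqlens[1:] - cu_seqlens[:-1]
    return seq_lens.cpu()
```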
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark
- vLLM version: v0.11.2
---------
Signed-off-by: shen-shanshan <467638484@qq.com>
File tree
9 files changed: +802 −2100 lines
- tests/ut/models
- vllm_ascend/models
- vllm_ascend/patch/worker