
Commit 0d1f466 (parent: 391612e)

update figures

Signed-off-by: hsliu <liuhongsheng4@huawei.com>

5 files changed (+30, -48 lines)
_posts/2025-11-30-vllm-omni.md

Lines changed: 30 additions & 48 deletions
@@ -1,14 +1,24 @@
-## **Announcing vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving" author: "The vLLM-Omni Team"**
+---
+layout: post
+title: "Announcing vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving"
+author: "The vLLM-Omni Team"
+image: /assets/figures/2025-11-30-vllm-omni/omni-modality-log-text-dark.png
+---

We are excited to announce the official release of **vLLM-Omni**, a major extension of the vLLM ecosystem designed to support the next generation of AI: omni-modality models.

-Since its inception, vLLM has focused on high-throughput, memory-efficient serving for Large Language Models (LLMs). However, the landscape of generative AI is shifting rapidly. Models are no longer just about text-in, text-out. Today's state-of-the-art models reason across text, images, audio, and video, and they generate heterogeneous outputs using diverse architectures.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/omni-modality-log-text-dark.png" alt="vllm-omni logo" width="80%">
+</p>

-**vLLM-Omni** answers this call, extending vLLM’s legendary performance to the world of multi-modal and non-autoregressive inference.

-\<p align="center"\>
-\<img src="/assets/figures/vllm-omni-logo-text-dark.png" alt="vLLM Omni Logo" width="60%"\>
-\</p\>
+Since its inception, vLLM has focused on high-throughput, memory-efficient serving for Large Language Models (LLMs). However, the landscape of generative AI is shifting rapidly. Models are no longer just about text-in, text-out. Today's state-of-the-art models reason across text, images, audio, and video, and they generate heterogeneous outputs using diverse architectures.
+
+**vLLM-Omni** is the first open-source framework for omni-modality model serving, extending vLLM’s legendary performance to the world of multi-modal and non-autoregressive inference.
+
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/omni-modality-model-architecture.png" alt="omni-modality model architecture" width="80%">
+</p>

## **Why vLLM-Omni?**

@@ -22,42 +32,26 @@ vLLM-Omni addresses three critical shifts in model architecture:

## **Inside the Architecture**

-vLLM-Omni is not just a wrapper; it is a re-imagining of how vLLM handles data flow. It introduces a fully disaggregated pipeline that allows for dynamic resource allocation across different stages of generation.
-
-\<p align="center"\>
-\<img src="/assets/figures/omni-modality-model-architecture.png" alt="Omni-modality model architecture" width="80%"\>
-\</p\>
-As shown above, the architecture unifies distinct phases:
+vLLM-Omni is not just a wrapper; it is a re-imagining of how vLLM handles data flow. It introduces a fully disaggregated pipeline that allows for dynamic resource allocation across different stages of generation. As shown above, the architecture unifies distinct phases:

* **Modality Encoders:** Efficiently processing inputs (ViT, T5, etc.)
* **LLM Core:** Leveraging vLLM's PagedAttention for the autoregressive reasoning stage.
* **Modality Generators:** High-performance serving for DiT and other decoding heads to produce rich media outputs.

### **Key Features**

-* **Simplicity:** If you know how to use vLLM, you know how to use vLLM-Omni. We maintain seamless integration with Hugging Face models and offer an OpenAI-compatible API server.
-
-# todo @liuhongsheng, add the vLLM-Omni architecture
-
-
-* **Flexibility:** With the OmniStage abstraction, we provide a simple and straightforward way to support various Omni-Modality models including Qwen-Omni, Qwen-Image, SD models.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/vllm-omni-user-interface.png" alt="vllm-omni user interface" width="80%">
+</p>

+* **Simplicity:** If you know how to use vLLM, you know how to use vLLM-Omni. We maintain seamless integration with Hugging Face models and offer an OpenAI-compatible API server.

-* **Performance:** We utilize pipelined stage execution to overlap computation, ensuring that while one stage is processing, others aren't idle.
-
-# todo @zhoutaichang, please add a figure to illustrate the pipelined stage execution.
+* **Flexibility:** With the OmniStage abstraction, we provide a simple and straightforward way to support various omni-modality models, including Qwen-Omni, Qwen-Image, and other state-of-the-art models.

-## **Performance**
+* **Performance:** We use pipelined stage execution to overlap computation for high throughput, ensuring that while one stage is processing, the others aren't idle.
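
The **Performance** bullet above describes pipelined stage execution across the encoder, LLM core, and generator stages. As a purely conceptual illustration of that overlap, and not the actual OmniStage API, here is a minimal asyncio sketch in which every stage name, cost, and queue is a hypothetical stand-in:

```python
# Conceptual sketch only: three pipeline stages (encode -> LLM -> generate)
# overlapped across requests with asyncio queues. Stage names, costs, and the
# queue wiring are hypothetical; this is not vLLM-Omni's OmniStage API.
import asyncio
import time

START = time.perf_counter()


async def stage(name, cost_s, inbox, outbox):
    """Consume requests, simulate this stage's compute, then pass them downstream."""
    while True:
        req = await inbox.get()
        if req is None:                  # shutdown sentinel
            if outbox is not None:
                await outbox.put(None)
            return
        await asyncio.sleep(cost_s)      # stand-in for encoder / LLM / DiT work
        print(f"{time.perf_counter() - START:5.2f}s  {name} finished request {req}")
        if outbox is not None:
            await outbox.put(req)


async def main():
    q_enc, q_llm, q_gen = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(stage("encoder  ", 0.2, q_enc, q_llm)),
        asyncio.create_task(stage("llm core ", 0.4, q_llm, q_gen)),
        asyncio.create_task(stage("generator", 0.3, q_gen, None)),
    ]
    for i in range(4):        # while the LLM core works on request 0,
        await q_enc.put(i)    # the encoder is already on request 1, and so on
    await q_enc.put(None)
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())
```

With stages overlapped like this, steady-state throughput is limited by the slowest stage rather than by the sum of all stage latencies, which is the effect the bullet above attributes to pipelined stage execution.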

We benchmarked vLLM-Omni against Hugging Face Transformers to demonstrate the efficiency gains in omni-modal serving.

-| Metric | vLLM-Omni | HF Transformers | Improvement |
-| :---- | :---- | :---- | :---- |
-| **Throughput** (req/s) | **TBD** | TBD | **TBD x** |
-| **Latency** (TTFT ms) | **TBD** | TBD | **TBD x** |
-| **GPU Memory** (GB) | **TBD** | TBD | **TBD %** |
-
-*Note: Benchmarks were run on \[Insert Hardware Specs\] using \[Insert Model Name\].*

## **Future Roadmap**

@@ -69,34 +63,22 @@ vLLM-Omni is evolving rapidly. Our roadmap is focused on expanding model support
* **Full disaggregation:** Based on the OmniStage abstraction, we expect to support full disaggregation (encoder/prefill/decode/generation) across different inference stages in order to improve throughput and reduce latency.
* **Hardware Support:** Following the hardware plugin system, we plan to expand our support for various hardware backends to ensure vLLM-Omni runs efficiently everywhere.

-Contributions and collabrations from the open source community are welcome.

## **Getting Started**

-Getting started with vLLM-Omni is straightforward. The initial release is built on top of vLLM v0.11.0.
+Getting started with vLLM-Omni is straightforward. The initial vllm-omni v0.11.0rc release is built on top of vLLM v0.11.0.

### **Installation**

-First, set up your environment:
-
-\# Create a virtual environment
-uv venv \--python 3.12 \--seed
-source .venv/bin/activate
-
-\# Install the base vLLM
-uv pip install vllm==0.11.0 \--torch-backend=auto
-
-Next, install the vLLM-Omni extension:
-
-git clone \[https://github.com/vllm-project/vllm-omni.git\](https://github.com/vllm-project/vllm-omni.git)
-cd vllm\_omni
-uv pip install \-e .
+Check out our [Installation Doc](https://vllm-omni.readthedocs.io/en/latest/getting_started/installation/) for details.

-### **Running the Qwen3-Omni model**
+### **Serving omni-modality models**

-@huayongxiang, add the gradio example for Qwen3-Omni model inference
+Check out our [examples directory](https://github.com/vllm-project/vllm-omni/tree/main/examples) for specific scripts to launch image, audio, and video generation workflows. vLLM-Omni also provides Gradio support to improve the user experience; below is a demo of serving Qwen-Image:

-Check out our [examples directory](https://www.google.com/search?q=https://github.com/vllm-project/vllm-omni/tree/main/examples) for specific scripts to launch image, audio, and video generation workflows.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/vllm-omni-gradio-serving-demo.png" alt="vllm-omni serving qwen-image with gradio" width="80%">
+</p>
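
Since the post advertises an OpenAI-compatible API server, a client-side call typically looks like the sketch below. This assumes a vLLM-Omni server is already running locally; the base URL, port, and model id are illustrative assumptions, not values taken from this commit:

```python
# Client-side sketch against an OpenAI-compatible endpoint. Assumes a server
# is already running locally; the base_url, api_key, and model id below are
# placeholders for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni",  # hypothetical model id; use whatever the server has loaded
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

For image, audio, and video generation workflows, the examples directory and the Gradio demo referenced above are the intended starting points.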

## **Join the Community**

4 image files changed (51.5 KB, 1.06 MB, 39.8 KB, 280 KB); previews not shown.
