
Commit 0d1f466 (parent: 391612e)

update figures

Signed-off-by: hsliu <liuhongsheng4@huawei.com>

5 files changed (+30, -48 lines)
_posts/2025-11-30-vllm-omni.md

Lines changed: 30 additions & 48 deletions
@@ -1,14 +1,24 @@
-## **Announcing vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving" author: "The vLLM-Omni Team"**
+---
+layout: post
+title: "Announcing vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving"
+author: "The vLLM-Omni Team"
+image: /assets/figures/2025-11-30-vllm-omni/omni-modality-log-text-dark.png
+---

We are excited to announce the official release of **vLLM-Omni**, a major extension of the vLLM ecosystem designed to support the next generation of AI: omni-modality models.

-Since its inception, vLLM has focused on high-throughput, memory-efficient serving for Large Language Models (LLMs). However, the landscape of generative AI is shifting rapidly. Models are no longer just about text-in, text-out. Today's state-of-the-art models reason across text, images, audio, and video, and they generate heterogeneous outputs using diverse architectures.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/omni-modality-log-text-dark.png" alt="vllm-omni logo" width="80%">
+</p>

-**vLLM-Omni** answers this call, extending vLLM’s legendary performance to the world of multi-modal and non-autoregressive inference.

-\<p align="center"\>
-\<img src="/assets/figures/vllm-omni-logo-text-dark.png" alt="vLLM Omni Logo" width="60%"\>
-\</p\>
+Since its inception, vLLM has focused on high-throughput, memory-efficient serving for Large Language Models (LLMs). However, the landscape of generative AI is shifting rapidly. Models are no longer just about text-in, text-out. Today's state-of-the-art models reason across text, images, audio, and video, and they generate heterogeneous outputs using diverse architectures.
+
+**vLLM-Omni** is the first open-source framework for omni-modality model serving, extending vLLM’s legendary performance to the world of multi-modal and non-autoregressive inference.
+
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/omni-modality-model-architecture.png" alt="omni-modality model architecture" width="80%">
+</p>

## **Why vLLM-Omni?**

@@ -22,42 +32,26 @@ vLLM-Omni addresses three critical shifts in model architecture:

## **Inside the Architecture**

-vLLM-Omni is not just a wrapper; it is a re-imagining of how vLLM handles data flow. It introduces a fully disaggregated pipeline that allows for dynamic resource allocation across different stages of generation.
-
-\<p align="center"\>
-\<img src="/assets/figures/omni-modality-model-architecture.png" alt="Omni-modality model architecture" width="80%"\>
-\</p\>
-As shown above, the architecture unifies distinct phases:
+vLLM-Omni is not just a wrapper; it is a re-imagining of how vLLM handles data flow. It introduces a fully disaggregated pipeline that allows for dynamic resource allocation across different stages of generation. As shown above, the architecture unifies distinct phases:

* **Modality Encoders:** Efficiently processing inputs (ViT, T5, etc.)
* **LLM Core:** Leveraging vLLM's PagedAttention for the autoregressive reasoning stage.
* **Modality Generators:** High-performance serving for DiT and other decoding heads to produce rich media outputs.

### **Key Features**

-* **Simplicity:** If you know how to use vLLM, you know how to use vLLM-Omni. We maintain seamless integration with Hugging Face models and offer an OpenAI-compatible API server.
-
-# todo @liuhongsheng, add the vLLM-Omni architecture
-
-
-* **Flexibility:** With the OmniStage abstraction, we provide a simple and straightforward way to support various Omni-Modality models including Qwen-Omni, Qwen-Image, SD models.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/vllm-omni-user-interface.png" alt="vllm-omni user interface" width="80%">
+</p>

+* **Simplicity:** If you know how to use vLLM, you know how to use vLLM-Omni. We maintain seamless integration with Hugging Face models and offer an OpenAI-compatible API server.

-* **Performance:** We utilize pipelined stage execution to overlap computation, ensuring that while one stage is processing, others aren't idle.
-
-# todo @zhoutaichang, please add a figure to illustrate the pipelined stage execution.
+* **Flexibility:** With the OmniStage abstraction, we provide a simple and straightforward way to support various omni-modality models, including Qwen-Omni, Qwen-Image, and other state-of-the-art models.

-## **Performance**
+* **Performance:** We use pipelined stage execution to overlap computation for high throughput, ensuring that while one stage is processing, the others aren't idle.
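
The **Performance** bullet above describes pipelined stage execution across the encoder, LLM core, and generator stages. As a purely conceptual illustration of that overlap, and not the actual OmniStage API, here is a minimal asyncio sketch in which every stage name, cost, and queue is a hypothetical stand-in:

```python
# Conceptual sketch only: three pipeline stages (encode -> LLM -> generate)
# overlapped across requests with asyncio queues. Stage names, costs, and the
# queue wiring are hypothetical; this is not vLLM-Omni's OmniStage API.
import asyncio
import time

START = time.perf_counter()


async def stage(name, cost_s, inbox, outbox):
    """Consume requests, simulate this stage's compute, then pass them downstream."""
    while True:
        req = await inbox.get()
        if req is None:                  # shutdown sentinel
            if outbox is not None:
                await outbox.put(None)
            return
        await asyncio.sleep(cost_s)      # stand-in for encoder / LLM / DiT work
        print(f"{time.perf_counter() - START:5.2f}s  {name} finished request {req}")
        if outbox is not None:
            await outbox.put(req)


async def main():
    q_enc, q_llm, q_gen = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(stage("encoder  ", 0.2, q_enc, q_llm)),
        asyncio.create_task(stage("llm core ", 0.4, q_llm, q_gen)),
        asyncio.create_task(stage("generator", 0.3, q_gen, None)),
    ]
    for i in range(4):        # while the LLM core works on request 0,
        await q_enc.put(i)    # the encoder is already on request 1, and so on
    await q_enc.put(None)
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())
```

With stages overlapped like this, steady-state throughput is limited by the slowest stage rather than by the sum of all stage latencies, which is the effect the bullet above attributes to pipelined stage execution.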

We benchmarked vLLM-Omni against Hugging Face Transformers to demonstrate the efficiency gains in omni-modal serving.

-| Metric | vLLM-Omni | HF Transformers | Improvement |
-| :---- | :---- | :---- | :---- |
-| **Throughput** (req/s) | **TBD** | TBD | **TBD x** |
-| **Latency** (TTFT ms) | **TBD** | TBD | **TBD x** |
-| **GPU Memory** (GB) | **TBD** | TBD | **TBD %** |
-
-*Note: Benchmarks were run on \[Insert Hardware Specs\] using \[Insert Model Name\].*

## **Future Roadmap**

@@ -69,34 +63,22 @@ vLLM-Omni is evolving rapidly. Our roadmap is focused on expanding model support
* **Full disaggregation:** Based on the OmniStage abstraction, we expect to support full disaggregation (encoder/prefill/decode/generation) across different inference stages in order to improve throughput and reduce latency.
* **Hardware Support:** Following the hardware plugin system, we plan to expand our support for various hardware backends to ensure vLLM-Omni runs efficiently everywhere.

-Contributions and collabrations from the open source community are welcome.

## **Getting Started**

-Getting started with vLLM-Omni is straightforward. The initial release is built on top of vLLM v0.11.0.
+Getting started with vLLM-Omni is straightforward. The initial vllm-omni v0.11.0rc release is built on top of vLLM v0.11.0.

### **Installation**

-First, set up your environment:
-
-\# Create a virtual environment
-uv venv \--python 3.12 \--seed
-source .venv/bin/activate
-
-\# Install the base vLLM
-uv pip install vllm==0.11.0 \--torch-backend=auto
-
-Next, install the vLLM-Omni extension:
-
-git clone \[https://github.com/vllm-project/vllm-omni.git\](https://github.com/vllm-project/vllm-omni.git)
-cd vllm\_omni
-uv pip install \-e .
+Check out our [Installation Doc](https://vllm-omni.readthedocs.io/en/latest/getting_started/installation/) for details.

-### **Running the Qwen3-Omni model**
+### **Serving omni-modality models**

-@huayongxiang, add the gradio example for Qwen3-Omni model inference
+Check out our [examples directory](https://github.com/vllm-project/vllm-omni/tree/main/examples) for specific scripts to launch image, audio, and video generation workflows. vLLM-Omni also provides Gradio support to improve the user experience; below is a demo of serving Qwen-Image:

-Check out our [examples directory](https://www.google.com/search?q=https://github.com/vllm-project/vllm-omni/tree/main/examples) for specific scripts to launch image, audio, and video generation workflows.
+<p align="center">
+<img src="/assets/figures/2025-11-30-vllm-omni/vllm-omni-gradio-serving-demo.png" alt="vllm-omni serving qwen-image with gradio" width="80%">
+</p>
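
Since the post advertises an OpenAI-compatible API server, a client-side call typically looks like the sketch below. This assumes a vLLM-Omni server is already running locally; the base URL, port, and model id are illustrative assumptions, not values taken from this commit:

```python
# Client-side sketch against an OpenAI-compatible endpoint. Assumes a server
# is already running locally; the base_url, api_key, and model id below are
# placeholders for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni",  # hypothetical model id; use whatever the server has loaded
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

For image, audio, and video generation workflows, the examples directory and the Gradio demo referenced above are the intended starting points.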

## **Join the Community**

4 image files changed (51.5 KB, 1.06 MB, 39.8 KB, 280 KB); previews not shown.
