Commit 1662ee5

add

Signed-off-by: Roger Wang <hey@rogerw.io>
1 parent 2f090f1

File tree

1 file changed: +2 −2 lines changed

_posts/2025-10-29-run-multimodal-reasoning-agents-nvidia-nemotron.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ author: "NVIDIA Nemotron Team"
 
 We are excited to release [NVIDIA Nemotron Nano 2 VL](https://huggingface.co/nvidia/Nemotron-Nano-12B-v2-VL-BF16), supported by vLLM. This open vision language model ([VLM](https://www.nvidia.com/en-us/glossary/vision-language-models/)) is built for video understanding and document intelligence.
 
-Nemotron Nano 2 VL uses a hybrid Transformer–Mamba design and delivers higher throughput while maintaining state-of-the-art multimodal reasoning accuracy. The model also features **Efficient Video Sampling (EVS)**, a new technique that reduces redundant [token](https://blogs.nvidia.com/blog/ai-tokens-explained/) generation for video workloads, allowing more videos to be processed with higher efficiency.
+Nemotron Nano 2 VL uses a hybrid Transformer–Mamba design and delivers higher throughput while maintaining state-of-the-art multimodal reasoning accuracy. The model also features [**Efficient Video Sampling (EVS)**](https://arxiv.org/abs/2510.14624), a new technique that reduces redundant [token](https://blogs.nvidia.com/blog/ai-tokens-explained/) generation for video workloads, allowing more videos to be processed with higher efficiency.
 
 In this blog post, we’ll explore how Nemotron Nano 2 VL advances video understanding and document intelligence, showcase real-world use cases and benchmark results, and guide you through getting started with vLLM for inference to unlock high-efficiency multimodal AI at scale.
 

@@ -53,7 +53,7 @@ Figure 2: Accuracy trend of the Nemotron Nano 2 VL model across various token-dr
 * Get started:
 * Download model weights from Hugging Face \- [BF16](https://huggingface.co/nvidia/Nemotron-Nano-12B-v2-VL-BF16), [FP8](https://huggingface.co/nvidia/Nemotron-Nano-12B-v2-VL-FP8), [FP4-QAD](https://huggingface.co/nvidia/Nemotron-Nano-12B-v2-VL-FP4-QAD)
 * Run with vLLM for inference
-* [Technical report](https://www.overleaf.com/project/68d1d48c83696e11ba669f70) to build custom, optimized models with Nemotron techniques.
+* [Technical report](https://research.nvidia.com/labs/adlr/files/NVIDIA-Nemotron-Nano-V2-VL-report.pdf) to build custom, optimized models with Nemotron techniques.
 
 
 ## Run optimized inference with vLLM

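The changed lines point readers at running the model with vLLM. As a minimal sketch of what that might look like (the exact flags are assumptions, not part of this commit — check the model card for the required options):

```shell
# Illustrative only: serve the BF16 checkpoint via vLLM's OpenAI-compatible server.
# --trust-remote-code is an assumption for this model; verify against the model card.
vllm serve nvidia/Nemotron-Nano-12B-v2-VL-BF16 \
  --trust-remote-code
```

Once the server is up, it can be queried through the standard OpenAI-compatible `/v1/chat/completions` endpoint.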