[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies #29270
Conversation
Code Review
This pull request aims to reduce the final Docker image size by switching from the devel to the base CUDA image and then installing only the minimal dependencies required for runtime JIT compilation. This is a good optimization. However, the changes as they are will cause the Docker build to fail due to reliance on a default CUDA_VERSION that does not correspond to available Docker images or apt packages. There is also an issue with an incomplete C++ compiler setup for the runtime JIT environment. I've provided critical and high-severity comments with suggestions to fix these issues.
- # TODO: Restore to base image after FlashInfer AOT wheel fixed
- ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
+ # Using cuda base image with minimal dependencies necessary for JIT compilation (FlashInfer, DeepGEMM, EP kernels)
+ ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04
The change to use nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04 as the FINAL_BASE_IMAGE will cause the Docker build to fail with the default CUDA_VERSION=12.9.1. The image tag nvidia/cuda:12.9.1-base-ubuntu22.04 does not exist on Docker Hub. This will break the build at the FROM ${FINAL_BASE_IMAGE} instruction. While CUDA_VERSION can be overridden at build time, the default value should correspond to an existing image to ensure the Dockerfile is buildable out-of-the-box.
I checked, the image exists.
https://hub.docker.com/layers/nvidia/cuda/12.9.1-base-ubuntu22.04
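For context, a Docker `ARG` default behaves like shell `${VAR:-default}` expansion; a minimal sketch of how the final image tag is assembled when `CUDA_VERSION` is left unset (plain shell standing in for the Dockerfile semantics):

```shell
# Mirror of `ARG CUDA_VERSION=12.9.1` when no --build-arg is passed
unset CUDA_VERSION
CUDA_VERSION="${CUDA_VERSION:-12.9.1}"

# `FROM ${FINAL_BASE_IMAGE}` then resolves to this tag:
echo "nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04"
# prints: nvidia/cuda:12.9.1-base-ubuntu22.04
```

Overriding the default is a matter of passing `--build-arg CUDA_VERSION=...` at build time, which substitutes into the same tag template.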
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: bbartels <benjamin@bartels.dev>
Documentation preview: https://vllm--29270.org.readthedocs.build/en/29270/
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Signed-off-by: bbartels <benjamin@bartels.dev>
/cc @wzshiming PTAL.
# Install CUDA development tools and build essentials for runtime JIT compilation
# (FlashInfer, DeepGEMM, EP kernels all require compilation at runtime)
RUN CUDA_VERSION_DASH=$(echo $CUDA_VERSION | cut -d. -f1,2 | tr '.' '-') && \
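The `CUDA_VERSION_DASH` transformation maps the full version to the major-minor, dash-separated form that NVIDIA's apt packages use in their names (e.g. `cuda-nvcc-12-9`); a quick illustration of the pipeline in isolation:

```shell
CUDA_VERSION=12.9.1
# Keep only major.minor, then swap '.' for '-' to match apt package naming
CUDA_VERSION_DASH=$(echo "$CUDA_VERSION" | cut -d. -f1,2 | tr '.' '-')
echo "$CUDA_VERSION_DASH"
# prints: 12-9
```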
The previous step was missing `rm -rf /var/lib/apt/lists/*`. I think this step can be merged with the previous one to avoid a duplicate `apt-get update` and shrink the image a little more.
There is another open PR, #29060, that addresses the missing `rm -rf /var/lib/apt/lists/*`. I didn't want to cause any conflicts there, but happy to address it here instead if you prefer :)
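For reference, the merged pattern being suggested might look like this hypothetical sketch (the package names are placeholders, not the PR's actual list):

```dockerfile
# One RUN layer: update, install, and clean the apt lists together, so the
# package index fetched by apt-get update never persists into an image layer.
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-nvcc-12-9 \
    && rm -rf /var/lib/apt/lists/*
```

Splitting the update and install across two RUN instructions would bake the apt lists into the first layer even if a later layer deletes them, which is why merging saves space.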
hmellor left a comment:
LGTM, but we should get more eyes on this before merging
tlrmchlsmth left a comment:
Nice
mgoin left a comment:
Pausing merge for ready-run-all-tests to run
I think we discussed this @dtrifiro
Do I need to do anything for the tests to kick off?
mgoin left a comment:
LGTM, same failures as nightly. Great work!
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (vllm-project#29270) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Purpose
Now that #26966 is merged, more runtime image dependencies can be culled. The `devel` base image isn't needed anymore and has been switched to `base`. While FlashInfer no longer requires source compilation, DeepGEMM and DeepEP still do, so some JIT dependencies are still needed. Reduces the final image size by ~8 GB at rest and ~5 GB compressed 🎉
Credit: part of this PR is based on https://github.com/vllm-project/vllm/pull/28727/files#diff-f34da55ca08f1a30591d8b0b3e885bcc678537b2a9a4aadea4f190806b374ddcR317-R334 from @rzabarazesh.
Test Plan
I've tested this PR locally on an 8xH200 node with various models at TP=8, and it works as expected.
All of CI should be run against this PR.
Test Result
Local tests work without issue; CI results are TBD.
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.