Skip to content

Conversation

@bbartels
Copy link
Contributor

@bbartels bbartels commented Nov 23, 2025

Purpose

Now that #26966 is merged, more runtime image dependencies can be culled. The devel base image isn't needed anymore and switched to base. While flashinfer doesn't require source compilation, DeepGEMM and DeepEP still do, so some JIT dependencies are still needed.

vllm:old                                           92e48d400646       27.7GB            0B    U   
vllm:new                                         d5862ba6e2bf       19.7GB             0B    U   

Reduces final image size by 8GB at rest and ~5 GB compressed 🎉

Part of the Pr is based on https://github.com/vllm-project/vllm/pull/28727/files#diff-f34da55ca08f1a30591d8b0b3e885bcc678537b2a9a4aadea4f190806b374ddcR317-R334 from @rzabarazesh for credit.

Test Plan

I've tested this PR locally on a 8xH200 node with various models and TP=8 and it works as expected.
All of CI should be run against this PR.

Test Result

Local tests work without issue. CI is TBD


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: bbartels <benjamin@bartels.dev>
@mergify mergify bot added the ci/build label Nov 23, 2025
@bbartels bbartels changed the title Adds minimal runtime dependencies for JIT compilation [CI/Build] Adds minimal runtime dependencies for JIT compilation Nov 23, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to reduce the final Docker image size by switching from the devel to the base CUDA image and then installing only the minimal dependencies required for runtime JIT compilation. This is a good optimization. However, the changes as they are will cause the Docker build to fail due to reliance on a default CUDA_VERSION that does not correspond to available Docker images or apt packages. There is also an issue with an incomplete C++ compiler setup for the runtime JIT environment. I've provided critical and high-severity comments with suggestions to fix these issues.

# TODO: Restore to base image after FlashInfer AOT wheel fixed
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
# Using cuda base image with minimal dependencies necessary for JIT compilation (FlashInfer, DeepGEMM, EP kernels)
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The change to use nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04 as the FINAL_BASE_IMAGE will cause the Docker build to fail with the default CUDA_VERSION=12.9.1. The image tag nvidia/cuda:12.9.1-base-ubuntu22.04 does not exist on Docker Hub. This will break the build at the FROM ${FINAL_BASE_IMAGE} instruction. While CUDA_VERSION can be overridden at build time, the default value should correspond to an existing image to ensure the Dockerfile is buildable out-of-the-box.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: bbartels <benjamin@bartels.dev>
@mergify
Copy link

mergify bot commented Nov 23, 2025

Documentation preview: https://vllm--29270.org.readthedocs.build/en/29270/

@mergify mergify bot added the documentation Improvements or additions to documentation label Nov 23, 2025
@bbartels bbartels changed the title [CI/Build] Adds minimal runtime dependencies for JIT compilation [CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies Nov 23, 2025
@mergify mergify bot added the nvidia label Nov 23, 2025
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 23, 2025
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Signed-off-by: bbartels <benjamin@bartels.dev>
@chaunceyjiang
Copy link
Collaborator

/cc @wzshiming PTAL.

# TODO: Restore to base image after FlashInfer AOT wheel fixed
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
# Using cuda base image with minimal dependencies necessary for JIT compilation (FlashInfer, DeepGEMM, EP kernels)
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


# Install CUDA development tools and build essentials for runtime JIT compilation
# (FlashInfer, DeepGEMM, EP kernels all require compilation at runtime)
RUN CUDA_VERSION_DASH=$(echo $CUDA_VERSION | cut -d. -f1,2 | tr '.' '-') && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous step was missing rm -rf /var/lib/apt/lists/*, I think this can be merged with the previous step to reduce duplication of updates and reduce the volume a little more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another PR that's open that addresses the previous rm -rf /var/lib/apt/lists/* being missing. #29060 I didn't want to cause any conflicts there. But happy to address it here instead if you prefer :)

Copy link
Member

@hmellor hmellor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we should get more eyes on this before merging

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 24, 2025
Copy link
Member

@tlrmchlsmth tlrmchlsmth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@mgoin mgoin added the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label Nov 24, 2025
Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pausing merge for ready-run-all-tests to run

@NickLucche
Copy link
Collaborator

I think we discussed about this @dtrifiro

@bbartels
Copy link
Contributor Author

Pausing merge for ready-run-all-tests to run

Do I need to do anything for the tests to kick off?

Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, same failures as nightly. Great work!

@vllm-bot vllm-bot merged commit 4d6afca into vllm-project:main Nov 24, 2025
97 of 100 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 24, 2025
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Nov 24, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
@amrmahdi
Copy link
Contributor

amrmahdi commented Nov 25, 2025

Thanks @bbartels this is a great change! addresses #28643

bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Dec 6, 2025
…IT dependencies (vllm-project#29270)

Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build documentation Improvements or additions to documentation nvidia ready ONLY add when PR is ready to merge/full CI is needed ready-run-all-tests Trigger CI with all tests for wide-ranging PRs

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

10 participants