[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies #29270
Conversation
Code Review
This pull request aims to reduce the final Docker image size by switching from the devel to the base CUDA image and then installing only the minimal dependencies required for runtime JIT compilation. This is a good optimization. However, the changes as they are will cause the Docker build to fail due to reliance on a default CUDA_VERSION that does not correspond to available Docker images or apt packages. There is also an issue with an incomplete C++ compiler setup for the runtime JIT environment. I've provided critical and high-severity comments with suggestions to fix these issues.
- # TODO: Restore to base image after FlashInfer AOT wheel fixed
- ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
+ # Using cuda base image with minimal dependencies necessary for JIT compilation (FlashInfer, DeepGEMM, EP kernels)
+ ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04
The change to use nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04 as the FINAL_BASE_IMAGE will cause the Docker build to fail with the default CUDA_VERSION=12.9.1. The image tag nvidia/cuda:12.9.1-base-ubuntu22.04 does not exist on Docker Hub. This will break the build at the FROM ${FINAL_BASE_IMAGE} instruction. While CUDA_VERSION can be overridden at build time, the default value should correspond to an existing image to ensure the Dockerfile is buildable out-of-the-box.
I checked, the image exists.
https://hub.docker.com/layers/nvidia/cuda/12.9.1-base-ubuntu22.04
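For context, a Docker `ARG` default behaves like shell `${VAR:-default}` expansion; a minimal sketch of how the final image tag is assembled when `CUDA_VERSION` is left unset (plain shell standing in for the Dockerfile semantics):

```shell
# Mirror of `ARG CUDA_VERSION=12.9.1` when no --build-arg is passed
unset CUDA_VERSION
CUDA_VERSION="${CUDA_VERSION:-12.9.1}"

# `FROM ${FINAL_BASE_IMAGE}` then resolves to this tag:
echo "nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04"
# prints: nvidia/cuda:12.9.1-base-ubuntu22.04
```

Overriding the default is a matter of passing `--build-arg CUDA_VERSION=...` at build time, which substitutes into the same tag template.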
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: bbartels <benjamin@bartels.dev>
Documentation preview: https://vllm--29270.org.readthedocs.build/en/29270/
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Signed-off-by: bbartels <benjamin@bartels.dev>
/cc @wzshiming PTAL.
# Install CUDA development tools and build essentials for runtime JIT compilation
# (FlashInfer, DeepGEMM, EP kernels all require compilation at runtime)
RUN CUDA_VERSION_DASH=$(echo $CUDA_VERSION | cut -d. -f1,2 | tr '.' '-') && \
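The `CUDA_VERSION_DASH` transformation maps the full version to the major-minor, dash-separated form that NVIDIA's apt packages use in their names (e.g. `cuda-nvcc-12-9`); a quick illustration of the pipeline in isolation:

```shell
CUDA_VERSION=12.9.1
# Keep only major.minor, then swap '.' for '-' to match apt package naming
CUDA_VERSION_DASH=$(echo "$CUDA_VERSION" | cut -d. -f1,2 | tr '.' '-')
echo "$CUDA_VERSION_DASH"
# prints: 12-9
```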
The previous step was missing `rm -rf /var/lib/apt/lists/*`. I think this step can be merged with the previous one to avoid a duplicate `apt-get update` and shrink the image a little more.
There is another open PR, #29060, that addresses the missing `rm -rf /var/lib/apt/lists/*`. I didn't want to cause any conflicts there, but happy to address it here instead if you prefer :)
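For reference, the merged pattern being suggested might look like this hypothetical sketch (the package names are placeholders, not the PR's actual list):

```dockerfile
# One RUN layer: update, install, and clean the apt lists together, so the
# package index fetched by apt-get update never persists into an image layer.
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-nvcc-12-9 \
    && rm -rf /var/lib/apt/lists/*
```

Splitting the update and install across two RUN instructions would bake the apt lists into the first layer even if a later layer deletes them, which is why merging saves space.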
hmellor left a comment:
LGTM, but we should get more eyes on this before merging
tlrmchlsmth left a comment:
Nice
mgoin left a comment:
Pausing merge for ready-run-all-tests to run
I think we discussed this @dtrifiro
Do I need to do anything for the tests to kick off?
mgoin left a comment:
LGTM, same failures as nightly. Great work!
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (vllm-project#29270) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Purpose
Now that #26966 is merged, more runtime image dependencies can be culled. The `devel` base image isn't needed anymore and has been switched to `base`. While FlashInfer no longer requires source compilation, DeepGEMM and DeepEP still do, so some JIT dependencies are still needed. Reduces the final image size by ~8 GB at rest and ~5 GB compressed 🎉
Credit: part of this PR is based on https://github.com/vllm-project/vllm/pull/28727/files#diff-f34da55ca08f1a30591d8b0b3e885bcc678537b2a9a4aadea4f190806b374ddcR317-R334 from @rzabarazesh.
Test Plan
I've tested this PR locally on an 8xH200 node with various models at TP=8, and it works as expected.
All of CI should be run against this PR.
Test Result
Local tests work without issue; CI results are TBD.
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.