Closed
🚀 The feature, motivation and pitch
Description
The Dockerfile currently uses nvidia/cuda:12.9.1-devel-ubuntu22.04 as the final base image. The devel variant includes the full CUDA compiler toolchain (~7GB) which is only needed during build, not at runtime. Switching to the runtime variant will significantly reduce image size.
What You'll Do
- Change `FINAL_BASE_IMAGE` from `devel` to `runtime` (line 24)
- Analyze if any runtime components actually need build tools
- Handle FlashInfer JIT compilation requirements:
  - Test if AOT wheels work without build deps
  - If needed, add conditional minimal build tools
- Verify all GPU functionality works with runtime image
- Update documentation
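
One way to approach the "do runtime components need build tools" analysis is to run a small check inside the runtime-based container and see which compiler binaries are still on `PATH`. The sketch below is only an illustration: the tool list (`nvcc`, `gcc`, `g++`, `ninja`) is an assumption about what a JIT build might invoke, not a confirmed list of FlashInfer requirements, and `missing_build_tools` is a hypothetical helper name.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed set of tools a JIT compile step might call.
# This is a guess for illustration; verify against FlashInfer's real needs.
ASSUMED_BUILD_TOOLS = ("nvcc", "gcc", "g++", "ninja")


def missing_build_tools(
    tools: Iterable[str] = ASSUMED_BUILD_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All assumed build tools are available")
```

If the script reports missing tools in the runtime image but the AOT wheels still serve requests correctly, that suggests no conditional build dependencies are needed. If JIT paths fail, the reported names are a starting point for the minimal set to install.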
Deliverables
- Modified Dockerfile with runtime base image
- Conditional build dependency installation for FlashInfer (if needed)
- GPU functionality test results
- Before/after image size comparison
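
For the before/after comparison, a minimal sketch could read each image's size with `docker image inspect` (its `.Size` field is reported in bytes) and compute the savings. The function names here are hypothetical, and any image tags passed in are placeholders for whatever the `devel` and `runtime` builds are tagged locally.

```python
import subprocess
from typing import Tuple


def image_size_bytes(image: str) -> int:
    """Ask the local Docker daemon for an image's size in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def size_reduction(before_bytes: int, after_bytes: int) -> Tuple[float, float]:
    """Return (GiB saved, percent saved) when going from before to after."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    saved = before_bytes - after_bytes
    return saved / 2**30, 100.0 * saved / before_bytes
```

With both images built locally, calling `size_reduction(image_size_bytes(devel_tag), image_size_bytes(runtime_tag))` gives the two numbers to report, where `devel_tag` and `runtime_tag` stand for the local tags of the two builds.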