[None][Infra] Revert "[TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 (#8838)" #9039
```diff
@@ -1,2 +1,5 @@
-# These vulnerabilities were inherited from the base image (pytorch:25.10-py3) and should be removed when the base image
+# These vulnerabilities were inherited from the base image (pytorch:25.06-py3) and should be removed when the base image
 # is updated.
+
+# WAR against https://github.com/advisories/GHSA-8qvm-5x2c-j2w7
+protobuf>=4.25.8
```
docs/source/installation/build-from-source-linux.md:

````diff
@@ -147,6 +147,11 @@ check <https://github.com/NVIDIA/TensorRT-LLM/tree/main/docker>.

 ## Build TensorRT LLM

+```{tip}
+:name: build-from-source-tip-cuda-version
+TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies needed by CUDA 13.0. If you are using CUDA 12.9, please uncomment lines end with `# <For CUDA 12.9>` and comment out the next lines.
+```
+
````
Comment on lines +150 to +153 — Contributor review:

🛠️ Refactor suggestion | 🟠 Major — Documentation inconsistent with automated CUDA version handling.

The documentation instructs users to manually uncomment and comment lines in `requirements.txt`, but the build tooling in this PR also applies that edit automatically (see the `sed` step added for CU12 builds in the CI pipeline below). This creates confusion about which approach a user is expected to follow. Update the documentation to clarify the recommended approach:

````diff
 ```{tip}
 :name: build-from-source-tip-cuda-version
-TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies needed by CUDA 13.0. If you are using CUDA 12.9, please uncomment lines end with `# <For CUDA 12.9>` and comment out the next lines.
+TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies for CUDA 13.0 by default. If you are using CUDA 12.9, the build script will automatically detect your CUDA version from the `CUDA_VERSION` environment variable and adjust dependencies accordingly. Alternatively, you can manually uncomment lines ending with `# <For CUDA 12.9>` and comment out the following lines before building.
 ```
````
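For concreteness, here is a minimal sketch of the toggle the tip describes. The two-line fragment is invented for illustration (the real `requirements.txt` entries differ); the sed one-liner is the same edit the pipeline applies for CU12 builds:

```bash
# Hypothetical fragment using the "# <For CUDA 12.9>" convention: a
# commented-out CUDA 12.9 pin carrying the marker, followed by the active
# CUDA 13.0 pin on the next line.
cat > requirements.txt <<'EOF'
# torch==2.7.1  # <For CUDA 12.9>
torch==2.9.0
EOF

# Switch to CUDA 12.9: uncomment the marked line, comment out the next one.
sed -i '/^# .*<For CUDA 12\.9>$/ {s/^# //; n; s/^/# /}' requirements.txt

cat requirements.txt
# Resulting file:
#   torch==2.7.1  # <For CUDA 12.9>
#   # torch==2.9.0
```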
```diff
 ### Option 1: Full Build with C++ Compilation

 The following command compiles the C++ code and packages the compiled libraries along with the Python files into a wheel. When developing C++ code, you need this full build command to apply your code changes.
```
````diff
@@ -12,16 +12,23 @@
 Install CUDA Toolkit following the [CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and
 make sure `CUDA_HOME` environment variable is properly set.

+```{tip}
+:name: installation-linux-tip-cuda-version
+TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0. The wheel package release only supports CUDA 12.9, while CUDA 13.0 is only supported through NGC container release.
+```
+
 ```bash
-# By default, PyTorch CUDA 12.8 package is installed. Install PyTorch CUDA 13.0 package to align with the CUDA version used for building TensorRT LLM wheels.
-pip3 install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
+# Optional step: Only required for NVIDIA Blackwell GPUs and SBSA platform
+pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
````
Comment on lines +21 to +22 — Contributor review:

🧩 Analysis chain — Clarify PyTorch and CUDA version references.

The documentation installs the PyTorch cu128 package (`torch==2.7.1`) while the tip above says the TensorRT LLM 1.1 wheel release supports CUDA 12.9. Please clarify how these versions relate.

🌐 Web query result: the latest stable PyTorch release with official CUDA 12.8 (cu128) binaries is PyTorch 2.7 (2.7.x).

🌐 Web query result: TensorRT-LLM 1.1 supports CUDA 12.9 and CUDA 13.0. The prebuilt Python wheel release only supports CUDA 12.9; CUDA 13.0 is supported via the NGC container or when building from source with the CUDA-13 dependencies.

CUDA version mismatch: PyTorch wheels use CUDA 12.8, but TensorRT-LLM 1.1 requires CUDA 12.9. The documentation instructs installing `torch==2.7.1` from the cu128 index; the relationship between the PyTorch CUDA build and the CUDA 12.9 requirement should be clarified.
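A quick sanity check, using standard PyTorch attributes rather than anything from these docs, shows which CUDA build of the wheel actually got installed:

```bash
# Prints the torch version and the CUDA version its wheel was built against,
# e.g. "2.7.1+cu128 12.8". torch.version.cuda is None on CPU-only builds.
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
```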
````diff
 sudo apt-get -y install libopenmpi-dev

 # Optional step: Only required for disagg-serving
 sudo apt-get -y install libzmq3-dev
 ```

+PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell GPUs and SBSA platform. On prior GPUs or Linux x86_64 platform, this extra installation is not required.
+
 ```{tip}
 Instead of manually installing the preqrequisites as described
 above, it is also possible to use the pre-built [TensorRT LLM Develop container
````
```diff
@@ -16,6 +16,9 @@ AARCH64_TRIPLE = "aarch64-linux-gnu"

 LLM_DOCKER_IMAGE = env.dockerImage

+LLM_DOCKER_IMAGE_12_9 = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.06-py3-x86_64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202509091430-7383"
+LLM_SBSA_DOCKER_IMAGE_12_9 = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.06-py3-aarch64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202509091430-7383"
+
 // Always use x86_64 image for agent
 AGENT_IMAGE = env.dockerImage.replace("aarch64", "x86_64")

@@ -37,6 +40,9 @@ def BUILD_JOBS_FOR_CONFIG = "buildJobsForConfig"
 @Field
 def CONFIG_LINUX_X86_64_VANILLA = "linux_x86_64_Vanilla"

+@Field
+def CONFIG_LINUX_X86_64_VANILLA_CU12 = "linux_x86_64_Vanilla_CU12"
+
 @Field
 def CONFIG_LINUX_X86_64_SINGLE_DEVICE = "linux_x86_64_SingleDevice"

@@ -46,6 +52,9 @@ def CONFIG_LINUX_X86_64_LLVM = "linux_x86_64_LLVM"
 @Field
 def CONFIG_LINUX_AARCH64 = "linux_aarch64"

+@Field
+def CONFIG_LINUX_AARCH64_CU12 = "linux_aarch64_CU12"
+
 @Field
 def CONFIG_LINUX_AARCH64_LLVM = "linux_aarch64_LLVM"

@@ -64,6 +73,11 @@ def BUILD_CONFIGS = [
         (TARNAME) : "TensorRT-LLM.tar.gz",
         (WHEEL_ARCHS): "80-real;86-real;89-real;90-real;100-real;103-real;120-real",
     ],
+    (CONFIG_LINUX_X86_64_VANILLA_CU12) : [
+        (WHEEL_EXTRA_ARGS) : "--extra-cmake-vars ENABLE_MULTI_DEVICE=1 --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl --micro_benchmarks",
+        (TARNAME) : "TensorRT-LLM-CU12.tar.gz",
+        (WHEEL_ARCHS): "80-real;86-real;89-real;90-real;100-real;103-real;120-real",
+    ],
     (CONFIG_LINUX_X86_64_PYBIND) : [
         (WHEEL_EXTRA_ARGS) : "--binding_type pybind --extra-cmake-vars ENABLE_MULTI_DEVICE=1 --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl --micro_benchmarks",
         (TARNAME) : "pybind-TensorRT-LLM.tar.gz",

@@ -85,6 +99,12 @@ def BUILD_CONFIGS = [
         (WHEEL_ARCHS): "90-real;100-real;103-real;120-real",
         (BUILD_JOBS_FOR_CONFIG): "4", // TODO: Remove after fix the build OOM issue on SBSA
     ],
+    (CONFIG_LINUX_AARCH64_CU12): [
+        (WHEEL_EXTRA_ARGS) : "--extra-cmake-vars WARNING_IS_ERROR=ON",
+        (TARNAME) : "TensorRT-LLM-GH200-CU12.tar.gz",
+        (WHEEL_ARCHS): "90-real;100-real;103-real;120-real",
+        (BUILD_JOBS_FOR_CONFIG): "4", // TODO: Remove after fix the build OOM issue on SBSA
+    ],
     (CONFIG_LINUX_AARCH64_PYBIND): [
         (WHEEL_EXTRA_ARGS) : "--binding_type pybind --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl",
         (TARNAME) : "pybind-TensorRT-LLM-GH200.tar.gz",
```
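As background on the `WHEEL_ARCHS` values above: assuming they are ultimately fed to CMake's `CMAKE_CUDA_ARCHITECTURES` (an assumption based on their format, not something this diff confirms), the `-real` suffix requests SASS for that exact compute capability with no embedded PTX, which keeps binaries smaller at the cost of forward compatibility:

```bash
# Hedged sketch: configure a CUDA project for SASS-only (no-PTX) architectures.
# Quote the list so the shell does not split it on the semicolons.
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES="80-real;86-real;89-real;90-real"
```

Dropping the suffix (plain `90`) would additionally embed PTX so that newer GPUs could JIT-compile the kernels.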
```diff
@@ -434,6 +454,9 @@ def runLLMBuild(pipeline, buildFlags, tarName, is_linux_x86_64)
         pipArgs = ""
     }

+    if (tarName.contains("CU12")) {
+        trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {s/^# //; n; s/^/# /}' requirements.txt && cat requirements.txt")
+    }
+
```
Comment on lines +457 to +459 — Contributor review:

Keep the CUDA 12.9 marker commented and un-comment only the dependency line. This sed now strips the leading "# " from the marker line itself and comments out the line that follows it:

```diff
-    trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {s/^# //; n; s/^/# /}' requirements.txt && cat requirements.txt")
+    trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {n; s/^# //}' requirements.txt && cat requirements.txt")
```
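To make the behavioral difference concrete, here is a standalone sketch on an invented fragment. It assumes the marker sits on its own commented line above a commented-out CUDA 12.9 dependency, which is the layout this review comment implies:

```bash
# Invented three-line fragment (not the real requirements.txt).
printf '%s\n' \
  '# pin for older toolkits <For CUDA 12.9>' \
  '# torch==2.7.1' \
  'torch==2.9.0' > requirements.txt

# Current sed: un-comments the marker line and comments out the next line.
sed '/^# .*<For CUDA 12\.9>$/ {s/^# //; n; s/^/# /}' requirements.txt
# -> pin for older toolkits <For CUDA 12.9>
# -> # # torch==2.7.1
# -> torch==2.9.0

# Suggested sed: leaves the marker commented, un-comments the dependency.
sed '/^# .*<For CUDA 12\.9>$/ {n; s/^# //}' requirements.txt
# -> # pin for older toolkits <For CUDA 12.9>
# -> torch==2.7.1
# -> torch==2.9.0
```

Which variant is correct depends on whether `# <For CUDA 12.9>` annotates the dependency line itself or a separate header line above it; under the first layout the current sed is the intended toggle, under the second the suggested one is.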
```diff
     // install python package
     trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && pip3 install -r requirements-dev.txt ${pipArgs}")
```
```diff
@@ -454,7 +477,10 @@ def runLLMBuild(pipeline, buildFlags, tarName, is_linux_x86_64)
     def llmPath = sh (script: "realpath ${LLM_ROOT}",returnStdout: true).trim()
     // TODO: Remove after the cmake version is upgraded to 3.31.8
     // Get triton tag from docker/dockerfile.multi
-    def tritonShortTag = "r25.09"
+    def tritonShortTag = "r25.08"
+    if (tarName.contains("CU12")) {
+        tritonShortTag = "r25.06"
+    }
     sh "cd ${LLM_ROOT}/triton_backend/inflight_batcher_llm && mkdir build && cd build && cmake .. -DTRTLLM_DIR=${llmPath} -DTRITON_COMMON_REPO_TAG=${tritonShortTag} -DTRITON_CORE_REPO_TAG=${tritonShortTag} -DTRITON_THIRD_PARTY_REPO_TAG=${tritonShortTag} -DTRITON_BACKEND_REPO_TAG=${tritonShortTag} -DUSE_CXX11_ABI=ON && make -j${buildJobs} install"

     // Step 3: packaging wheels into tarfile
```
```diff
@@ -544,9 +570,14 @@ def launchStages(pipeline, cpu_arch, enableFailFast, globalVars)
         wheelDockerImage = env.dockerImage
     }

+    def LLM_DOCKER_IMAGE_CU12 = cpu_arch == AARCH64_TRIPLE ? LLM_SBSA_DOCKER_IMAGE_12_9 : LLM_DOCKER_IMAGE_12_9
+
     buildConfigs = [
         "Build TRT-LLM": [LLM_DOCKER_IMAGE] + prepareLLMBuild(
             pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64 : CONFIG_LINUX_X86_64_VANILLA),
+        // Disable CUDA12 build for too slow to build (cost > 5 hours on SBSA)
+        "Build TRT-LLM CUDA12": [LLM_DOCKER_IMAGE_CU12] + prepareLLMBuild(
+            pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64_CU12 : CONFIG_LINUX_X86_64_VANILLA_CU12),
         "Build TRT-LLM LLVM": [LLM_DOCKER_IMAGE] + prepareLLMBuild(
             pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64_LLVM : CONFIG_LINUX_X86_64_LLVM),
         "Build TRT-LLM Pybind": [LLM_DOCKER_IMAGE] + prepareLLMBuild(
```
Contributor review:

🛠️ Refactor suggestion | 🟠 Major — Avoid hardcoded version strings in PyTorch-Triton renaming.

The PyTorch-Triton package renaming logic contains hardcoded version strings (`pytorch_triton-3.3.1+gitc8757738` and `triton-3.3.1+gitc8757738`) that will break when versions change. Consider making this more maintainable:

```diff
 RUN if [ -f /etc/redhat-release ]; then \
         echo "Rocky8 detected, skipping symlink and ldconfig steps"; \
     else \
         cd /usr/local/lib/python3.12/dist-packages/ && \
-        ls -la | grep pytorch_triton && \
-        mv pytorch_triton-3.3.1+gitc8757738.dist-info triton-3.3.1+gitc8757738.dist-info && \
-        cd triton-3.3.1+gitc8757738.dist-info && \
+        PYTORCH_TRITON_DIR=$(ls -d pytorch_triton-*.dist-info | head -n 1) && \
+        TRITON_DIR=$(echo "$PYTORCH_TRITON_DIR" | sed 's/pytorch_triton/triton/') && \
+        mv "$PYTORCH_TRITON_DIR" "$TRITON_DIR" && \
+        cd "$TRITON_DIR" && \
         echo "Current directory: $(pwd)" && \
         echo "Files in directory:" && \
         ls -la && \
         sed -i 's/^Name: pytorch-triton/Name: triton/' METADATA && \
-        sed -i 's|pytorch_triton-3.3.1+gitc8757738.dist-info/|triton-3.3.1+gitc8757738.dist-info/|g' RECORD && \
+        sed -i "s|$PYTORCH_TRITON_DIR/|$TRITON_DIR/|g" RECORD && \
         echo "METADATA after update:" && \
         grep "^Name:" METADATA; \
     fi
```
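A standalone sketch of the glob-based rename from the suggestion, runnable in a scratch directory; the version string below is a stand-in, and the point is that nothing version-specific is hardcoded:

```bash
# Create a stand-in dist-info directory with an arbitrary version string.
mkdir -p demo && cd demo
mkdir 'pytorch_triton-3.3.1+gitc8757738.dist-info'

# Resolve the directory by glob, derive the new name, and rename.
PYTORCH_TRITON_DIR=$(ls -d pytorch_triton-*.dist-info | head -n 1)
TRITON_DIR=$(echo "$PYTORCH_TRITON_DIR" | sed 's/pytorch_triton/triton/')
mv "$PYTORCH_TRITON_DIR" "$TRITON_DIR"

ls   # -> triton-3.3.1+gitc8757738.dist-info
```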