5 changes: 4 additions & 1 deletion constraints.txt
@@ -1,2 +1,5 @@
# These vulnerabilities were inherited from the base image (pytorch:25.10-py3) and should be removed when the base image
# These vulnerabilities were inherited from the base image (pytorch:25.06-py3) and should be removed when the base image
# is updated.

# WAR against https://github.com/advisories/GHSA-8qvm-5x2c-j2w7
protobuf>=4.25.8
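
For context, a constraints file of this kind is normally passed to pip alongside the requirements so the floor applies without installing anything extra. A minimal sketch (the exact install command is not part of this diff):

```bash
# Hedged sketch: apply the constraint floor when installing the project deps;
# pip treats constraints as version pins, not as packages to install.
pip3 install -c constraints.txt -r requirements.txt
```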
34 changes: 27 additions & 7 deletions docker/Dockerfile.multi
@@ -1,9 +1,8 @@
# Multi-stage Dockerfile
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG TRITON_IMAGE=nvcr.io/nvidia/tritonserver
ARG BASE_TAG=25.10-py3
# [TODO] Update to NVIDIA Triton 25.10 when it's available
ARG TRITON_BASE_TAG=25.09-py3
ARG BASE_TAG=25.08-py3
ARG TRITON_BASE_TAG=25.08-py3
ARG DEVEL_IMAGE=devel

FROM ${BASE_IMAGE}:${BASE_TAG} AS base
@@ -41,9 +40,6 @@ COPY docker/common/install.sh \
docker/common/install_polygraphy.sh \
docker/common/install_mpi4py.sh \
docker/common/install_pytorch.sh \
docker/common/install_ucx.sh \
docker/common/install_nixl.sh \
docker/common/install_etcd.sh \
./

RUN GITHUB_MIRROR=${GITHUB_MIRROR} \
@@ -75,15 +71,36 @@ RUN GITHUB_MIRROR=${GITHUB_MIRROR} bash ./install.sh --mpi4py && rm install_mpi4
ARG TORCH_INSTALL_TYPE="skip"
RUN TORCH_INSTALL_TYPE=${TORCH_INSTALL_TYPE} bash ./install.sh --pytorch && rm install_pytorch.sh

RUN bash ./install.sh --opencv && rm install.sh
RUN bash ./install.sh --opencv && bash ./install.sh --protobuf && rm install.sh

# wait for new triton to be published
# Rename pytorch_triton package to triton
RUN if [ -f /etc/redhat-release ]; then \
echo "Rocky8 detected, skipping symlink and ldconfig steps"; \
else \
cd /usr/local/lib/python3.12/dist-packages/ && \
ls -la | grep pytorch_triton && \
mv pytorch_triton-3.3.1+gitc8757738.dist-info triton-3.3.1+gitc8757738.dist-info && \
cd triton-3.3.1+gitc8757738.dist-info && \
echo "Current directory: $(pwd)" && \
echo "Files in directory:" && \
ls -la && \
sed -i 's/^Name: pytorch-triton/Name: triton/' METADATA && \
sed -i 's|pytorch_triton-3.3.1+gitc8757738.dist-info/|triton-3.3.1+gitc8757738.dist-info/|g' RECORD && \
echo "METADATA after update:" && \
grep "^Name:" METADATA; \
fi
Comment on lines +76 to +92

🛠️ Refactor suggestion | 🟠 Major

Avoid hardcoded version strings in PyTorch-Triton renaming.

The PyTorch-Triton package renaming logic contains hardcoded version strings (pytorch_triton-3.3.1+gitc8757738 and triton-3.3.1+gitc8757738) that will break when versions change.

Consider making this more maintainable:

 RUN if [ -f /etc/redhat-release ]; then \
         echo "Rocky8 detected, skipping symlink and ldconfig steps"; \
     else \
         cd /usr/local/lib/python3.12/dist-packages/ && \
-        ls -la | grep pytorch_triton && \
-        mv pytorch_triton-3.3.1+gitc8757738.dist-info triton-3.3.1+gitc8757738.dist-info && \
-        cd triton-3.3.1+gitc8757738.dist-info && \
+        PYTORCH_TRITON_DIR=$(ls -d pytorch_triton-*.dist-info | head -n 1) && \
+        TRITON_DIR=$(echo "$PYTORCH_TRITON_DIR" | sed 's/pytorch_triton/triton/') && \
+        mv "$PYTORCH_TRITON_DIR" "$TRITON_DIR" && \
+        cd "$TRITON_DIR" && \
         echo "Current directory: $(pwd)" && \
         echo "Files in directory:" && \
         ls -la && \
         sed -i 's/^Name: pytorch-triton/Name: triton/' METADATA && \
-        sed -i 's|pytorch_triton-3.3.1+gitc8757738.dist-info/|triton-3.3.1+gitc8757738.dist-info/|g' RECORD && \
+        sed -i "s|$PYTORCH_TRITON_DIR/|$TRITON_DIR/|g" RECORD && \
         echo "METADATA after update:" && \
         grep "^Name:" METADATA; \
     fi
🤖 Prompt for AI Agents
In docker/Dockerfile.multi around lines 76 to 92, the renaming logic uses
hardcoded package version strings which will break when the triton package
version changes; instead, detect the actual dist-info directory name at runtime
(e.g., using a glob like pytorch_triton-*.dist-info), extract the version/name
components into variables, and perform mv, cd, and sed replacements using those
variables (update METADATA and RECORD using the discovered names rather than
literal strings); ensure the shell remains POSIX-safe in the Dockerfile RUN
(handle no-match cases robustly and keep the existing Rocky8 conditional).
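
A quick sanity check after the rename could look like the following (a hedged sketch, not part of the PR; it only confirms that Python package metadata now reports the package as `triton`):

```bash
# Hypothetical verification of the dist-info rename:
pip3 show triton | grep -E '^(Name|Version):'
python3 -c "import importlib.metadata as m; print(m.metadata('triton')['Name'])"
```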


# Install UCX first
COPY docker/common/install_ucx.sh install_ucx.sh
RUN GITHUB_MIRROR=${GITHUB_MIRROR} bash ./install_ucx.sh && rm install_ucx.sh

# Install NIXL
COPY docker/common/install_nixl.sh install_nixl.sh
RUN GITHUB_MIRROR=${GITHUB_MIRROR} bash ./install_nixl.sh && rm install_nixl.sh

# Install etcd
COPY docker/common/install_etcd.sh install_etcd.sh
RUN bash ./install_etcd.sh && rm install_etcd.sh

FROM ${TRITON_IMAGE}:${TRITON_BASE_TAG} AS triton
@@ -99,6 +116,9 @@ COPY --from=triton /opt/tritonserver/caches /opt/tritonserver/caches

# Copy all installation scripts at once to reduce layers
COPY docker/common/install_triton.sh \
docker/common/install_ucx.sh \
docker/common/install_nixl.sh \
docker/common/install_etcd.sh \
./

RUN GITHUB_MIRROR=${GITHUB_MIRROR} bash ./install_triton.sh && rm install_triton.sh
7 changes: 3 additions & 4 deletions docker/Makefile
@@ -192,17 +192,16 @@ jenkins-rockylinux8_%: PYTHON_VERSION_TAG_ID = $(if $(findstring 3.12,${PYTHON_V
jenkins-rockylinux8_%: IMAGE_WITH_TAG = $(shell . ../jenkins/current_image_tags.properties && echo $$LLM_ROCKYLINUX8_${PYTHON_VERSION_TAG_ID}_DOCKER_IMAGE)
jenkins-rockylinux8_%: STAGE = tritondevel
jenkins-rockylinux8_%: BASE_IMAGE = nvcr.io/nvidia/cuda
# [TODO] Update to NVIDIA CUDA 13.0.2 when it's available
jenkins-rockylinux8_%: BASE_TAG = 13.0.1-devel-rockylinux8
jenkins-rockylinux8_%: BASE_TAG = 13.0.0-devel-rockylinux8

rockylinux8_%: STAGE = tritondevel
rockylinux8_%: BASE_IMAGE = nvcr.io/nvidia/cuda
rockylinux8_%: BASE_TAG = 13.0.1-devel-rockylinux8
rockylinux8_%: BASE_TAG = 13.0.0-devel-rockylinux8

# For x86_64 and aarch64
ubuntu22_%: STAGE = tritondevel
ubuntu22_%: BASE_IMAGE = nvcr.io/nvidia/cuda
ubuntu22_%: BASE_TAG = 13.0.1-devel-ubuntu22.04
ubuntu22_%: BASE_TAG = 13.0.0-devel-ubuntu22.04

trtllm_%: STAGE = release
trtllm_%: PUSH_TO_STAGING := 0
13 changes: 13 additions & 0 deletions docker/common/install.sh
@@ -16,6 +16,7 @@ polygraphy=0
mpi4py=0
pytorch=0
opencv=0
protobuf=0

while [[ $# -gt 0 ]]; do
case $1 in
@@ -55,6 +56,10 @@ while [[ $# -gt 0 ]]; do
opencv=1
shift 1
;;
--protobuf)
protobuf=1
shift 1
;;
--all)
base=1
cmake=1
@@ -65,6 +70,7 @@ while [[ $# -gt 0 ]]; do
mpi4py=1
pytorch=1
opencv=1
protobuf=1
shift 1
;;
*)
@@ -129,3 +135,10 @@ if [ $opencv -eq 1 ]; then
rm -rf /usr/local/lib/python3*/dist-packages/cv2/
pip3 install opencv-python-headless --force-reinstall --no-deps --no-cache-dir
fi

# WARs against security issues inherited from pytorch:25.06
# * https://github.com/advisories/GHSA-8qvm-5x2c-j2w7
if [ $protobuf -eq 1 ]; then
pip3 install --upgrade --no-cache-dir \
"protobuf>=4.25.8"
fi
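
After the new `--protobuf` path runs, the upgrade can be verified with a one-liner (a hedged sketch for local checking, not part of the script):

```bash
# Hypothetical check that the GHSA-8qvm-5x2c-j2w7 WAR took effect (>= 4.25.8):
python3 -c "import google.protobuf as p; print(p.__version__)"
pip3 show protobuf | grep '^Version:'
```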
2 changes: 1 addition & 1 deletion docker/common/install_cuda_toolkit.sh
@@ -5,7 +5,7 @@ set -ex
# This script is used for reinstalling CUDA on Rocky Linux 8 with the run file.
# CUDA version is usually aligned with the latest NGC CUDA image tag.
# Only use when public CUDA image is not ready.
CUDA_VER="13.0.2_580.95.05"
CUDA_VER="13.0.0_580.65.06"
CUDA_VER_SHORT="${CUDA_VER%_*}"

NVCC_VERSION_OUTPUT=$(nvcc --version)
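
For reference, a version string in this format usually maps onto an NVIDIA runfile installer name. A hedged sketch of the download and silent toolkit install (URL pattern and flags are assumptions, not taken from this diff):

```bash
# Assumed runfile naming convention for CUDA_VER="13.0.0_580.65.06":
CUDA_VER="13.0.0_580.65.06"
CUDA_VER_SHORT="${CUDA_VER%_*}"
wget -q "https://developer.download.nvidia.com/compute/cuda/${CUDA_VER_SHORT}/local_installers/cuda_${CUDA_VER}_linux.run"
sh "cuda_${CUDA_VER}_linux.run" --silent --toolkit
```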
11 changes: 2 additions & 9 deletions docker/common/install_mpi4py.sh
@@ -27,15 +27,12 @@ diff --git a/src/mpi4py/futures/_lib.py b/src/mpi4py/futures/_lib.py
index f14934d1..eebfb8fc 100644
--- a/src/mpi4py/futures/_lib.py
+++ b/src/mpi4py/futures/_lib.py
@@ -278,6 +278,43 @@ def _manager_comm(pool, options, comm, full=True):
@@ -278,6 +278,40 @@ def _manager_comm(pool, options, comm, full=True):


def _manager_split(pool, options, comm, root):
+ if(os.getenv("TRTLLM_USE_MPI_KVCACHE")=="1"):
+ try:
+ from cuda.bindings import runtime as cudart
+ except ImportError:
+ from cuda import cudart
+ from cuda import cudart
+ has_slurm_rank=False
+ has_ompi_rank=False
+ slurm_rank=0
@@ -74,10 +71,6 @@ index f14934d1..eebfb8fc 100644
EOF

# Install with pip and clean up cache
ARCH=$(uname -m)
if [ "$ARCH" = "aarch64" ]; then
pip3 install --no-cache-dir Cython==0.29.37
fi
pip3 install --no-cache-dir "$TMP_DIR/mpi4py-${MPI4PY_VERSION}"

# Clean up
8 changes: 4 additions & 4 deletions docker/common/install_pytorch.sh
@@ -4,8 +4,8 @@ set -ex

# Use latest stable version from https://pypi.org/project/torch/#history
# and closest to the version specified in
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-10.html#rel-25-10
TORCH_VERSION="2.9.0"
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-08.html#rel-25-08
TORCH_VERSION="2.8.0"
SYSTEM_ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')

prepare_environment() {
@@ -69,8 +69,8 @@ install_from_pypi() {
if [ "$ARCH" = "amd64" ];then ARCH="x86_64";fi
if [ "$ARCH" = "aarch64" ];then ARCH="sbsa";fi

pip3 uninstall -y torch torchvision
pip3 install torch==${TORCH_VERSION} torchvision --index-url https://download.pytorch.org/whl/cu130
pip3 uninstall -y torch torchvision torchaudio
pip3 install torch==${TORCH_VERSION} torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
}

case "$1" in
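
After `install_from_pypi` runs, a minimal check that the installed wheel matches the intended CUDA build could be (hedged sketch, not part of the script):

```bash
# Hypothetical post-install verification: torch version and its CUDA build.
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
```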
19 changes: 11 additions & 8 deletions docker/common/install_tensorrt.sh
@@ -2,20 +2,23 @@

set -ex

TRT_VER="10.13.3.9"
TRT_VER="10.13.2.6"
# Align with the pre-installed cuDNN / cuBLAS / NCCL versions from
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-10.html#rel-25-10
CUDA_VER="13.0" # 13.0.2
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-08.html#rel-25-08
CUDA_VER="13.0" # 13.0.0
# Keep the installation for cuDNN if users want to install PyTorch with source codes.
# PyTorch 2.x can compile with cuDNN v9.
CUDNN_VER="9.14.0.64-1"
CUDNN_VER="9.12.0.46-1"
# NCCL version 2.26.x used in the NGC PyTorch 25.05 image but has a performance regression issue.
# Use NCCL version 2.27.5 which has the fixes.
NCCL_VER="2.27.7-1+cuda13.0"
CUBLAS_VER="13.1.0.3-1"
# Use cuBLAS version 13.0.0.19 instead.
CUBLAS_VER="13.0.0.19-1"
# Align with the pre-installed CUDA / NVCC / NVRTC versions from
# https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
NVRTC_VER="13.0.88-1"
CUDA_RUNTIME="13.0.96-1"
CUDA_DRIVER_VERSION="580.95.05-1.el8"
NVRTC_VER="13.0.48-1"
CUDA_RUNTIME="13.0.48-1"
CUDA_DRIVER_VERSION="580.65.06-1.el8"

for i in "$@"; do
case $i in
5 changes: 5 additions & 0 deletions docs/source/installation/build-from-source-linux.md
@@ -147,6 +147,11 @@ check <https://github.com/NVIDIA/TensorRT-LLM/tree/main/docker>.

## Build TensorRT LLM

```{tip}
:name: build-from-source-tip-cuda-version
TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies needed by CUDA 13.0. If you are using CUDA 12.9, please uncomment lines end with `# <For CUDA 12.9>` and comment out the next lines.
```
Comment on lines +150 to +153

🛠️ Refactor suggestion | 🟠 Major

Documentation inconsistent with automated CUDA version handling.

The documentation instructs users to manually uncomment/comment lines in requirements.txt for CUDA version selection. However, scripts/build_wheel.py (lines 950-981) now automatically modifies requirements.txt based on the CUDA_VERSION environment variable.

This creates confusion:

  • Should users manually edit the file?
  • Should they set the CUDA_VERSION environment variable instead?
  • If they do both, which takes precedence?

Update the documentation to clarify the recommended approach:

 ```{tip}
 :name: build-from-source-tip-cuda-version
-TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies needed by CUDA 13.0. If you are using CUDA 12.9, please uncomment lines end with `# <For CUDA 12.9>` and comment out the next lines.
+TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0 while some dependency changes are required. The `requirements.txt` contains dependencies for CUDA 13.0 by default. If you are using CUDA 12.9, the build script will automatically detect your CUDA version from the `CUDA_VERSION` environment variable and adjust dependencies accordingly. Alternatively, you can manually uncomment lines ending with `# <For CUDA 12.9>` and comment out the following lines before building.

<details>
<summary>🤖 Prompt for AI Agents</summary>

In docs/source/installation/build-from-source-linux.md around lines 150 to 153,
the doc currently tells users to manually edit requirements.txt for CUDA 12.9
but the build script (scripts/build_wheel.py lines 950-981) automatically
adjusts requirements based on the CUDA_VERSION env var; update the tip to state
that requirements.txt defaults to CUDA 13.0, the build script will automatically
detect and switch dependencies when CUDA_VERSION is set to 12.9, and also note
users may alternatively perform the manual uncomment/comment edits if they
prefer—clarify precedence (environment variable/build script takes precedence
over manual edits during automated build) and give a concise recommended action:
set CUDA_VERSION for automated builds or edit requirements.txt for manual
control.
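
To make the recommended path concrete, here is a hedged sketch of the automated route (the `CUDA_VERSION` variable and the fact that `scripts/build_wheel.py` adjusts `requirements.txt` are taken from the review comment above; build flags are omitted and may be required in practice):

```bash
# Hypothetical invocation relying on the automated CUDA 12.9 adjustment:
export CUDA_VERSION=12.9
python3 ./scripts/build_wheel.py
```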




### Option 1: Full Build with C++ Compilation

The following command compiles the C++ code and packages the compiled libraries along with the Python files into a wheel. When developing C++ code, you need this full build command to apply your code changes.
Expand Down
11 changes: 9 additions & 2 deletions docs/source/installation/linux.md
@@ -12,16 +12,23 @@
Install CUDA Toolkit following the [CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and
make sure `CUDA_HOME` environment variable is properly set.

```{tip}
:name: installation-linux-tip-cuda-version
TensorRT LLM 1.1 supports both CUDA 12.9 and 13.0. The wheel package release only supports CUDA 12.9, while CUDA 13.0 is only supported through NGC container release.
```

```bash
# By default, PyTorch CUDA 12.8 package is installed. Install PyTorch CUDA 13.0 package to align with the CUDA version used for building TensorRT LLM wheels.
pip3 install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
# Optional step: Only required for NVIDIA Blackwell GPUs and SBSA platform
pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Comment on lines +21 to +22

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify PyTorch and CUDA version references.

The documentation mentions:

  • Line 22: Installing torch==2.7.1 with cu128 wheels
  • Line 17: TensorRT LLM 1.1 supports CUDA 12.9 and 13.0
  • Line 30: PyTorch CUDA 12.8 package is required

Please clarify:

  1. The relationship between cu128 (CUDA 12.8) and the CUDA 12.9 support mentioned in the tip
  2. Whether PyTorch 2.7.1 is the correct version (the AI summary mentions PyTorch 2.8.0)

🌐 Web query:

What is the latest PyTorch version that supports CUDA 12.8 (cu128)?

💡 Result:

The latest stable PyTorch release with official CUDA 12.8 (cu128) binaries is PyTorch 2.7 (2.7.x). [1][2][3]

Sources:

  • Google Cloud Deep Learning VM images (lists PyTorch 2.7.1 with CUDA 12.8). [1]
  • NVIDIA PyTorch release notes (container based on 2.7.0 and CUDA 12.8). [2]
  • PyTorch GitHub / issue discussion showing 2.7+cu128 builds. [3]





🌐 Web query:

TensorRT-LLM 1.1 CUDA version support requirements

💡 Result:

TensorRT‑LLM 1.1 supports CUDA 12.9 and CUDA 13.0. The prebuilt Python wheel release only supports CUDA 12.9; CUDA 13.0 is supported via the NGC container or when building from source with the CUDA‑13 dependencies. [1][2]

Sources:

  • NVIDIA TensorRT‑LLM installation (pip) — note about CUDA 12.9 vs 13.0. [1]
  • TensorRT‑LLM build-from-source — instructions/notes for supporting CUDA 12.9 and 13.0. [2]



CUDA version mismatch: PyTorch wheels use CUDA 12.8, but TensorRT-LLM 1.1 requires CUDA 12.9.

The documentation instructs installing torch==2.7.1 with cu128 wheels (CUDA 12.8), but TensorRT-LLM 1.1 only supports CUDA 12.9 and 13.0. This incompatibility needs to be resolved:

  • Use cu129 wheels (CUDA 12.9) instead of cu128, or
  • Update the documentation to clarify which CUDA versions are compatible with TensorRT-LLM 1.1
🤖 Prompt for AI Agents
In docs/source/installation/linux.md around lines 21 to 22, the guide installs
torch==2.7.1 using cu128 (CUDA 12.8) which is incompatible with TensorRT-LLM 1.1
that requires CUDA 12.9 or 13.0; change the pip install to use the cu129 wheels
(or explicitly document supported CUDA versions) by replacing the index-url or
wheel tag to cu129 and add a short note stating TensorRT-LLM 1.1 requires CUDA
12.9/13.0 so users must match PyTorch CUDA builds accordingly.
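
A hedged sketch of the first option the review proposes (the cu129 index URL follows PyTorch's wheel-index naming convention; the torch versions actually published on that index should be confirmed before pinning):

```bash
# Hypothetical: install PyTorch wheels built against CUDA 12.9 rather than 12.8.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
```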


sudo apt-get -y install libopenmpi-dev

# Optional step: Only required for disagg-serving
sudo apt-get -y install libzmq3-dev
```

PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell GPUs and SBSA platform. On prior GPUs or Linux x86_64 platform, this extra installation is not required.

```{tip}
Instead of manually installing the preqrequisites as described
above, it is also possible to use the pre-built [TensorRT LLM Develop container
Expand Down
2 changes: 1 addition & 1 deletion docs/source/legacy/reference/support-matrix.md
@@ -152,7 +152,7 @@ The following table shows the supported software for TensorRT-LLM.
* -
- Software Compatibility
* - Container
- [25.10](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
- [25.08](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
* - TensorRT
- [10.13](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
33 changes: 32 additions & 1 deletion jenkins/Build.groovy
@@ -16,6 +16,9 @@ AARCH64_TRIPLE = "aarch64-linux-gnu"

LLM_DOCKER_IMAGE = env.dockerImage

LLM_DOCKER_IMAGE_12_9 = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.06-py3-x86_64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202509091430-7383"
LLM_SBSA_DOCKER_IMAGE_12_9 = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.06-py3-aarch64-ubuntu24.04-trt10.11.0.33-skip-tritondevel-202509091430-7383"

// Always use x86_64 image for agent
AGENT_IMAGE = env.dockerImage.replace("aarch64", "x86_64")

@@ -37,6 +40,9 @@ def BUILD_JOBS_FOR_CONFIG = "buildJobsForConfig"
@Field
def CONFIG_LINUX_X86_64_VANILLA = "linux_x86_64_Vanilla"

@Field
def CONFIG_LINUX_X86_64_VANILLA_CU12 = "linux_x86_64_Vanilla_CU12"

@Field
def CONFIG_LINUX_X86_64_SINGLE_DEVICE = "linux_x86_64_SingleDevice"

@@ -46,6 +52,9 @@ def CONFIG_LINUX_X86_64_LLVM = "linux_x86_64_LLVM"
@Field
def CONFIG_LINUX_AARCH64 = "linux_aarch64"

@Field
def CONFIG_LINUX_AARCH64_CU12 = "linux_aarch64_CU12"

@Field
def CONFIG_LINUX_AARCH64_LLVM = "linux_aarch64_LLVM"

@@ -64,6 +73,11 @@ def BUILD_CONFIGS = [
(TARNAME) : "TensorRT-LLM.tar.gz",
(WHEEL_ARCHS): "80-real;86-real;89-real;90-real;100-real;103-real;120-real",
],
(CONFIG_LINUX_X86_64_VANILLA_CU12) : [
(WHEEL_EXTRA_ARGS) : "--extra-cmake-vars ENABLE_MULTI_DEVICE=1 --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl --micro_benchmarks",
(TARNAME) : "TensorRT-LLM-CU12.tar.gz",
(WHEEL_ARCHS): "80-real;86-real;89-real;90-real;100-real;103-real;120-real",
],
(CONFIG_LINUX_X86_64_PYBIND) : [
(WHEEL_EXTRA_ARGS) : "--binding_type pybind --extra-cmake-vars ENABLE_MULTI_DEVICE=1 --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl --micro_benchmarks",
(TARNAME) : "pybind-TensorRT-LLM.tar.gz",
@@ -85,6 +99,12 @@ def BUILD_CONFIGS = [
(WHEEL_ARCHS): "90-real;100-real;103-real;120-real",
(BUILD_JOBS_FOR_CONFIG): "4", // TODO: Remove after fix the build OOM issue on SBSA
],
(CONFIG_LINUX_AARCH64_CU12): [
(WHEEL_EXTRA_ARGS) : "--extra-cmake-vars WARNING_IS_ERROR=ON",
(TARNAME) : "TensorRT-LLM-GH200-CU12.tar.gz",
(WHEEL_ARCHS): "90-real;100-real;103-real;120-real",
(BUILD_JOBS_FOR_CONFIG): "4", // TODO: Remove after fix the build OOM issue on SBSA
],
(CONFIG_LINUX_AARCH64_PYBIND): [
(WHEEL_EXTRA_ARGS) : "--binding_type pybind --extra-cmake-vars WARNING_IS_ERROR=ON --extra-cmake-vars NIXL_ROOT=/opt/nvidia/nvda_nixl",
(TARNAME) : "pybind-TensorRT-LLM-GH200.tar.gz",
@@ -434,6 +454,9 @@ def runLLMBuild(pipeline, buildFlags, tarName, is_linux_x86_64)
pipArgs = ""
}

if (tarName.contains("CU12")) {
trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {s/^# //; n; s/^/# /}' requirements.txt && cat requirements.txt")
}
Comment on lines +457 to +459

⚠️ Potential issue | 🔴 Critical

Keep the CUDA 12.9 marker commented and un-comment only the dependency

This sed now strips the leading “# ” from the <For CUDA 12.9> marker and then re-comments the actual requirement line. During CU12 builds and tests, pip3 install -r requirements*.txt then encounters the bare <For CUDA 12.9> token and aborts with “Invalid requirement”, so the entire job fails. Please leave the marker commented and un-comment only the dependency line itself.

-        trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {s/^# //; n; s/^/# /}' requirements.txt && cat requirements.txt")
+        trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && sed -i '/^# .*<For CUDA 12\\.9>\$/ {n; s/^# //}' requirements.txt && cat requirements.txt")
🤖 Prompt for AI Agents
jenkins/Build.groovy lines 457-459: the sed currently un-comments the "<For CUDA
12.9>" marker and re-comments the dependency, causing pip to see the bare marker
and fail; change the sed invocation so it does NOT modify the marker line but
instead, when it finds a commented "<For CUDA 12.9>" marker, moves to the next
line and removes the leading "# " only from that following dependency line
(leave the marker commented), then verify by printing requirements.txt.
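
To see the behavioral difference concretely, here is a small reproduction under an assumed `requirements.txt` layout (a standalone marker line followed by the commented CUDA 12.9 pin); the real layout should be confirmed before settling on either sed:

```bash
# Assumed layout only; package names are placeholders.
cat > /tmp/req-demo.txt <<'EOF'
# <For CUDA 12.9>
# example-pkg-cu129==1.0
example-pkg-cu130==1.0
EOF

# Current Jenkins sed: un-comments the marker line itself, then comments out
# the line that follows it (this is what the review flags).
sed '/^# .*<For CUDA 12\.9>$/ {s/^# //; n; s/^/# /}' /tmp/req-demo.txt

# Reviewer-suggested sed: leaves the marker commented and un-comments only
# the line that follows it.
sed '/^# .*<For CUDA 12\.9>$/ {n; s/^# //}' /tmp/req-demo.txt
```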

// install python package
trtllm_utils.llmExecStepWithRetry(pipeline, script: "cd ${LLM_ROOT} && pip3 install -r requirements-dev.txt ${pipArgs}")

@@ -454,7 +477,10 @@ def runLLMBuild(pipeline, buildFlags, tarName, is_linux_x86_64)
def llmPath = sh (script: "realpath ${LLM_ROOT}",returnStdout: true).trim()
// TODO: Remove after the cmake version is upgraded to 3.31.8
// Get triton tag from docker/dockerfile.multi
def tritonShortTag = "r25.09"
def tritonShortTag = "r25.08"
if (tarName.contains("CU12")) {
tritonShortTag = "r25.06"
}
sh "cd ${LLM_ROOT}/triton_backend/inflight_batcher_llm && mkdir build && cd build && cmake .. -DTRTLLM_DIR=${llmPath} -DTRITON_COMMON_REPO_TAG=${tritonShortTag} -DTRITON_CORE_REPO_TAG=${tritonShortTag} -DTRITON_THIRD_PARTY_REPO_TAG=${tritonShortTag} -DTRITON_BACKEND_REPO_TAG=${tritonShortTag} -DUSE_CXX11_ABI=ON && make -j${buildJobs} install"

// Step 3: packaging wheels into tarfile
@@ -544,9 +570,14 @@ def launchStages(pipeline, cpu_arch, enableFailFast, globalVars)
wheelDockerImage = env.dockerImage
}

def LLM_DOCKER_IMAGE_CU12 = cpu_arch == AARCH64_TRIPLE ? LLM_SBSA_DOCKER_IMAGE_12_9 : LLM_DOCKER_IMAGE_12_9

buildConfigs = [
"Build TRT-LLM": [LLM_DOCKER_IMAGE] + prepareLLMBuild(
pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64 : CONFIG_LINUX_X86_64_VANILLA),
// Disable CUDA12 build for too slow to build (cost > 5 hours on SBSA)
"Build TRT-LLM CUDA12": [LLM_DOCKER_IMAGE_CU12] + prepareLLMBuild(
pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64_CU12 : CONFIG_LINUX_X86_64_VANILLA_CU12),
"Build TRT-LLM LLVM": [LLM_DOCKER_IMAGE] + prepareLLMBuild(
pipeline, cpu_arch == AARCH64_TRIPLE ? CONFIG_LINUX_AARCH64_LLVM : CONFIG_LINUX_X86_64_LLVM),
"Build TRT-LLM Pybind": [LLM_DOCKER_IMAGE] + prepareLLMBuild(