
Commit 7b30a36

tscholak and claude committed

Bump base image and dependencies for KDA support

Update to nvcr.io/nvidia/pytorch:25.11-py3, which includes:
- PyTorch 2.10
- CUDA 13.0
- flash-attn 2.7.4.post1 (pre-installed, no compilation needed)

Dependency updates:
- causal-conv1d: v1.5.4 (was pinned to commit 2a288a1)
- mamba-ssm: 2.2.6.post3 (was pinned to commit 4a8a2a2)
- flash-linear-attention: pin to commit 67eee20 (was @main)
- flash-attn: 2.7.4.post1 to match the base image (was 2.7.3)
- triton: 3.5.1 in the Dockerfile (was 3.1.0)

These updates enable Kimi Delta Attention (KDA) support via the flash-linear-attention library. The pinned versions are tested and working, unlike the nightly/unpinned approach in #395.

Note: the dropless MoE kernel remains broken with triton >= 3.2.0 and needs a complete rewrite (it is also limited to 32 experts). This is tracked separately and does not block the KDA work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

1 parent cc009a4 commit 7b30a36
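
To sanity-check that a built image actually carries these pins, a minimal sketch along the following lines could be run inside the container. The distribution names and expected version strings are assumptions taken from the pins in this commit, not part of the change itself.

# Minimal sketch: verify the pinned dependency versions inside the built image.
# Distribution names and expected versions are assumptions based on this commit's pins.
from importlib.metadata import PackageNotFoundError, version

expected = {
    "flash-attn": "2.7.4.post1",
    "mamba-ssm": "2.2.6.post3",
    "causal-conv1d": "1.5.4",  # installed from the v1.5.4 tag
    "triton": "3.5.1",
}

for name, want in expected.items():
    try:
        got = version(name)
    except PackageNotFoundError:
        got = "<not installed>"
    status = "OK" if got.startswith(want) else "MISMATCH"
    print(f"{name}: expected {want}, found {got} [{status}]")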

2 files changed: +11, -10 lines

Dockerfile
Lines changed: 5 additions & 4 deletions

@@ -1,5 +1,5 @@
 # syntax=docker/dockerfile:1.7-labs
-FROM nvcr.io/nvidia/pytorch:25.05-py3
+FROM nvcr.io/nvidia/pytorch:25.11-py3

 # Install dependencies.
 RUN apt-get update \
@@ -29,16 +29,17 @@ ENV PIP_CONSTRAINT=""
 # There is no pre-build mamba image for pytorch 2.8, we build it before the rest to avoid rebuilds.
 # We need to compile from the repo because of https://github.com/state-spaces/mamba/issues/720 (same for causal-conv1d)
 # We set the number of workers to avoid OOM when compiling on laptop. (TODO: Can we make it configurable?)
-RUN MAX_JOBS=2 pip install --no-build-isolation "causal-conv1d@git+https://github.com/Dao-AILab/causal-conv1d@2a288a1"
-RUN MAX_JOBS=2 pip install --no-build-isolation "mamba_ssm[causal-conv1d]@git+https://github.com/state-spaces/mamba@4a8a2a2"
+RUN MAX_JOBS=2 pip install --no-build-isolation "causal-conv1d @ git+https://github.com/Dao-AILab/causal-conv1d@v1.5.4"
+RUN MAX_JOBS=2 pip install --no-build-isolation mamba-ssm==2.2.6.post3
+RUN MAX_JOBS=2 pip install --no-build-isolation "flash-linear-attention @ git+https://github.com/fla-org/flash-linear-attention@67eee20c8503cd19eeb52aa1b99821308e9260c5"
 # Copy dependency files with universal write permissions for all users.
 COPY --chmod=777 setup.py setup.cfg pyproject.toml ./
 COPY --chmod=777 ./fast_llm_external_models/__init__.py fast_llm_external_models/
 COPY --chmod=777 ./fast_llm/__init__.py fast_llm/
 COPY --chmod=777 ./fast_llm/csrc/ fast_llm/csrc/

 # Install dependencies within the virtual environment.
-RUN pip install --no-cache-dir --no-build-isolation -e ".[CORE,OPTIONAL,HUGGINGFACE,SSM,VISION,GENERATION,DEV]" triton==3.1.0
+RUN pip install --no-cache-dir --no-build-isolation -e ".[CORE,OPTIONAL,HUGGINGFACE,SSM,VISION,GENERATION,DEV]" triton==3.5.1

 # Copy the remaining source code with universal write permissions.
 COPY --chmod=777 ./Megatron-LM Megatron-LM
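
Since causal-conv1d, mamba-ssm, and flash-attn ship compiled extensions, package metadata alone does not prove they load against the new PyTorch 2.10 / CUDA 13.0 base. A quick import smoke test inside the built container could look like this sketch; the import names flash_attn, causal_conv1d, mamba_ssm, and fla (for flash-linear-attention) are assumptions.

# Minimal import smoke test (a sketch, run inside the built image): check that
# the compiled extensions actually load against the new base image, not just
# that their metadata is present. Import names are assumptions.
import causal_conv1d
import flash_attn
import fla  # flash-linear-attention, which provides the KDA kernels
import mamba_ssm

for module in (flash_attn, causal_conv1d, mamba_ssm, fla):
    print(f"{module.__name__}: {getattr(module, '__version__', 'unknown')}")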

setup.cfg
Lines changed: 6 additions & 6 deletions

@@ -25,10 +25,10 @@ CORE =
     # Used for checkpoints
     safetensors>=0.5.3
     # Update the base image (version fixed to ensure there is a wheel for the base image), may need --no-build-isolation
-    flash-attn==2.7.3
-    # Dropless MLP is broken with triton 3.2.0, 3.3.0 and 3.3.1. TODO: Remove once a working triton version is released.
-    # TODO: Removed because it breaks cpu-only installs and pip dependency resolution.
-    # triton==3.1.0
+    flash-attn==2.7.4.post1
+    # Dropless MoE kernel is broken with triton >= 3.2.0 and needs a rewrite (also limited to 32 experts).
+    # Not pinning triton here as it breaks cpu-only installs and pip dependency resolution.
+    # triton==3.5.1


 # Small packages required for some optional features and tools.
@@ -52,8 +52,8 @@ HUGGINGFACE =
 # To install on cpu environment (ex. for IDE support):
 # MAMBA_FORCE_BUILD=TRUE CAUSAL_CONV1D_FORCE_BUILD=TRUE CAUSAL_CONV1D_SKIP_CUDA_BUILD=TRUE pip install -e ".[CORE,SSM]" --no-build-isolation
 SSM =
-    mamba_ssm[causal-conv1d]==2.2.4
-    flash-linear-attention @ git+https://github.com/fla-org/flash-linear-attention@main
+    mamba_ssm[causal-conv1d]==2.2.6.post3
+    flash-linear-attention @ git+https://github.com/fla-org/flash-linear-attention@67eee20c8503cd19eeb52aa1b99821308e9260c5

 GENERATION =
     lm_eval>=0.4.9
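
Because triton is deliberately left unpinned in setup.cfg, code paths that depend on the dropless MoE kernel could guard against the known-broken range at runtime. The following is a hypothetical sketch, not part of this change; only the version cutoff comes from the comment above.

# Hypothetical guard (not part of this change): warn when the installed triton
# falls in the range where the dropless MoE kernel is known to be broken.
from packaging.version import Version

import triton

if Version(triton.__version__) >= Version("3.2.0"):
    print(
        "Warning: dropless MoE kernel is known to be broken with "
        f"triton >= 3.2.0 (found {triton.__version__})."
    )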
