Your current environment
Collecting environment information...
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version : Could not collect
CMake version : version 4.1.0
Libc version : glibc-2.35
==============================
PyTorch Info
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
Python version : 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-6.8.0-57-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
Is CUDA available : True
CUDA runtime version : 12.8.93
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA B200
GPU 1: NVIDIA B200
GPU 2: NVIDIA B200
GPU 3: NVIDIA B200
GPU 4: NVIDIA B200
GPU 5: NVIDIA B200
GPU 6: NVIDIA B200
GPU 7: NVIDIA B200
Nvidia driver version : 580.65.01
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
Architecture: x86_64
==============================
Versions of relevant libraries
[pip3] flashinfer-python==0.3.1
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.14.1
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pynvml==13.0.1
[pip3] pyzmq==27.1.0
[pip3] torch==2.8.0+cu128
[pip3] torchaudio==2.8.0+cu128
[pip3] torchvision==0.23.0+cu128
[pip3] transformers==4.56.2
[pip3] triton==3.4.0
[conda] Could not collect
==============================
vLLM Info
ROCM Version : Could not collect
vLLM Version : 0.11.1rc1.dev118+g1726e93ef (git sha: 1726e93)
vLLM Build Flags:
🐛 Describe the bug
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py", line 202, in finalize
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]     fused_expert_output = get_dp_group().reduce_scatterv(
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 423, in reduce_scatterv
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]     return self.device_communicator.reduce_scatterv(input_, dim, sizes)
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 207, in reduce_scatterv
(Worker_DP1_EP1 pid=746595) ERROR 10-01 00:16:25 [multiproc_executor.py:671]     assert input_tensor.shape[0] == sum(sizes)
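The traceback is cut off at the failing `assert input_tensor.shape[0] == sum(sizes)` line; the `AssertionError` itself is not shown above. For context, here is a minimal single-process sketch (not vLLM code) of the invariant that assert enforces: a variable-size reduce-scatter can only split its input along dim 0 if the number of rows equals the sum of the per-rank chunk sizes. The helper name and tensor shapes below are hypothetical, chosen only to illustrate the mismatch.

```python
# Hypothetical illustration of the invariant checked in
# cuda_communicator.py::reduce_scatterv (not the actual vLLM implementation).
import torch


def check_reduce_scatterv_input(input_tensor: torch.Tensor,
                                sizes: list[int]) -> list[torch.Tensor]:
    """Validate and split the input the way a reduce_scatterv would."""
    if input_tensor.shape[0] != sum(sizes):
        # This is the condition that trips the AssertionError in the traceback:
        # the fused-expert output row count does not match the per-DP-rank
        # token counts passed as `sizes`.
        raise AssertionError(
            f"rows={input_tensor.shape[0]} != sum(sizes)={sum(sizes)}")
    # Split into one chunk per DP rank along dim 0.
    return list(torch.split(input_tensor, sizes, dim=0))


# Made-up numbers: 2 DP ranks expecting 3 and 5 tokens each, but the
# fused-expert output only has 7 rows -> the assertion fires.
out = torch.randn(7, 4096)
check_reduce_scatterv_input(out, [3, 5])  # raises AssertionError
```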