Skip to content

Commit 5f38a1e

Browse files
committed
add lucas
Signed-off-by: youkaichao <youkaichao@gmail.com>
1 parent a9d9404 commit 5f38a1e

File tree

1 file changed

+2
-2
lines changed

1 file changed

+2
-2
lines changed

_posts/2025-11-27-improved-cuda-debugging.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -317,10 +317,10 @@ With the approach outlined above, we can uncover the full inline chain of the so
317317

318318
## Conclusion
319319

320-
This blog post introduced two advanced debugging techniques for CUDA kernels. The first technique uses user-triggered core dumps to identify hanging kernels, while the second traces complex kernels back to their source code by leveraging line information embedded in the compiled binary. These techniques are powerful tools for debugging complex issues in CUDA kernels, especially illegal memory access problems.
320+
This blog post introduced two advanced debugging techniques for CUDA kernels. The first technique uses user-triggered core dumps to identify hanging kernels, while the second traces complex kernels back to their source code by leveraging line information embedded in the compiled binary. These techniques are powerful tools for debugging complex issues in CUDA kernels, especially illegal memory access problems. Using both in tandem we were able to recently debug [a hard-to-reproduce and tricky hang in the CUTLASS MLA attention backend](https://github.com/vllm-project/vllm/pull/26026), which actually stemmed from the upstream CUTLASS code example and has since been fixed in [v4.3.0](https://github.com/NVIDIA/cutlass/commit/b1d6e2c9b334dfa811e4183dfbd02419249e4b52).
321321

322322
The vLLM project aims to provide easy, fast, and affordable LLM serving for everyone, and accessible debugging is an important aspect of this mission. We will continue to share more debugging tips and techniques in the future to build a strong LLM inference ecosystem together. To share your story or usage with vLLM, please submit a PR at [the blogpost repository](https://github.com/vllm-project/vllm-project.github.io).
323323

324324
# Acknowledgement
325325

326-
We would like to thank Ze Long and Sandarbh Jain from NVIDIA for their helpful discussions. Chao Hong from Moonshot AI helped provide the motivating example.
326+
We would like to thank Ze Long and Sandarbh Jain from NVIDIA for their helpful discussions. Chao Hong from Moonshot AI helped provide the motivating example. Lucas Wilkinson from Red Hat helped polishing the draft.

0 commit comments

Comments
 (0)