Commit d1e6aea

add tldr for cuda coredump (#98)
Signed-off-by: youkaichao <youkaichao@gmail.com>
1 parent b307ba8 commit d1e6aea

1 file changed: +9 -0 lines changed
_posts/2025-08-11-cuda-debugging.md

Lines changed: 9 additions & 0 deletions
@@ -5,6 +5,15 @@ author: "Kaichao You"
image: /assets/logos/vllm-logo-text-light.png
---

TL;DR: If you hit the error `an illegal memory access was encountered`, enable CUDA core dumps to debug it. Set the following environment variables, run your program again to collect the core dump file, then inspect it with `cuda-gdb`.

```bash
CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 \
CUDA_COREDUMP_SHOW_PROGRESS=1 \
CUDA_COREDUMP_GENERATION_FLAGS='skip_nonrelocated_elf_images,skip_global_memory,skip_shared_memory,skip_local_memory,skip_constbank_memory' \
CUDA_COREDUMP_FILE="/tmp/cuda_coredump_%h.%p.%t"
```
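
The variables only take effect if they reach the environment of the process that crashes. One way to arrange that, sketched here as an assumption rather than the post's prescribed workflow, is a small wrapper that prefixes any command with the same assignments. The function name `run_with_cuda_coredump` and the script name `my_app.py` are hypothetical placeholders.

```bash
# Hypothetical helper: run any command with the core dump variables
# from the TL;DR placed in that command's environment only.
run_with_cuda_coredump() {
    CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 \
    CUDA_COREDUMP_SHOW_PROGRESS=1 \
    CUDA_COREDUMP_GENERATION_FLAGS='skip_nonrelocated_elf_images,skip_global_memory,skip_shared_memory,skip_local_memory,skip_constbank_memory' \
    CUDA_COREDUMP_FILE="/tmp/cuda_coredump_%h.%p.%t" \
    "$@"
}

# Example usage (my_app.py is a placeholder for your own program):
#   run_with_cuda_coredump python my_app.py
#
# After a crash, load the generated file in cuda-gdb:
#   cuda-gdb
#   (cuda-gdb) target cudacore <path to the generated dump file>
```

Prefixing the command keeps the settings scoped to that single run, so other programs started from the same shell are unaffected.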
# Introduction

Have you ever been developing CUDA kernels, watched your tests keep running into illegal memory access (IMA for short), and had no idea how to debug it? We have felt this pain again and again while working on vLLM, a high-performance inference engine for LLMs.
