Skip to content

Segfault in faulthandler signal handler with threads #116008

@dgrisby

Description

@dgrisby

Crash report

What happened?

We use faulthandler.register to dump stacks on SIGUSR1.

faulthandler.register(signal.SIGUSR1, chain=True)

Sending the signal to a running multi-threaded process, it segfaulted. It successfully dumped several thread stacks, but segfaulted part way through:

Thread 0x00007f4fbd4a3640 (most recent call first):
  File "/usr/tideway/lib/python3.11/threading.py", line 320 in wait
  File "/usr/tideway/lib/python3.11/threading.py", line 622 in wait
  File "/usr/tideway/python/reasoning/consolidation/sender.py", line 1151 in _threadedSendingOfData
  File "/usr/tideway/lib/python3.11/threading.py", line 975 in run
  File "/usr/tideway/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/tideway/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007f4fafca0640 (most recent call first):
  File "/usr/tideway/python/stubs/search_idl.py", line 720 in search
  File "/usr/tideway/python/reasoning/ecaengine/tpl/tpl_search.py", line 87 in _doSearch
  File "/usr/tideway/python/reasoning/ecaengine/tpl/tpl_search.py", line 103 in search
  File "/usr/tideway/var/code/tpl_py/SupportDetail/SoftwareInstance/__init__.py", line 233 in execute
  File "/usr/tideway/python/reasoning/ecaengine/engine/rule.py", line 141 in doExecute
  File "/usr/tideway/python/reasoning/ecaengine/engine/__init__.py", line 715 in processEvent
  File "/usr/tideway/python/reasoning/ecaengine/transactional_server.py", line 258 in processEvent
  File "/usr/tideway/pylib/00_hack/common/threadpool.py", line 93 in run
  File "/usr/tideway/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/tideway/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007f4fb04a1640 (most recent call first):
  File "/usr/tideway/lib/python3.11/threading.py", line 338 in notify_all
  File "/usr/tideway/lib/python3.11/concurrent/futures/_base.py", line 550 in set_result
  File Tue Feb 27 14:17:40 2024 : Watchdog : Service failed after 1674.6 seconds (signal 11; exit 0). Restarting (1 restart so far).

The last line shows that it wrote "File", then segfaulted. (That was caught by a parent watchdog process that restarted the process. The timestamp and rest of the message is from the watchdog process.)

Looking at the core file:

Core was generated by `python3 -m reasoning.main --daemon start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f4fcd393e70 in dump_frame (frame=0x7f4f9533bac0, fd=2) at Python/traceback.c:1190
1190	        && PyUnicode_Check(code->co_filename))

(gdb) bt
#0  0x00007f4fcd393e70 in dump_frame (frame=0x7f4f9533bac0, fd=2) at Python/traceback.c:1190
#1  dump_traceback (fd=fd@entry=2, tstate=tstate@entry=0xbd23860, write_header=write_header@entry=0) at Python/traceback.c:1242
#2  0x00007f4fcd3940b7 in _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>) at Python/traceback.c:1344
#3  0x00007f4fcd39a380 in faulthandler_dump_traceback (fd=2, all_threads=1, interp=<optimized out>) at ./Modules/faulthandler.c:252
#4  0x00007f4fcd39a539 in faulthandler_user (signum=10) at ./Modules/faulthandler.c:862
#5  <signal handler called>
#6  0x00007f4fcce9c3aa in __futex_abstimed_wait_common () from /lib64/libc.so.6
#7  0x00007f4fccea6bae in __new_sem_wait_slow64 () from /lib64/libc.so.6
#8  0x00007f4fcd466270 in PyThread_acquire_lock_timed (lock=lock@entry=0x50d6510, microseconds=microseconds@entry=5000000, intr_flag=intr_flag@entry=1) at Python/thread_pthread.h:483
#9  0x00007f4fcd4f5c72 in acquire_timed (lock=0x50d6510, timeout=5000000000) at ./Modules/_threadmodule.c:98
#10 0x00007f4fcd4f5abe in lock_PyThread_acquire_lock (self=0x7f4fb1dabdc0, args=args@entry=0x7f4fa0b4fdc0, kwds=kwds@entry=0x0) at ./Modules/_threadmodule.c:179
#11 0x00007f4fcd3ea0dc in method_vectorcall_VARARGS_KEYWORDS (func=<optimized out>, args=0x7f4fcd9394b0, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/descrobject.c:364
#12 0x00007f4fcd3e59a8 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7f4fcd18af20, tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>)
    at ./Include/internal/pycore_call.h:92
#13 PyObject_Vectorcall (callable=0x7f4fcd18af20, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:299
#14 0x00007f4fcd4438ca in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:4774
#15 0x00007f4fcd442557 in _PyEval_EvalFrame (throwflag=0, frame=0x7f4fcd9391b8, tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>) at ./Include/internal/pycore_ceval.h:73
#16 _PyEval_Vector (tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>, func=<optimized out>, locals=0x7f4fcd1f30c0, args=0x0, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:6439
#17 0x00007f4fcd4bf2f4 in PyEval_EvalCode (co=co@entry=0x7f4fccd230a0, globals=globals@entry=0x7f4fcd1f30c0, locals=locals@entry=0x7f4fcd1f30c0) at Python/ceval.c:1154
#18 0x00007f4fcd4be3b4 in builtin_exec_impl (module=<optimized out>, closure=<optimized out>, locals=0x7f4fcd1f30c0, globals=0x7f4fcd1f30c0, source=0x7f4fccd230a0) at Python/bltinmodule.c:1077
#19 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Python/clinic/bltinmodule.c.h:465
#20 0x00007f4fcd40c4c6 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7f4fcd188f90, args=0x7f4fcd939180, nargsf=<optimized out>, kwnames=0x0) at ./Include/cpython/methodobject.h:52
#21 0x00007f4fcd3e59a8 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7f4fcd188f90, tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>)
    at ./Include/internal/pycore_call.h:92
#22 PyObject_Vectorcall (callable=0x7f4fcd188f90, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:299
#23 0x00007f4fcd4438ca in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:4774
#24 0x00007f4fcd442557 in _PyEval_EvalFrame (throwflag=0, frame=0x7f4fcd939020, tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>) at ./Include/internal/pycore_ceval.h:73
#25 _PyEval_Vector (tstate=0x7f4fcd75d2d8 <_PyRuntime+166328>, func=<optimized out>, locals=0x0, args=0x7f4fccda6c18, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:6439
#26 0x00007f4fcd4dc089 in pymain_run_module (modname=<optimized out>, set_argv0=<optimized out>) at Modules/main.c:300
#27 0x00007f4fcd4dbc8d in pymain_run_python (exitcode=0x7ffda6d1e874) at Modules/main.c:595
#28 Py_RunMain () at Modules/main.c:680
#29 0x00007f4fcd4dba39 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:734
#30 0x00007f4fcce3feb0 in __libc_start_call_main () from /lib64/libc.so.6
#31 0x00007f4fcce3ff60 in __libc_start_main_impl () from /lib64/libc.so.6
#32 0x0000000000401075 in _start ()
(gdb) p code
$1 = (PyCodeObject *) 0x7f4fcd63ebe0 <empty_keys_struct>
(gdb) p *code
$2 = {ob_base = {ob_base = {ob_refcnt = 118616, ob_type = 0x100010000}, ob_size = 0}, co_consts = 0x0, co_names = 0xffffffffffffffff, co_exceptiontable = 0x0, co_flags = 0, co_warmup = 0, _co_linearray_entry_size = 0, 
  co_argcount = 0, co_posonlyargcount = 0, co_kwonlyargcount = 11, co_stacksize = 0, co_firstlineno = -849062080, co_nlocalsplus = 32591, co_nlocals = 0, co_nplaincellvars = 0, co_ncellvars = -850270115, 
  co_nfreevars = 32591, co_localsplusnames = 0x18, co_localspluskinds = 0x0, co_filename = 0x7f4fcd408c90 <dictview_dealloc>, co_name = 0x0, co_qualname = 0x0, co_linetable = 0x0, co_weakreflist = 0x0, 
  _co_code = 0x7f4fcd499e60 <dictview_repr>, _co_linearray = 0x0, _co_firsttraceable = -849084352, co_extra = 0x0, co_code_adaptive = " "}
(gdb) p *code->co_filename
$3 = {ob_refcnt = -177530513340544939, ob_type = 0x8418b48f8578b48}
(gdb) p *code->co_filename->ob_type
Cannot access memory at address 0x8418b48f8578b48

co_filename is clearly invalid here.

Most of the time dumping the stack works fine, but doing it repeatedly I can reproduce this when the process is busy. A second time, it failed in a slightly different place, accessing co_name:

(gdb) bt
#0  0x00007f4ab7993f05 in dump_frame (frame=0x7f4ab3ab43a8, fd=2) at Python/traceback.c:1210
#1  dump_traceback (fd=fd@entry=2, tstate=tstate@entry=0xae9d1f0, write_header=write_header@entry=0) at Python/traceback.c:1242
#2  0x00007f4ab79940b7 in _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0x7f4ab7d5d2d8 <_PyRuntime+166328>) at Python/traceback.c:1344
#3  0x00007f4ab799a380 in faulthandler_dump_traceback (fd=2, all_threads=1, interp=<optimized out>) at ./Modules/faulthandler.c:252
#4  0x00007f4ab799a539 in faulthandler_user (signum=10) at ./Modules/faulthandler.c:862
#5  <signal handler called>
#6  0x00007f4ab749c3aa in __futex_abstimed_wait_common () from /lib64/libc.so.6
...


(gdb) p *code
$2 = {ob_base = {ob_base = {ob_refcnt = 336273, ob_type = 0x7f4ab7c42940 <_PyNone_Type>}, ob_size = 0}, co_consts = 0x0, co_names = 0xa2, co_exceptiontable = 0x7f4ab7c45740 <PyType_Type>, co_flags = 0, co_warmup = 0, 
  _co_linearray_entry_size = 0, co_argcount = -1213067338, co_posonlyargcount = 32586, co_kwonlyargcount = 16, co_stacksize = 0, co_firstlineno = 0, co_nlocalsplus = 0, co_nlocals = -1215091588, co_nplaincellvars = 32586, 
  co_ncellvars = 0, co_nfreevars = 0, co_localsplusnames = 0x0, co_localspluskinds = 0x0, co_filename = 0x0, co_name = 0x7f4ab7a0eb60 <none_repr>, co_qualname = 0x7f4ab7c42c40 <none_as_number>, co_linetable = 0x0, 
  co_weakreflist = 0x0, _co_code = 0x7f4ab7a61a20 <_Py_HashPointer>, _co_linearray = 0x0, _co_firsttraceable = -1214131376, co_extra = 0x7f4ab7a0dfe0 <PyObject_GenericGetAttr>, co_code_adaptive = "P"}

(gdb) p code->co_name
$4 = (PyObject *) 0x7f4ab7a0eb60 <none_repr>
(gdb) p *code->co_name
$5 = {ob_refcnt = -1657306674253427384, ob_type = 0x401f0fffee6194}

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.11.4 (main, Jun 19 2023, 17:32:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4.3.0.1)]

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    extension-modulesC modules in the Modules dirtype-crashA hard crash of the interpreter, possibly with a core dump

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions