Commit c93cc60

runtime: allow Stack to traceback goroutines in syscall _Grunning window
net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of all goroutines, and searches for a specific line in that stack trace. It currently sometimes fails because it encounters the goroutine it's looking for in the small window where a goroutine might be in _Grunning while in a syscall, introduced in CL 646198. In that case, the traceback will give up, failing to print the stack TestCopyError is expecting.

This represents a general regression, since previously runtime.Stack could never fail to take a goroutine's stack; giving up was only possible in fatal panic cases.

Fix this the same way we fixed goroutine profiles: allow the stack trace to proceed if the g's syscallsp != 0. This is safe in any stop-the-world-related context, because syscallsp won't be mutated while the goroutine fails to acquire a P, and thus fails to fully exit the syscall context. This also means the stack below syscallsp won't be mutated, and thus taking a traceback is also safe.

Fixes #66639.

Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
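For readers unfamiliar with the pattern the commit message describes, here is a minimal sketch of taking an all-goroutine dump with runtime.Stack and searching it for a frame. It is not the code of TestCopyError; the names allStacks, parkedWorker, and traceContains are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// allStacks returns a dump of every goroutine's stack, doubling the
// buffer until the whole trace fits (runtime.Stack truncates otherwise).
func allStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true: include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// parkedWorker is a hypothetical goroutine body: it announces that it
// has started and then blocks until release is closed.
func parkedWorker(started chan<- struct{}, release <-chan struct{}) {
	close(started)
	<-release
}

// traceContains reports whether the all-goroutine dump mentions name.
func traceContains(name string) bool {
	return strings.Contains(allStacks(), name)
}

func main() {
	started := make(chan struct{})
	release := make(chan struct{})
	go parkedWorker(started, release)
	<-started // the worker goroutine exists from here until release is closed

	fmt.Println("worker frame found:", traceContains("parkedWorker"))
	close(release)
}
```

A test built this way depends on runtime.Stack printing every goroutine's frames, which is why a traceback that gives up on a single goroutine can make such a test flaky.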
1 parent b5353fd commit c93cc60

1 file changed, +10 -1 lines changed
src/runtime/traceback.go

Lines changed: 10 additions & 1 deletion
@@ -1314,7 +1314,16 @@ func tracebacksomeothers(me *g, showf func(*g) bool) {
 	// from a signal handler initiated during a systemstack call.
 	// The original G is still in the running state, and we want to
 	// print its stack.
-	if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning {
+	//
+	// There's a small window of time in exitsyscall where a goroutine could be
+	// in _Grunning as it's exiting a syscall. This could be the case even if the
+	// world is stopped or frozen.
+	//
+	// This is OK because the goroutine will not exit the syscall while the world
+	// is stopped or frozen. This is also why it's safe to check syscallsp here,
+	// and safe to take the goroutine's stack trace. The syscall path mutates
+	// syscallsp only just before exiting the syscall.
+	if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning && gp.syscallsp == 0 {
 		print("\tgoroutine running on other thread; stack unavailable\n")
 		printcreatedby(gp)
 	} else {
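
The effect of the new condition can be modeled outside the runtime. The following is a toy sketch, not runtime code: goroutineState and stackUnavailable are hypothetical names, the struct fields only stand in for gp.m != getg().m, readgstatus(gp), and gp.syscallsp, and the constant values are assumed to mirror the runtime's _Grunning and _Gscan.

```go
package main

import "fmt"

// Assumed to mirror the runtime's _Grunning and _Gscan values;
// treat them as illustrative only.
const (
	gRunning uint32 = 2
	gScan    uint32 = 0x1000
)

// goroutineState is a hypothetical stand-in for the parts of a g that
// the changed condition inspects.
type goroutineState struct {
	onOtherThread bool    // models gp.m != getg().m
	status        uint32  // models readgstatus(gp)
	syscallsp     uintptr // models gp.syscallsp
}

// stackUnavailable models the post-change condition: the traceback gives
// up only for a goroutine running on another thread that is NOT in the
// syscall-exit window (syscallsp == 0).
func stackUnavailable(s goroutineState) bool {
	return s.onOtherThread && s.status&^gScan == gRunning && s.syscallsp == 0
}

func main() {
	running := goroutineState{onOtherThread: true, status: gRunning}
	exiting := goroutineState{onOtherThread: true, status: gRunning, syscallsp: 0x1000}

	fmt.Println(stackUnavailable(running)) // true: stack is skipped
	fmt.Println(stackUnavailable(exiting)) // false: traceback proceeds
}
```

Before the change, both cases in this model would have been skipped, since the condition did not look at syscallsp.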