Skip to content

Commit 3a08337

Browse files
[linux] handle stop events while stepping
## Purpose Properly handle a case that occasionally triggers [this](https://github.com/compnerd/ds2/blob/main/Sources/Target/Linux/Process.cpp#L274) `DS2BUG` fatal error in `Linux::Process::wait` when running some multi-threaded lldb tests. ## Overview * Handle the case where we get a stop event for a single-stepping thread with `kEventNone` and `kReasonNone` * Enable 15 lldb test cases that were previously flaky due to this bug * Rework the `Linux::Process::wait` function be a little bit more readable * Handle all failure cases in `Linux::Process::wait` with the `CHK()` macro ## Problem Details The scenario where a `kEventNone` event with a reason other than `kReasonThreadSpawn` is alluded to in [this](https://github.com/compnerd/ds2/blob/main/Sources/Target/Linux/Process.cpp#L236-L238) comment block of `Linux::Process::wait` but it suggests it will never happen while the thread is single stepping. However, this scenario can legitimately occur and is fairly easy to reproduce with the lldb multithreaded tests. In this case, the reason will be `kReasonNone`. The implementation already handled this `kEventNone` and `kReasonNone` event by continuing the thread as long as it wasn't also being single-stepped. This change updates it to step the thread in the same situation if it is being single-stepped. However, I haven't 100% convinced myself the implementation was correct to begin with (it could probably use an overhaul). Intuitively, I would expect to ignore events with `kEventNone` and `kReasonNone`, but that's not what it was doing previously. ## Validation The re-enabled tests now pass consistently locally on Android and Linux, and have now passed on at least 10 automated runs. Manually set some breakpoints and stepped in a simple multi-threaded program. It behaved as expected. Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org>
1 parent 246c9c0 commit 3a08337

File tree

2 files changed

+45
-55
lines changed

2 files changed

+45
-55
lines changed

Sources/Target/Linux/Process.cpp

Lines changed: 45 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -209,15 +209,15 @@ ErrorCode Process::wait() {
209209
}
210210

211211
stepping = _currentThread->_state == Thread::kStepped;
212-
_currentThread->updateStopInfo(status);
212+
CHK(_currentThread->updateStopInfo(status));
213213

214214
switch (_currentThread->_stopInfo.event) {
215215
case StopInfo::kEventNone:
216216
// If _stopInfo.event is kEventNone it means the thread is either
217217
// 1. stopped because another thread of the inferior stopped (e.g. because
218218
// of a breakpoint or received a signal, etc) and ds2 sent a SIGSTOP
219219
// to pause this thread while the other thread's stop is evaluated. In
220-
// this case we just resume the thread.
220+
// this case we just resume or step the thread.
221221
// 2. stopped because this thread called clone(2) (or similar e.g.
222222
// pthread_create). In this case we have two subcases:
223223
// a. The thread was running and hit the clone/spawn and thus was
@@ -232,48 +232,53 @@ ErrorCode Process::wait() {
232232
// waitpid(3) call and thus have to handle the cloned thread
233233
// and the currentThread at once. So we call step on the
234234
// _currentThread and then beforeResume the cloned thread.
235-
//
236-
// We also have another theoretically possible case where we are not
237-
// stopped by a clone but still stepping while encountering a kEventNone.
238-
// This should not happen and we just assert that it does not.
239-
240-
if (_currentThread->_stopInfo.reason == StopInfo::kReasonThreadSpawn &&
241-
stepping) { //(2b)
242-
unsigned long spawnedThreadIdData;
243-
CHK(ptrace().getEventMessage(_pid, spawnedThreadIdData));
244-
auto const spawnedThreadId = static_cast<ThreadId>(spawnedThreadIdData);
245-
246-
if (_threads.find(spawnedThreadId) == _threads.end()) {
247-
// If the newly cloned thread does not yet exist within the _threads
248-
// vector then we have not yet seen it with waitpid(2). The most
249-
// recent
250-
// waitpid(2) returns a status guaranteeing a clone event did happen.
251-
// Therefore, we wait and block until we see the cloned thread.
252-
int spawnedThreadStatus;
253-
ThreadId returnedThreadId =
254-
blocking_waitpid(spawnedThreadId, &spawnedThreadStatus, __WALL);
255-
if (spawnedThreadId != returnedThreadId)
256-
return kErrorProcessNotFound;
257-
DS2LOG(Debug, "child thread tid %d %s", returnedThreadId,
258-
Stringify::WaitStatus(spawnedThreadStatus));
259-
260-
_currentThread->step();
261-
auto newThread = new Thread(this, returnedThreadId);
262-
newThread->beforeResume();
263-
} else {
264-
// The new thread corresponding to this kReasonThreadSpawn was already
265-
// caught by waitpid(2). Therefore, we only need to step the current
266-
// thread.
267-
_currentThread->step();
235+
236+
switch (_currentThread->_stopInfo.reason) {
237+
case StopInfo::kReasonNone: // (1)
238+
if (stepping)
239+
CHK(_currentThread->step());
240+
else
241+
CHK(_currentThread->resume());
242+
break;
243+
244+
case StopInfo::kReasonThreadSpawn:
245+
if (!stepping) { // (2a)
246+
CHK(_currentThread->resume());
247+
} else { // (2b)
248+
unsigned long spawnedThreadIdData;
249+
CHK(ptrace().getEventMessage(_pid, spawnedThreadIdData));
250+
auto const spawnedThreadId = static_cast<ThreadId>(spawnedThreadIdData);
251+
if (_threads.find(spawnedThreadId) == _threads.end()) {
252+
// If the newly cloned thread does not yet exist within the _threads
253+
// vector then we have not yet seen it with waitpid(2). The most
254+
// recent waitpid(2) returns a status guaranteeing a clone event did happen.
255+
// Therefore, we wait and block until we see the cloned thread.
256+
int spawnedThreadStatus;
257+
ThreadId returnedThreadId =
258+
blocking_waitpid(spawnedThreadId, &spawnedThreadStatus, __WALL);
259+
if (spawnedThreadId != returnedThreadId)
260+
return kErrorProcessNotFound;
261+
DS2LOG(Debug, "child thread tid %d %s", returnedThreadId,
262+
Stringify::WaitStatus(spawnedThreadStatus));
263+
264+
CHK(_currentThread->step());
265+
auto newThread = new Thread(this, returnedThreadId);
266+
CHK(newThread->beforeResume());
267+
} else {
268+
// The new thread corresponding to this kReasonThreadSpawn was already
269+
// caught by waitpid(2). Therefore, we only need to step the current
270+
// thread.
271+
CHK(_currentThread->step());
272+
}
268273
}
269-
} else if (stepping) {
274+
break;
275+
276+
default:
270277
// We should never see a case where we're:
271278
// 1. stopped for event kEventNone
272-
// 2. stepping
273-
// 3. not stopped for reason kReasonThreadSpawn
279+
// 2. not stopped for reason kReasonThreadSpawn or kReasonNone
274280
DS2BUG("inconsistent thread stop info");
275-
} else {
276-
_currentThread->resume(); // (1) and (2a)
281+
break;
277282
}
278283
goto continue_waiting;
279284

Support/Testing/Excluded/ds2/android-x86_64.excluded

Lines changed: 0 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -6,28 +6,16 @@ TestBadAddressBreakpoints.BadAddressBreakpointTestCase.test_bad_address_breakpoi
66
TestBreakpointInGlobalConstructor.TestBreakpointInGlobalConstructors.test
77
TestBreakpointSetRestart.BreakpointSetRestart.test_breakpoint_set_restart_dwarf
88
TestBreakpointSetRestart.BreakpointSetRestart.test_breakpoint_set_restart_dwo
9-
TestBreakOnLambdaCapture.TestBreakOnLambdaCapture.test_break_on_lambda_capture
109
TestBtAliasRepeat.TestCase.test_dwarf
1110
TestBtAliasRepeat.TestCase.test_dwo
1211
TestChangeProcessGroup.ChangeProcessGroupTestCase.test_setpgid
13-
TestChangeValueAPI.ChangeValueAPITestCase.test_change_value_dwo
1412
TestCompletion.CommandLineCompletionTestCase.test_process_unload
15-
TestConcurrentBreakpointDelayBreakpointOneSignal.ConcurrentBreakpointDelayBreakpointOneSignal.test
16-
TestConcurrentBreakpointsDelayedBreakpointOneWatchpoint.ConcurrentBreakpointsDelayedBreakpointOneWatchpoint.test
17-
TestConcurrentDelayedCrashWithBreakpointSignal.ConcurrentDelayedCrashWithBreakpointSignal.test
18-
TestConcurrentDelayedCrashWithBreakpointWatchpoint.ConcurrentDelayedCrashWithBreakpointWatchpoint.test
1913
TestConcurrentNWatchNBreak.ConcurrentNWatchNBreak.test
2014
TestConcurrentSignalNWatchNBreak.ConcurrentSignalNWatchNBreak.test
21-
TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak.test
22-
TestConcurrentTwoBreakpointsOneDelaySignal.ConcurrentTwoBreakpointsOneDelaySignal.test
23-
TestConcurrentTwoBreakpointsOneSignal.ConcurrentTwoBreakpointsOneSignal.test
24-
TestConcurrentTwoBreakpointsOneWatchpoint.ConcurrentTwoBreakpointsOneWatchpoint.test
25-
TestConcurrentTwoWatchpointsOneBreakpoint.ConcurrentTwoWatchpointsOneBreakpoint.test
2615
TestConcurrentVFork.TestConcurrentVFork.test_follow_child_fork_call_exec
2716
TestConcurrentVFork.TestConcurrentVFork.test_follow_child_fork_no_exec
2817
TestConcurrentVFork.TestConcurrentVFork.test_follow_child_vfork_call_exec
2918
TestConcurrentVFork.TestConcurrentVFork.test_follow_child_vfork_no_exec
30-
TestConcurrentWatchpointDelayWatchpointOneBreakpoint.ConcurrentWatchpointDelayWatchpointOneBreakpoint.test
3119
TestConstVariables.ConstVariableTestCase.test_and_run_command_dwarf
3220
TestConstVariables.ConstVariableTestCase.test_and_run_command_dwo
3321
TestContainerCommands.TestCmdContainer.test_container_add
@@ -149,9 +137,6 @@ TestTargetCreateDeps.TargetDependentsTestCase.test_dependents_implicit_default_l
149137
TestThreadAPI.ThreadAPITestCase.test_step_out_of_malloc_into_function_b_dwarf
150138
TestThreadAPI.ThreadAPITestCase.test_step_out_of_malloc_into_function_b_dwo
151139
TestThreadSelectionBug.TestThreadSelectionBug.test
152-
TestThreadStates.ThreadStateTestCase.test_process_interrupt
153-
TestThreadStepOut.ThreadStepOutTestCase.test_step_single_thread_dwo
154-
TestTwoHitsOneActual.TestTwoHitsOneActual.test_two_hits_one_actual
155140
TestUnalignedLargeWatchpoint.UnalignedLargeWatchpointTestCase.test_unaligned_large_watchpoint
156141
TestUnwindExpression.UnwindFromExpressionTest.test_unwind_expression_dwarf
157142
TestUnwindExpression.UnwindFromExpressionTest.test_unwind_expression_dwo

0 commit comments

Comments
 (0)