You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fix shutdown hang when background sync operations are in progress
When `Node::stop()` was called while background wallet sync operations
were actively running, the node would hang for up to 5+ seconds before
timing out and forcefully aborting tasks. In some cases, this could
result in an indefinite hang if blocking operations in spawned threads
couldn't be properly terminated.
## Root Cause
The background sync loop in `ChainSource::start_tx_based_sync_loop()`
used `tokio::select!` to multiplex between the stop signal and various
sync interval ticks. However, once a sync operation (e.g.,
`sync_lightning_wallet()`, `sync_onchain_wallet()`) began executing,
the select could not respond to the stop signal until that operation
completed.
These sync operations internally use `runtime.spawn_blocking()` for
I/O-heavy electrum/esplora calls, with timeouts of 10-20 seconds
(LDK_WALLET_SYNC_TIMEOUT_SECS, BDK_WALLET_SYNC_TIMEOUT_SECS). The
shutdown timeout (BACKGROUND_TASK_SHUTDOWN_TIMEOUT_SECS) is only 5
seconds, creating a race condition where:
1. Background sync starts a wallet sync (potential 10-20s operation)
2. User calls stop()
3. Stop signal is sent but sync operation continues
4. `wait_on_background_tasks()` times out after 5s and aborts
5. Blocking thread continues running, potentially causing hang
## Solution
This commit implements a biased nested `tokio::select!` pattern:
1. **Outer biased select**: The `biased` modifier ensures the stop
signal is always checked first before polling any interval ticks,
preventing new sync operations from starting after shutdown is
initiated.
2. **Inner nested selects**: Each sync operation is wrapped in its own
`tokio::select!` that can race the operation against the stop
signal. This allows cancellation even if a sync has just started.
With this approach, when `stop()` is called:
- The next loop iteration immediately sees the stop signal (biased)
- If a sync is in progress, it can be interrupted mid-operation
- Shutdown completes in milliseconds instead of seconds
## Testing
Added integration test `shutdown_during_background_sync` that enables
background sync with 2-second intervals, triggers manual sync, waits
for background sync to potentially start, then calls stop(). The test
verifies shutdown completes within 3 seconds (typically ~10ms).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
0 commit comments