Skip to content

Conversation

@ammar-agent
Copy link
Collaborator

@ammar-agent ammar-agent commented Nov 8, 2025

Problem

Commands like grep -RIn ... | head -n 200 hang over SSH runtime. Previous fix (PR #504) using stdin.abort() addressed a symptom but not the root cause.

Root Cause

bash.ts waited for three async conditions before finalizing:

  • exitCode !== null
  • stdoutEnded (stream 'close' event)
  • stderrEnded (stream 'close' event)

Over SSH with ControlMaster multiplexing, stream 'close' events don't propagate reliably. The 50ms grace period fallback was insufficient, causing hangs on slower commands or high SSH latency.

Solution

Simplified to single wait condition: process exit

Key Changes

  1. Created DisposableProcess (src/utils/disposableExec.ts)

    • Wraps child processes for explicit cleanup on timeout/abort
    • Does NOT auto-cleanup on process exit - streams close naturally
  2. Updated both runtimes (SSHRuntime.ts, LocalRuntime.ts)

    • Use DisposableProcess for consistent process management
    • Explicit disposal only on timeout/abort, not normal exit
  3. Simplified bash.ts (removed ~130 LoC of complex finalization)

    • Before: tryFinalize() waiting for exitCode && stdoutEnded && stderrEnded + 50ms grace period
    • After: Simple await exitCode + 10ms for readline to process buffered data
    • No more multi-condition waits, no arbitrary timeouts

Why This Works

When process exits:

  1. Streams close naturally (no forced cancellation needed)
  2. bash.ts awaits exitCode (one condition, not three)
  3. Small 10ms delay allows readline to process final buffered chunks
  4. Then cleanup readline interfaces and Node streams

No waiting for stream events that don't arrive. No race conditions.

Benefits

  • Fixes the hang - No more waiting for stream 'close' events
  • Simpler - Removed complex state machine with race conditions
  • More reliable - Single wait condition instead of coordinating three
  • Faster - No arbitrary 50ms wait on every command

Testing

  • ✅ All 955 unit tests pass
  • ✅ Integration tests pass (including new regression test for grep|head)
  • ✅ Bash tool tests validate no hangs (43 tests)
  • ✅ Static checks pass

Code Changes

Impact: +467 -444 lines

Files modified:

  • src/utils/disposableExec.ts - New DisposableProcess class (guards against double-kill)
  • src/runtime/SSHRuntime.ts - Use DisposableProcess, remove stream cancellation
  • src/runtime/LocalRuntime.ts - Same pattern for consistency
  • src/services/tools/bash.ts - Simplified finalization logic
  • tests/ipcMain/runtimeExecuteBash.test.ts - Added regression test

Generated with cmux

## Problem

Commands like `grep -RIn ... | head -n 200` would hang over SSH runtime,
not respecting timeouts. Previous fix (PR #504) using stdin.abort() addressed
a symptom but not the root cause.

## Root Cause

bash.ts waited for THREE conditions before finalizing:
- exitCode !== null
- stdoutEnded (stream 'close' event)
- stderrEnded (stream 'close' event)

Over SSH with ControlMaster multiplexing, stream 'close' events don't
propagate reliably. The 50ms grace period fallback was insufficient, allowing
hangs on commands that take longer or during high SSH latency.

## Solution

**Single source of truth: process exit triggers all cleanup**

Created DisposableProcess wrapper that:
1. Registers cleanup callbacks (stream destruction, etc.)
2. Auto-executes cleanup when process exits
3. Makes it IMPOSSIBLE to wait for stream events that never arrive

Key changes:
- SSHRuntime.exec(): Use DisposableProcess, cancel streams on exit
- LocalRuntime.exec(): Same pattern for consistency
- bash.ts: Simplified from complex multi-condition wait to simple `await exitCode`

## Benefits

- ✅ Fixes the hang completely
- ✅ Simpler: Removed ~130 LoC of complex finalization logic
- ✅ Faster: No 50ms grace period on every command
- ✅ Correct by construction: Process lifetime bounds all resources

## Impact

**Invariant**: When exitCode resolves, streams are already destroyed.

**Proof**: DisposableProcess registers cleanup on process 'close' event,
which fires before exitCode promise resolves. Therefore: exitCode resolved ⟹
streams destroyed. No race conditions possible.

## Testing

- All 955 unit tests pass
- Added regression test for grep|head pattern
- bash.ts tests validate no hangs
- LocalRuntime/SSHRuntime tests pass

---
_Generated with `cmux`_
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

When a process exits via signal (e.g., segfault, kill $$), exitCode is null
but signalCode is set. Check both before calling kill() to avoid ESRCH errors.

Also wrap kill() in try/catch as an additional safeguard for TOCTOU race
where process exits between check and kill call.
The previous approach cancelled streams in DisposableProcess cleanup, but this
was too early - bash.ts was still reading from them. Streams close naturally
when the process exits, so no explicit cancellation is needed.

The key fix is that bash.ts now awaits exitCode (which resolves after process
exits), then immediately cleans up readline interfaces and Node streams. This
prevents waiting for stream 'close' events that don't propagate over SSH.
DisposableProcess should NOT auto-call dispose() on process 'close' event.
This was running cleanup before bash.ts finished reading streams.

Dispose is only called explicitly via timeout/abort handlers. Process streams
close naturally when process exits, so no forced cleanup is needed.
After process exits, readline interfaces may still have buffered data to
process. Add a small 10ms delay before destroying streams to allow readline
to finish processing. This prevents empty output in tests.
The previous approach used a 10ms delay to allow readline interfaces to
process buffered data after process exit. This was a timing-based workaround
that wasn't truly correct-by-construction.

## Problem

When a process exits, there's a race between:
1. Process 'exit' event
2. Readline 'close' events (stdout and stderr)

For commands like 'grep | head', the stdout stream closes early (when head
exits), which triggers readline 'close' BEFORE the process 'exit' event.
The 10ms delay hoped to bridge this gap, but it's not deterministic.

## Solution

Wait for all three events concurrently using Promise.all():
- exitCode (process exit)
- stdoutClosed (readline finished processing stdout)
- stderrClosed (readline finished processing stderr)

This ensures we never return results before readline has finished processing
all buffered data, without relying on arbitrary timing delays.

## Testing

- All 955 unit tests pass
- All bash integration tests pass, including the grep|head regression test
- Both local and SSH runtime tests pass
- No timing-based workarounds

---
_Generated with `cmux`_
Extracted three helper functions to improve clarity and reduce duplication:

1. **validateScript()** - Centralizes all input validation (empty script, sleep
   commands, redundant cd). Returns error result or null.

2. **createLineHandler()** - Unified line processing logic for both stdout and
   stderr. Eliminates 84 lines of duplicate code. Enforces truncation limits
   consistently.

3. **formatResult()** - Handles all result formatting based on exit code and
   truncation state. Simplifies the main execution flow.

## Benefits

- **-80 LoC of duplication removed** (stdout/stderr handlers)
- **Clearer separation of concerns** - validation, processing, formatting
- **Easier to test** - helpers are pure functions
- **More maintainable** - single source of truth for line processing logic

## Testing

- All 955 unit tests pass
- Static checks pass (lint + typecheck + format)
- No behavior changes, pure refactor

---
_Generated with `cmux`_
- Replace readline + Node streams with direct Web Streams readers
- Read stdout/stderr in parallel; await exitCode + drains via Promise.all
- Deterministic line splitting with TextDecoder streaming (handles UTF-8 boundaries)
- Early cancel on hard limits; no arbitrary sleeps
- Simplifies code and resolves SSH close-event race

_Generated with `cmux`_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant