Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@vercel
Copy link

vercel bot commented Nov 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 13, 2025 8:55pm
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 13, 2025 8:55pm
rivet-inspector Ignored Ignored Preview Nov 13, 2025 8:55pm
rivet-site Ignored Ignored Preview Nov 13, 2025 8:55pm

This was referenced Nov 10, 2025
Copy link
Member Author

NathanFlurry commented Nov 10, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Nov 10, 2025

More templates

@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3439

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3439

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3439

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3439

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3439

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3439

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3439

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3439

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3439

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3439

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3439

commit: afc9c8e

@claude
Copy link

claude bot commented Nov 10, 2025

PR Review: Fix Race Condition with WebSocket Open Events

Summary

This PR addresses a critical race condition in WebSocket connection handling by splitting connection lifecycle into two phases: prepareConn (async) and connectConn (sync). This is an important architectural improvement that prevents WebSocket messages from being dropped.

Code Quality & Architecture

✅ Excellent Architectural Changes

Connection Lifecycle Split

  • The separation of prepareConn (async) and connectConn (sync) is well-designed
  • Lines 76-133 in connection-manager.ts: prepareConn handles all async operations (auth, state creation)
  • Lines 143-182 in connection-manager.ts: connectConn is intentionally synchronous with clear documentation explaining why
  • This pattern is correctly applied in router-endpoints.ts:138-152 where WebSocket handlers call these in sequence

Driver Pattern Improvements

  • WebSocket drivers now use a lazy initialization pattern with setWebSocket() method
  • createWebSocketSocket and createRawWebSocketSocket return { driver, setWebSocket } tuple
  • Driver operations safely handle undefined websocket with appropriate logging
  • Lines 31-36, 96-101 in websocket.ts: Guards against operations before websocket is set

🎯 Good Practices

  1. Connection State Tracking: New CONN_CONNECTED_SYMBOL symbol properly tracks connection lifecycle
  2. Type Safety: OnConnectContext now properly includes all generic parameters (lines 7-16 in on-connect.ts)
  3. Manager Visibility: Changed connectionManager and eventManager to public properties for proper access patterns
  4. Error Handling: Maintained proper error logging and graceful degradation throughout

Potential Issues & Concerns

⚠️ Race Condition in connectConn

Critical Issue (Lines 143-182 in connection-manager.ts):

connectConn(conn: Conn<S, CP, CS, V, I, DB>) {
    invariant(!this.#connections.has(conn.id), "conn already connected");
    
    this.#connections.set(conn.id, conn);
    this.#changedConnections.add(conn.id);
    this.#callOnConnect(conn);  // ← User code runs here
    // ...
    conn[CONN_CONNECTED_SYMBOL] = true;  // ← Set AFTER onConnect
    conn[CONN_SEND_MESSAGE_SYMBOL](initMessage);  // ← Init sent AFTER onConnect
}

Problem: CONN_CONNECTED_SYMBOL is set to true AFTER onConnect is called, but conn.send() checks this flag (line 201 in conn/mod.ts):

send(eventName: string, ...args: unknown[]) {
    this.#assertConnected();  // ← Will throw if called in onConnect!
    // ...
}

Impact: If user code in onConnect tries to send messages via conn.send(), it will throw an error saying "Connection not connected yet."

Recommendation: Move conn[CONN_CONNECTED_SYMBOL] = true; to BEFORE this.#callOnConnect(conn); at line 150.

🔍 Type Safety Concern

In router-endpoints.ts:177:

const conn = await actor.connectionManager.prepareAndConnectConn(
    createHttpSocket(),
    undefined as unknown as CP,  // ← Type cast
    undefined,
);

Issue: Using as unknown as CP bypasses type safety for HTTP action connections. While this may be intentional (HTTP actions might not need connection params), consider:

  1. Making CP optional in the connection type system for action-only actors
  2. Adding a comment explaining why this cast is safe
  3. Creating a dedicated method like prepareActionConn that doesn't require params

⚠️ TODO Comments

Several TODOs indicate incomplete work:

  1. Line 158 in connection-manager.ts: // TODO: Only do this for action messages

    • Init message is always sent, even for non-action connections
    • May cause unnecessary overhead or protocol confusion
  2. Line 176 in connection-manager.ts: // TODO: This may cause issues

    • Unclear what issues are anticipated
    • Should be investigated before merging

🤔 Minor Concerns

Optional Chaining on Terminate (Lines 51-52 in raw-websocket.ts, line 111 in websocket.ts):

(websocket as any)?.terminate?.();
  • Mixing as any with optional chaining suggests uncertainty about the interface
  • Consider defining a proper interface for websockets with optional terminate method

Inconsistent Error Handling:

  • Some places use loggerWithoutContext() (lines 37-40 in raw-websocket.ts)
  • Others use actor.rLog (lines 32-36 in websocket.ts)
  • Consider standardizing on one approach for consistency

Security Considerations

No security concerns identified

  • Proper validation maintained throughout
  • Error messages don't leak sensitive information
  • Connection lifecycle properly managed

Performance Considerations

✅ Improvements

  • Splitting async/sync phases reduces blocking in critical WebSocket event handlers
  • Lazy websocket initialization prevents unnecessary object creation

⚠️ Potential Issue

  • Init message sent on every connection (line 161-181 in connection-manager.ts) even for connections that don't need it
  • Consider conditional sending based on connection type

Test Coverage

📊 Existing Coverage

  • Driver test suite exists with comprehensive fixture in raw-websocket.ts
  • File system driver tests updated to handle new structure
  • Fixed getRequestInfo test case (lines 53-65 in raw-websocket.ts)

🔴 Missing Coverage

Critical: No specific tests for the race condition being fixed

  • Should add test that sends messages immediately in onConnect callback
  • Should add test that calls broadcast immediately in onConnect
  • Should add test for concurrent WebSocket open events
  • Should verify message ordering with rapid-fire messages after connection

Recommendation: Add test case like:

test("onConnect can send messages immediately", async () => {
    const actor = createActor({
        onConnect: async (ctx, conn) => {
            conn.send("welcome", { message: "hello" });
        }
    });
    // Connect and verify message received
});

Recommendations

Must Fix Before Merge

  1. Fix CONN_CONNECTED_SYMBOL ordering - Move flag set before onConnect call
  2. Clarify type cast in HTTP action connection preparation
  3. Address TODOs or document why they can be deferred

Should Consider

  1. Add specific test coverage for the race condition
  2. Standardize error logging approach
  3. Define proper TypeScript interface for websocket with optional terminate
  4. Make init message sending conditional on connection type

Nice to Have

  1. Add JSDoc comments explaining the two-phase connection pattern
  2. Consider extracting connection state machine into a dedicated class
  3. Add performance benchmarks for connection establishment

Overall Assessment

Quality: ⭐⭐⭐⭐ (4/5)

  • Solid architectural improvement
  • Well-documented reasoning for sync/async split
  • Maintains backward compatibility

Risk Level: 🟡 Medium

  • One critical bug (CONN_CONNECTED_SYMBOL ordering)
  • Several TODOs indicate incomplete work
  • Limited test coverage for the specific race condition

Recommendation: Request changes - fix the CONN_CONNECTED_SYMBOL ordering issue and address the type safety concern before merging.


Great work on identifying and addressing this race condition! The two-phase connection pattern is a clean solution. Just needs a few tweaks before it's ready to ship. 🚀

@jog1t jog1t changed the base branch from 11-09-fix_rivetkit_standardize_error_metadata_return_type to graphite-base/3439 November 13, 2025 19:02
@jog1t jog1t changed the base branch from graphite-base/3439 to main November 13, 2025 19:03
NathanFlurry and others added 16 commits November 13, 2025 12:51
- Created StateManager class for connection state management
- Moved state proxying and change tracking to dedicated manager
- Changed conn methods to use symbols for internal access:
  - actor, stateEnabled, persistRaw (getters)
  - hasChanges(), markSaved(), sendMessage() (methods)
- Updated all references across codebase to use new symbols
- Maintains backward compatibility for public API (send, disconnect, etc)
- Consistent naming pattern with instance/mod.ts StateManager

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@NathanFlurry NathanFlurry changed the base branch from main to graphite-base/3439 November 13, 2025 20:53
@NathanFlurry NathanFlurry force-pushed the 11-10-fix_rivetkit_fix_race_condition_with_websocket_open_events branch from e8b192b to afc9c8e Compare November 13, 2025 20:53
@claude
Copy link

claude bot commented Nov 13, 2025

PR Review: Fix Race Condition with WebSocket Open Events

Summary

This PR addresses a critical race condition in the WebSocket connection lifecycle by splitting connection preparation and connection phases. The changes ensure that WebSocket messages aren't dropped during the connection setup process.

Key Changes

1. Connection Lifecycle Split ✅

Good: The separation of prepareConn() and connectConn() is a solid architectural improvement that addresses the race condition.

  • prepareConn() - Handles async pre-connection work (auth, state creation)
  • connectConn() - Synchronously connects the connection (must be called in onOpen)

Why this matters: WebSocket onOpen handlers need to be synchronous to maintain proper event ordering. Moving async work to prepareConn() prevents race conditions where messages could be processed before the connection is fully initialized.

2. Deferred WebSocket Assignment Pattern ✅

The changes to createWebSocketSocket() and createRawWebSocketSocket() to return { driver, setWebSocket } are well-designed:

const { driver, setWebSocket } = createWebSocketSocket(...);
const conn = await actor.connectionManager.prepareConn(driver, ...);
// ... later in onOpen (synchronous)
setWebSocket(ws);
actor.connectionManager.connectConn(conn);

This ensures the driver exists before the WebSocket is ready, preventing null/undefined access issues.

3. Connection State Guards ✅

The addition of CONN_CONNECTED_SYMBOL and #assertConnected() provides proper lifecycle validation:

// In conn/mod.ts:65-70
#assertConnected() {
    if (!this[CONN_CONNECTED_SYMBOL])
        throw new InternalError(
            "Connection not connected yet. This happens when trying to use the connection in onBeforeConnect or createConnState.",
        );
}

Good defensive programming that will help catch misuse early.

Potential Issues

1. Missing Guard in disconnect() Method ⚠️

In conn/drivers/websocket.ts:91-101 and raw-websocket.ts:34-42, the disconnect() method checks for !websocket and logs a warning but doesn't prevent the disconnect flow:

if (!websocket) {
    loggerWithoutContext().warn("disconnecting ws without websocket");
    return;  // Good: returns early
}

However, there's a potential edge case: if prepareConn() is called but connectConn() is never called (e.g., error during WebSocket upgrade), and then cleanup tries to disconnect, we could have a driver without a WebSocket. The current code handles this correctly by returning early, but consider:

Suggestion: Add similar guards in other driver methods that access websocket to ensure robustness.

2. OnConnect Context Type Change 🔍

The OnConnectContext type was changed from extending ConnInitContext to ConnContext (contexts/on-connect.ts:11). This gives onConnect access to the full connection object including conn.send().

Question: Is this intentional? The old type prevented accessing connection methods during onConnect, which could be a safeguard. Now users can call conn.send() in onConnect, which should work since CONN_CONNECTED_SYMBOL is set before calling onConnect.

Verify: Ensure this doesn't introduce timing issues where messages sent in onConnect race with the Init message sent immediately after in connectConn() (connection-manager.ts:158-181).

3. Error Handling in connectConn() ⚠️

The connectConn() method (connection-manager.ts:143-182) is synchronous and calls #callOnConnect() which can throw or return a rejected promise. If onConnect throws synchronously, the connection will be in a partially connected state (already added to #connections map).

Current code:

connectConn(conn: Conn<S, CP, CS, V, I, DB>) {
    invariant(!this.#connections.has(conn.id), "conn already connected");
    this.#connections.set(conn.id, conn);  // <- Added to map
    this.#changedConnections.add(conn.id);
    this.#callOnConnect(conn);  // <- Could throw
    // ... rest of method
}

Recommendation: Wrap the entire body in a try-catch and ensure cleanup happens if any step fails after adding to the connections map.

4. Raw WebSocket Request Info Implementation ✅

The fix in fixtures/driver-test-suite/raw-websocket.ts:53-65 properly handles the getRequestInfo message:

const url = ctx.request?.url || "ws://actor/websocket";

Good fallback handling. Consider documenting when ctx.request might be undefined.

5. Typo in Comment 📝

connection-manager.ts:136: "form" should be "from"

// Adds a connection form prepareConn to the actor

Security Considerations

No significant security concerns identified. The changes are primarily architectural improvements to the connection lifecycle.

Performance Considerations

Positive Impact ✅

  • Splitting async/sync phases should slightly improve WebSocket connection latency by reducing the critical path in onOpen
  • Deferred WebSocket assignment eliminates potential waiting/polling overhead

Neutral

  • Additional CONN_CONNECTED_SYMBOL checks add minimal overhead
  • No significant performance regressions expected

Test Coverage

The PR includes changes to the raw-websocket fixture test actor, which is good. However:

Recommendations:

  1. Add specific tests for the race condition this fixes (if not already present)
  2. Test error cases: what happens if connectConn() is called twice? What if prepareConn() succeeds but connectConn() is never called?
  3. Test the new assertConnected() guard - ensure it properly rejects operations before connection is complete

Code Quality

Positive ✅

  • Clear separation of concerns
  • Good use of TypeScript symbols for internal state
  • Comprehensive logging at debug/warn/error levels
  • Consistent error handling patterns

Minor Issues

  • Some long functions could be broken down (e.g., connectConn() does many things)
  • The TODO comment on line 158 should be tracked: "// TODO: Only do this for action messages"

Recommendations

  1. High Priority: Add error handling for failures in connectConn() after the connection is added to the map
  2. Medium Priority: Verify the OnConnectContext type change doesn't introduce message ordering issues
  3. Low Priority: Fix typo, consider breaking down large methods
  4. Testing: Add specific tests for the race condition and error paths

Overall Assessment

Verdict: ✅ Approve with minor recommendations

This is a solid architectural improvement that addresses a real race condition. The core solution (splitting prepare/connect phases) is well-designed and implemented. The potential issues identified are mostly edge cases that should be addressed but don't block the PR.

The code follows the repository's patterns (TypeScript, structured logging, error handling) and improves the overall robustness of the connection management system.


🤖 Generated with Claude Code

@NathanFlurry NathanFlurry changed the base branch from graphite-base/3439 to main November 13, 2025 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants