chore(rivetkit): move conns to separate persisted kv keys #3420

NathanFlurry · 2025-11-09T03:07:54Z

No description provided.

vercel · 2025-11-09T03:07:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
rivetkit-serverless	Ready	Preview	Comment	Nov 13, 2025 2:48am

3 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Nov 13, 2025 2:48am
rivet-inspector	Ignored	Preview	Nov 13, 2025 2:48am
rivet-site	Ignored	Preview	Nov 13, 2025 2:48am

NathanFlurry · 2025-11-09T03:08:09Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2025-11-09T03:10:02Z

Pull Request Review: Move Connections to Separate Persisted KV Keys

Summary

This PR refactors the actor persistence layer to store connection data in separate KV entries instead of bundling everything into a single actor state blob. This is a significant architectural improvement with good design choices.

Positive Highlights

Architecture & Design

Separation of concerns: Moving connection data to individual KV entries is a smart design decision that reduces write amplification when only connection state changes
Consistent key schema: The use of typed prefixes ([1] for actor data, [2] for connections) provides a clean namespace separation
Batch operations: New KV batch API (kvBatchPut, kvBatchGet, kvBatchDelete, kvListPrefix) is well-designed and supports efficient bulk operations
Change tracking: The #changedConnections set and per-connection hasChanges flag minimize unnecessary writes

Code Quality

Schema versioning: Proper use of versioned schemas (v2 for file-system-driver, v3 for actor-persist) with BARE encoding
Atomic operations: File system driver uses proper atomic writes with temp files and rename operations (lines 409-429 in global-state.ts)
Error handling: Good error handling with try-catch blocks and cleanup of temp files
Logging: Comprehensive structured logging throughout the changes

Issues & Concerns

🔴 Critical: Race Condition in Connection Removal

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1006-1023

#removeConn(conn: Conn<S, CP, CS, V, I, DB>) {
    // Remove from KV directly
    this.#removeConnFromKV(conn.id).catch((error) => {
        this.#rLog.error({
            msg: "failed to remove connection from KV",
            connId: conn.id,
            error: stringifyError(error),
        });
    });

    // Connection is now removed from KV storage only

    // Remove from state and tracking
    this.#connections.delete(conn.id);
    this.#changedConnections.delete(conn.id);

Problem: The KV deletion is fire-and-forget with .catch(). If the deletion fails, the connection is removed from memory but may still exist in storage, causing:

Stale connection data persisted indefinitely
Memory leaks when actor restarts and loads stale connections
Potential security issue if connections contain sensitive data

Recommendation: Make this async and await the deletion:

async #removeConn(conn: Conn<S, CP, CS, V, I, DB>) {
    // Remove from KV storage first
    try {
        await this.#removeConnFromKV(conn.id);
    } catch (error) {
        this.#rLog.error({
            msg: "failed to remove connection from KV",
            connId: conn.id,
            error: stringifyError(error),
        });
        throw error; // Or decide on proper error handling
    }

    // Only remove from memory after successful KV deletion
    this.#connections.delete(conn.id);
    this.#changedConnections.delete(conn.id);
    // ... rest of cleanup
}

🟡 Potential Issue: Missing Error Recovery

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:647-684

The #writePersistedData method writes actor state and connections in a batch, but if the batch write fails partway through:

Some connections may be saved while others aren't
No rollback mechanism exists
Could lead to inconsistent state

Recommendation: Consider adding:

Transaction-like semantics or retry logic
Better error messaging to indicate which connections failed to save
Consider whether partial writes should fail the entire operation

🟡 Schema Naming Inconsistency

Location: rivetkit-typescript/packages/rivetkit/schemas/actor-persist/v3.bare

The schema types changed from PersistedActor/PersistedHibernatableConn to Actor/HibernatableConn, but the TypeScript interfaces still use the Persisted* naming convention. This creates confusion between the serialized schema types and the in-memory types.

Recommendation: Either:

Keep schema names consistent with TypeScript interfaces (PersistedActor)
Or update TypeScript to match schema (Actor → PersistedActor distinction)

🟡 Type Safety: Unused TextDecoder

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:146-155

async kvBatchPut(
    actorId: string,
    entries: [Uint8Array, Uint8Array][],
): Promise<void> {
    const storage = this.#getDOCtx(actorId).storage;
    const encoder = new TextDecoder(); // ⚠️ Never used

The encoder variable is declared but never used. Should be removed.

🟡 Performance: Missing Index for Prefix Searches

Location: rivetkit-typescript/packages/rivetkit/src/drivers/file-system/global-state.ts:844-874

The kvListPrefix implementation does a linear scan through all KV entries:

for (const kvEntry of entry.state.kvStorage) {
    const keyBytes = new Uint8Array(kvEntry.key);
    // Check if key starts with prefix
    if (keyBytes.length >= prefix.length) {
        let hasPrefix = true;
        for (let i = 0; i < prefix.length; i++) {
            if (keyBytes[i] !== prefix[i]) {
                hasPrefix = false;
                break;
            }
        }

Impact: O(n*m) complexity where n = number of keys, m = prefix length. For actors with many connections, this could be slow.

Recommendation:

Document expected connection counts
Consider adding a sorted index if connection counts are expected to be high (>100s)
Or accept the linear scan if connection counts are typically low (<50)

Minor Issues

Documentation

Missing JSDoc comments for new KV batch methods in ActorDriver interface
The makeConnKey function could use a comment explaining the key format
Schema migration path from v2→v3 should be documented

Code Style

Consider extracting the key comparison logic in kvListPrefix to a helper function for readability
The #uint8ArrayToKey and #keyToUint8Array methods in Cloudflare driver could be static utilities

Testing

No new test files added for the KV batch operations
Consider adding tests for:
- Connection persistence across hibernation
- Concurrent connection modifications
- KV prefix listing with various edge cases
- Connection removal cleanup

Performance Considerations

Positive Impact:

✅ Reduced write amplification: Only changed connections are written instead of entire actor state
✅ Batch operations reduce round trips to storage
✅ Change tracking minimizes unnecessary writes

Potential Concerns:

Increased KV operations count (one per connection instead of one per actor)
File system driver now writes multiple KV entries per actor (increases file size slightly)
Consider monitoring KV operation counts in production

Security Assessment

✅ No obvious security vulnerabilities introduced
✅ Proper input validation on keys and values
⚠️ Connection data cleanup on removal needs to be guaranteed (see critical issue above)
✅ No sensitive data appears to be logged

Recommendations

Must Fix (Before Merge)

Fix the race condition in #removeConn to ensure connections are properly cleaned up from KV storage

Should Fix

Add error recovery for partial batch write failures
Remove unused encoder variable in Cloudflare driver
Resolve schema naming inconsistency

Nice to Have

Add comprehensive tests for new KV operations
Document the migration path and backward compatibility strategy
Add JSDoc comments for the new public API methods
Consider adding metrics/logging for KV operation counts to monitor performance impact

Overall Assessment

This is a well-thought-out refactoring that improves the architecture by separating connection state from actor state. The code quality is generally high with good error handling and logging. The main concern is the fire-and-forget deletion in #removeConn which could lead to data leaks. Once that's addressed, this should be good to merge.

Recommendation: Request changes for the critical race condition fix, then approve after addressing.

pkg-pr-new · 2025-11-09T03:10:12Z

More templates

@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3420

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3420

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3420

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3420

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3420

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3420

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3420

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3420

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3420

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3420

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3420

commit: c57be74

claude · 2025-11-09T05:08:13Z

Code Review - PR #3420: Move conns to separate persisted kv keys

Summary

This PR refactors RivetKit's connection storage from embedding all connection data in a single persisted actor state to using separate KV entries per connection. This is a significant architectural improvement that enhances scalability and performance.

Strengths

Improved Architecture: Separating connections into individual KV entries is a smart design decision that:
- Reduces write amplification (don't need to rewrite all connections when one changes)
- Enables parallel connection updates
- Simplifies connection lifecycle management
Comprehensive Driver Updates: All actor drivers (Cloudflare Workers, File System, Engine) have been properly updated with the new KV batch API.
Change Tracking: The addition of change tracking for connections (#changed, hasChanges, markSaved()) is well-designed and prevents unnecessary writes.
Schema Versioning: Proper use of versioned schemas (actor-persist/v3.bare, file-system-driver/v2.bare) for backward compatibility.

Issues & Concerns

Critical Issues

Cloudflare Key Encoding Bug (actor-driver.ts:215-230)

The #uint8ArrayToKey method has a problematic assumption:

#uint8ArrayToKey(key: Uint8Array): string {
    // Check if this is a connection key (starts with [2])
    if (key.length > 0 && key[0] === 2) {
        const connId = new TextDecoder().decode(key.slice(1));
        return `${KEYS.CONN_PREFIX}${connId}`;
    }
    // Otherwise, treat as persist data key [1]
    return KEYS.PERSIST_DATA;
}

Problem: This assumes all non-connection keys are persist data keys. If any future key types are added (e.g., [3], [4]), they'll silently map to PERSIST_DATA, causing data corruption.

Fix: Add explicit validation:

if (key.length > 0 && key[0] === 2) {
    const connId = new TextDecoder().decode(key.slice(1));
    return `${KEYS.CONN_PREFIX}${connId}`;
} else if (key.length === 1 && key[0] === 1) {
    return KEYS.PERSIST_DATA;
} else {
    throw new Error(`Unknown key type: ${key[0]}`);
}

Missing TextDecoder Import (actor-driver.ts:148)

The code uses new TextDecoder() but I don't see an explicit import. While this works in browser/Cloudflare Workers environments, it's better to be explicit or use the existing encoder pattern.
Unused Variable (actor-driver.ts:148)
```
const encoder = new TextDecoder();
```
This variable is declared but never used in kvBatchPut. Should be removed.

Performance Concerns

Connection Restoration on Startup (instance.ts:402-432)

The current implementation loads all connections sequentially:
```
const connEntries = await this.#actorDriver.kvListPrefix(actorId, KEYS.CONN_PREFIX);
for (const [_key, value] of connEntries) {
    const conn = cbor.decode(value) as PersistedConn<CP, CS>;
    connections.push(conn);
}
```
For actors with many connections (100+), this could be slow. Consider:
- Parallel CBOR decoding with Promise.all
- Lazy loading connections on first access
- Connection pagination if extremely large numbers are expected
Write Amplification in File System Driver

Need to verify that the file system driver's kvBatchPut implementation uses atomic operations or proper locking to prevent partial writes if the process crashes mid-batch.

Code Quality Issues

Inconsistent Error Handling (actor-driver.ts:427-435)

In connection restoration, errors are logged but execution continues:

try {
    const conn = cbor.decode(value) as PersistedConn<CP, CS>;
    connections.push(conn);
} catch (error) {
    this.#rLog.error({
        msg: "failed to decode connection",
        error: stringifyError(error),
    });
}

Question: Should corrupted connections be silently skipped, or should this fail fast? Consider adding metrics/alerts for connection decode failures.

Magic Numbers (kv.ts:2-3)

The key prefixes use magic numbers:

PERSIST_DATA: Uint8Array.from([1]),
CONN_PREFIX: Uint8Array.from([2]),

Consider using an enum for clarity:

enum KeyPrefix {
    PERSIST_DATA = 1,
    CONN = 2,
}

Missing Type Guard (conn.ts:150-157)

The #setupPersistProxy method checks if persist can be proxied but doesn't have a proper type guard:
```
if (persist === null || typeof persist !== "object") {
    this.__persist = persist;
    return;
}
```
This could cause issues with primitives. Consider more robust checks.

Security Considerations

Connection ID Validation

Connection IDs from clients are used as KV keys. Ensure connId is properly validated to prevent:
- Path traversal attacks (if file system driver stores as files)
- Key collision attacks
- DOS via extremely long keys
Recommend adding validation in makeConnKey() to ensure connIds match expected format (UUID).
CBOR Decode Safety

Multiple places decode CBOR from storage without size limits:
```
const conn = cbor.decode(value) as PersistedConn<CP, CS>;
```
If storage is compromised, malicious CBOR could cause DOS. Consider adding size limits.

Testing Recommendations

Add tests for:
- Connection state persistence across actor restarts
- Concurrent connection modifications
- Large numbers of connections (100+)
- Connection decode failures and error recovery
- Key encoding/decoding edge cases in Cloudflare driver
Load testing: Test the new KV-based approach with realistic connection counts to ensure performance improvements are realized.
Migration testing: Verify that actors with existing embedded connections can migrate to the new format (or document that this is a breaking change requiring data migration).

Minor Issues

Logging Consistency: Some logs use connId, others use id. Standardize on one.
Documentation: The new KV methods in ActorDriver interface lack JSDoc comments explaining the key/value format and guarantees.
Type Safety: Several as type assertions could be replaced with proper runtime validation.

Questions

Is there a migration path for existing actors with embedded connection data?
What's the expected maximum number of connections per actor?
Should connection cleanup (for disconnected hibernatable connections) be part of this PR?
Are there any atomicity guarantees needed across multiple KV operations?

Verdict

Overall this is a well-architected improvement to RivetKit's persistence layer. The main concerns are:

The Cloudflare key encoding bug (critical - must fix)
Performance optimization opportunities for connection loading
Missing validation and error handling in key areas

Once the critical issues are addressed, this will be a solid enhancement.

🤖 Review generated with Claude Code

claude · 2025-11-13T01:34:08Z

Pull Request Review: Move Connections to Separate Persisted KV Keys

This PR introduces a significant architectural change by moving connection data from a single monolithic persisted data structure to separate KV keys. I've reviewed the changes and identified several areas for improvement along with some potential issues.

Overview

The PR changes the persistence model from storing all actor state (including connections) in a single PERSIST_DATA key to storing connections separately with individual keys prefixed by CONN_PREFIX (byte [2]). This is a solid architectural decision for scalability and performance.

Critical Issues

1. Cloudflare Driver Key Encoding Inconsistency

Location: rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts:216-238

The #uint8ArrayToKey and #keyToUint8Array methods have potential bugs:

#uint8ArrayToKey(key: Uint8Array): string {
    if (key.length > 0 && key[0] === 2) {
        const connId = new TextDecoder().decode(key.slice(1));
        return `${KEYS.CONN_PREFIX}${connId}`;
    }
    return KEYS.PERSIST_DATA;
}

Issues:

The method assumes that if key[0] !== 2, it must be persist data ([1]), but doesn't validate this assumption
Connection IDs are decoded as UTF-8 strings, which could fail if the connection ID contains invalid UTF-8 sequences
No error handling for decode failures

Recommendation:

#uint8ArrayToKey(key: Uint8Array): string {
    if (key.length === 0) {
        throw new Error("Empty key provided");
    }
    
    if (key[0] === 2) {
        // Connection key - extract connId
        try {
            const connId = new TextDecoder("utf-8", { fatal: true }).decode(key.slice(1));
            return `${KEYS.CONN_PREFIX}${connId}`;
        } catch (error) {
            throw new Error(`Failed to decode connection ID: ${error}`);
        }
    } else if (key[0] === 1) {
        // Persist data key
        return KEYS.PERSIST_DATA;
    } else {
        throw new Error(`Unknown key prefix: ${key[0]}`);
    }
}

2. Schema Version Not Added to Build Script

Location: rivetkit-typescript/packages/rivetkit/package.json:156

The build script was updated to include file-system-driver/v2.bare, but I don't see corresponding updates to ensure backward compatibility with v1. When deploying this change:

Do existing actors with v1 schemas need migration?
Is there a migration path documented?
Will old actors automatically upgrade to v3?

Recommendation: Add migration documentation or upgrade logic if needed.

3. Potential Race Condition in Connection State Saving

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts (various locations)

The connection change tracking uses #changedConnections set, but there's no clear synchronization between:

Marking a connection as changed
Saving connections in batches
Clearing the changed flag

If saveState is called concurrently (even though there's a queue), there could be edge cases where:

A connection is modified during serialization
The markSaved() call happens before the actual KV write completes

Recommendation: Review the synchronization model and ensure the SinglePromiseQueue properly serializes all state mutations.

Medium Priority Issues

4. Missing Error Handling in Connection Decoding

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:416-424

for (const [_key, value] of connEntries) {
    try {
        const conn = cbor.decode(value) as PersistedConn<CP, CS>;
        connections.push(conn);
    } catch (error) {
        this.#rLog.error({
            msg: "failed to decode connection",
            error: stringifyError(error),
        });
    }
}

Issue: When a connection fails to decode, it's silently skipped with only a log. This could lead to:

Lost connection state without user awareness
Orphaned KV entries
Inconsistent actor state

Recommendation:

Implement a recovery strategy (e.g., attempt to delete the corrupted key)
Optionally emit metrics/alerts for monitoring
Consider a "corrupted connections" counter to track data integrity issues

5. Unbounded KV List Operation

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:407-410

const connEntries = await this.#actorDriver.kvListPrefix(
    this.#actorId,
    KEYS.CONN_PREFIX,
);

Issue: The kvListPrefix operation has no pagination or limits. For actors with many connections, this could:

Exceed platform limits (Cloudflare DO has limits on list operations)
Cause memory issues loading all connections at once
Slow down actor initialization

Recommendation:

Add pagination support to kvListPrefix
Consider lazy-loading connections on-demand
Document maximum recommended connections per actor

6. Type Safety in Schema Conversion

Location: Throughout the schema conversion code

The bare schema type definitions have been renamed (e.g., PersistedHibernatableConn → HibernatableConn in v3.bare), but the TypeScript types still use the old naming conventions. While this works, it creates confusion.

Recommendation: Consider aligning TypeScript type names with the schema names for consistency.

Low Priority / Nice to Have

7. Logging Inconsistencies

Some logs use this.#rLog (runtime log) and others use this.actorContext.log (user log)
The distinction isn't always clear when reviewing the code
Consider adding comments explaining when to use which logger

8. Code Duplication in Key Encoding

The key encoding logic appears in both:

rivetkit-typescript/packages/rivetkit/src/actor/kv.ts (makeConnKey)
rivetkit-typescript/packages/cloudflare-workers/src/actor-driver.ts (conversion methods)

These should ideally share the same encoding/decoding logic to prevent divergence.

9. Missing Test Coverage

Given the significant architectural change, I'd expect to see:

Tests for the new KV operations
Tests for connection persistence/restoration
Tests for the Cloudflare key encoding/decoding
Migration tests from the old format to new format

Recommendation: Add comprehensive test coverage before merging.

10. Performance Consideration

Moving to separate keys is good for isolation, but may impact performance:

Previously: 1 KV read to get all state
Now: 1 KV read for actor data + N KV reads for connections (though batched)

For actors with many connections, consider:

Benchmarking the new approach vs old
Documenting recommended connection limits
Potentially implementing connection pooling or caching strategies

Positive Notes

✅ Good architectural decision - Separating connection state from actor state improves isolation and allows for better scalability

✅ Proper use of batched KV operations - Using kvBatchPut, kvBatchGet, and kvBatchDelete minimizes round trips

✅ Change tracking - The onChange proxy pattern for tracking connection modifications is elegant

✅ Backward compatibility attempt - The schema versioning shows awareness of upgrade paths

Recommendations Summary

Before merging:

Fix the critical key encoding issues in Cloudflare driver
Add error recovery for corrupted connection data
Add pagination/limits to list operations
Add comprehensive test coverage
Document migration path and any breaking changes
Consider performance benchmarks with many connections

Future improvements:
7. Align TypeScript types with schema names
8. Centralize key encoding logic
9. Add monitoring/metrics for connection state errors
10. Document logger usage patterns

Overall, this is a solid refactoring that improves the architecture, but needs some hardening before production use.

claude · 2025-11-13T02:57:09Z

Code Review: Move conns to separate persisted KV keys

Overview

This PR refactors connection persistence in RivetKit by moving connection data from a single monolithic persisted state object to individual KV entries. This is a significant architectural improvement that addresses scalability and performance concerns.

Positive Aspects

1. Architecture Improvement ✅

Granular persistence: Connections are now stored as separate KV entries instead of being serialized within the main actor state. This allows for:
- More efficient updates (only changed connections are written)
- Better scalability with large numbers of connections
- Reduced serialization overhead on state saves

2. Clean Abstraction ✅

The new KV batch operations (kvBatchPut, kvBatchGet, kvBatchDelete, kvListPrefix) provide a consistent interface across drivers:

Engine driver (src/drivers/engine/actor-driver.ts:370-418)
Cloudflare Workers driver (src/actor-driver.ts:149-244)
File system driver (src/drivers/file-system/global-state.ts)

3. Change Tracking ✅

The proxy-based change tracking in conn.ts:139-196 is well-designed:

Uses onChange library to detect modifications
Validates CBOR serializability on state changes
Marks connections as dirty only when necessary
Prevents unnecessary writes

Issues & Concerns

1. Schema Versioning & Migration ⚠️

Critical: The schema changes from v2 to v3 rename types without providing migration logic:

In schemas/actor-persist/v3.bare:

-type PersistedHibernatableConn struct {
+type HibernatableConn struct {

Problem: Existing actors with v2 persisted data will fail to deserialize when they wake up after this deployment.

Recommendation:

Add migration logic in src/schemas/actor-persist/versioned.ts to handle the rename
OR keep the old type names for backward compatibility
Document the migration strategy in the commit message

Location: schemas/actor-persist/v3.bare:7-35

2. Error Handling & Data Consistency ⚠️

In src/actor/instance.ts:396-434, when restoring connections:

for (const [_key, value] of connEntries) {
    try {
        const conn = cbor.decode(value) as PersistedConn<CP, CS>;
        connections.push(conn);
    } catch (error) {
        this.#rLog.error({
            msg: "failed to decode connection",
            error: stringifyError(error),
        });
    }
}

Issues:

Failed connection decodes are silently skipped - this could lead to data loss
No metrics/alerting for corrupted connection data
Partial restoration could leave connections in inconsistent state with hibernatableConns array

Recommendation:

Add a counter for failed decodes and expose as metric
Consider failing actor startup if critical connections fail to decode
Log connection IDs to help with debugging

Location: src/actor/instance.ts:419-434

3. Cloudflare Driver Key Encoding 🔴

In cloudflare-workers/src/actor-driver.ts:216-251, the key encoding logic has assumptions:

#uint8ArrayToKey(key: Uint8Array): string {
    // Check if this is a connection key (starts with [2])
    if (key.length > 0 && key[0] === 2) {
        // Connection key - extract connId
        const connId = new TextDecoder().decode(key.slice(1));
        return `${KEYS.CONN_PREFIX}${connId}`;
    }
    // Otherwise, treat as persist data key [1]
    return KEYS.PERSIST_DATA;
}

Problems:

Hardcoded magic numbers: Uses 2 for connection prefix, 1 for persist data. These should reference the constants from kv.ts
Assumption fallback: The else case assumes anything that's not a connection key is persist data - this is fragile and could hide bugs
No validation: If an invalid key format is encountered, it silently returns the wrong key

Recommendation:

#uint8ArrayToKey(key: Uint8Array): string {
    if (key.length === 0) {
        throw new Error("Empty key");
    }
    
    if (key[0] === KEYS.CONN_PREFIX[0]) {
        const connId = new TextDecoder().decode(key.slice(1));
        return `${KEYS.CONN_PREFIX}${connId}`;
    }
    
    if (key[0] === KEYS.PERSIST_DATA[0]) {
        return KEYS.PERSIST_DATA;
    }
    
    throw new Error(`Unknown key prefix: ${key[0]}`);
}

Location: cloudflare-workers/src/actor-driver.ts:216-251

4. Race Condition in Connection Save ⚠️

In src/actor/instance.ts:1041-1075, the connection save logic:

// Save changed connections
for (const connId of this.#changedConnections) {
    const conn = this.#connections.get(connId);
    if (!conn) {
        this.#rLog.warn({
            msg: "connection marked as changed but not found",
            connId,
        });
        continue;
    }
    
    // ... save logic
}

Issue: If a connection is removed between when it's marked as changed and when the save happens, we get a warning but the KV entry might not be cleaned up.

Recommendation: Track deletion separately or ensure cleanup happens in the disconnect flow.

Location: src/actor/instance.ts:1041-1075

5. Missing File System Schema v2 ⚠️

The PR adds schemas/file-system-driver/v2.bare but I don't see:

Migration logic from v1 to v2
Updates to the versioned module to handle both versions

Recommendation: Ensure the file system driver properly handles schema versioning.

Location: schemas/file-system-driver/v2.bare:1-23

6. Code Duplication in Key Conversion

The Cloudflare driver has custom logic for converting between Uint8Array and string-based keys, while the engine driver works with Uint8Array natively. This creates:

Maintenance burden (logic must be kept in sync)
Potential for bugs if encoding/decoding doesn't match

Recommendation: Consider extracting key conversion to shared utilities.

Minor Issues

7. Unused Import

In src/drivers/engine/kv.ts:14, there appear to be removed imports that might leave unused dependencies.

8. Magic Number Documentation

The KV key prefixes ([1] for persist data, [2] for connections) should be documented more clearly. Consider adding comments explaining the key structure.

In src/actor/kv.ts:1-14:

export const KEYS = {
    PERSIST_DATA: Uint8Array.from([1]), // Main actor state
    CONN_PREFIX: Uint8Array.from([2]),  // Connection data prefix
};

9. Test Coverage

I don't see test updates in this PR. The new KV operations and connection persistence logic should have:

Unit tests for key encoding/decoding
Integration tests for connection restoration
Tests for edge cases (corrupted data, missing connections, etc.)

Performance Considerations

Positive

✅ Batching operations (kvBatchPut, kvBatchGet) reduces round trips
✅ Only saving changed connections reduces write amplification
✅ Debounced saves prevent excessive writes

Concerns

⚠️ kvListPrefix could be expensive with many connections - consider pagination or limits
⚠️ No obvious cleanup mechanism for orphaned connection KV entries

Security

No significant security concerns identified. The changes maintain the existing trust boundary and don't introduce new attack vectors.

Recommendations Priority

HIGH: Add schema migration logic for v2 → v3 transition
HIGH: Fix hardcoded magic numbers in Cloudflare driver key encoding
MEDIUM: Improve error handling for corrupted connection data
MEDIUM: Add test coverage for new KV operations
LOW: Document key structure and cleanup orphaned entries

Summary

This is a well-architected refactoring that improves scalability and performance. The main concerns are around:

Schema migration/backward compatibility
Error handling for edge cases
Magic number usage in the Cloudflare driver

With the suggested fixes, this will be a solid improvement to the codebase.

Status: Approve with requested changes

🤖 Generated with Claude Code

graphite-app · 2025-11-13T20:47:10Z

Merge activity

Nov 13, 8:47 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
Nov 13, 8:48 PM UTC: CI is running for this pull request on a draft pull request (#3463) due to your merge queue CI optimization settings.
Nov 13, 8:50 PM UTC: Merged by the Graphite merge queue via draft PR: #3463.

NathanFlurry mentioned this pull request Nov 9, 2025

chore(rivetkit): remove deprecated conn tokens & sse support #3419

Closed

vercel bot deployed to Preview – rivetkit-serverless November 9, 2025 03:09 View deployment

NathanFlurry force-pushed the 11-08-chore_rivetkit_move_conns_to_separate_persisted_kv_keys branch from ae73e44 to eb0e6f2 Compare November 9, 2025 05:06

NathanFlurry mentioned this pull request Nov 9, 2025

chore(rivetkit): split ActorInstance logic in to multiple classes #3421

Closed

vercel bot deployed to Preview – rivetkit-serverless November 9, 2025 05:08 View deployment

NathanFlurry marked this pull request as ready for review November 9, 2025 05:44

chore(rivetkit): move conns to separate persisted kv keys

c57be74

NathanFlurry force-pushed the 11-08-chore_rivetkit_remove_deprecated_conn_tokens_sse_support branch from 06d93f5 to e109ee5 Compare November 13, 2025 02:47

NathanFlurry force-pushed the 11-08-chore_rivetkit_move_conns_to_separate_persisted_kv_keys branch from d818375 to c57be74 Compare November 13, 2025 02:47

vercel bot deployed to Preview – rivetkit-serverless November 13, 2025 02:48 View deployment

This was referenced Nov 13, 2025

chore(rivetkit-typescript): remove dependency on node modules #3460

Merged

chore(rivetkit): switch dynamic node imports to use require #3461

Merged

chore(rivetkit): fix type checks #3462

Merged

This was referenced Nov 13, 2025

feat(gas): implement smart pulling for horizontally scaled workers #3417

Merged

fix(guard): handle websocket tasks during shutdown #3448

Merged

fix: improve sigterm handling for the entire runtime #3443

Merged

NathanFlurry mentioned this pull request Nov 13, 2025

feat(gateway): websocket hibernation #3400

Merged

This was referenced Nov 13, 2025

fix(pb): dont immediately reallocate actors on runner stopping #3413

Merged

fix(guard): dont send open msg for hibernating ws, hibernating ws keepalive #3449

Merged

NathanFlurry mentioned this pull request Nov 13, 2025

chore(gas): rename wake sub to bump sub #3408

Merged

MasterPtato approved these changes Nov 13, 2025

View reviewed changes

graphite-app bot pushed a commit that referenced this pull request Nov 13, 2025

chore(rivetkit): move conns to separate persisted kv keys (#3420)

4b04faf

graphite-app bot closed this Nov 13, 2025

graphite-app bot deleted the 11-08-chore_rivetkit_move_conns_to_separate_persisted_kv_keys branch November 13, 2025 20:50

This was referenced Nov 13, 2025

chore(rivetkit): implement new hibernating ws protocol #3464

Open

feat(cloudflare-workers): add createInlineClient #3466

Open

NicholasKissel mentioned this pull request Nov 15, 2025

chre: v4 website changes #3472

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(rivetkit): move conns to separate persisted kv keys #3420

chore(rivetkit): move conns to separate persisted kv keys #3420

Uh oh!

NathanFlurry commented Nov 9, 2025

Uh oh!

vercel bot commented Nov 9, 2025 •

edited

Loading

Uh oh!

NathanFlurry commented Nov 9, 2025 •

edited by jog1t

Loading

Uh oh!

claude bot commented Nov 9, 2025

Uh oh!

pkg-pr-new bot commented Nov 9, 2025 •

edited

Loading

Uh oh!

claude bot commented Nov 9, 2025

Uh oh!

claude bot commented Nov 13, 2025

Uh oh!

claude bot commented Nov 13, 2025

Uh oh!

graphite-app bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chore(rivetkit): move conns to separate persisted kv keys #3420

chore(rivetkit): move conns to separate persisted kv keys #3420

Uh oh!

Conversation

NathanFlurry commented Nov 9, 2025

Uh oh!

vercel bot commented Nov 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented Nov 9, 2025 • edited by jog1t Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Nov 9, 2025

Pull Request Review: Move Connections to Separate Persisted KV Keys

Summary

Positive Highlights

Architecture & Design

Code Quality

Issues & Concerns

🔴 Critical: Race Condition in Connection Removal

🟡 Potential Issue: Missing Error Recovery

🟡 Schema Naming Inconsistency

🟡 Type Safety: Unused TextDecoder

🟡 Performance: Missing Index for Prefix Searches

Minor Issues

Documentation

Code Style

Testing

Performance Considerations

Security Assessment

Recommendations

Must Fix (Before Merge)

Should Fix

Nice to Have

Overall Assessment

Uh oh!

pkg-pr-new bot commented Nov 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Nov 9, 2025

Code Review - PR #3420: Move conns to separate persisted kv keys

Summary

Strengths

Issues & Concerns

Critical Issues

Performance Concerns

Code Quality Issues

Security Considerations

Testing Recommendations

Minor Issues

Questions

Verdict

Uh oh!

claude bot commented Nov 13, 2025

Pull Request Review: Move Connections to Separate Persisted KV Keys

Overview

Critical Issues

1. Cloudflare Driver Key Encoding Inconsistency

2. Schema Version Not Added to Build Script

3. Potential Race Condition in Connection State Saving

Medium Priority Issues

4. Missing Error Handling in Connection Decoding

5. Unbounded KV List Operation

6. Type Safety in Schema Conversion

Low Priority / Nice to Have

7. Logging Inconsistencies

8. Code Duplication in Key Encoding

9. Missing Test Coverage

10. Performance Consideration

Positive Notes

Recommendations Summary

Uh oh!

claude bot commented Nov 13, 2025

Code Review: Move conns to separate persisted KV keys

Overview

Positive Aspects

1. Architecture Improvement ✅

2. Clean Abstraction ✅

3. Change Tracking ✅

vercel bot commented Nov 9, 2025 •

edited

Loading

NathanFlurry commented Nov 9, 2025 •

edited by jog1t

Loading

pkg-pr-new bot commented Nov 9, 2025 •

edited

Loading

graphite-app bot commented Nov 13, 2025 •

edited

Loading