Chriztiaan (Contributor) commented Nov 26, 2025

Summary

This PR fixes potential deadlocks in the shared sync implementation when using OPFS (Origin Private File System) with multiple browser tabs. The solution introduces a distributed database adapter that automatically handles connection reconfiguration when tabs close, eliminating the need for explicit reconnection logic in many scenarios.

The Problem

When using OPFS with multiple tabs, sync events are processed in a shared web worker. With OPFS, each tab holds its own SQLite connection, and every tab connects to the shared sync worker; the sync worker then borrows a connection from one of the tabs whenever it needs database access.

Historically, we requested a connection from the last connected tab. When that tab closed, it would inform the shared sync worker (or a navigator lock would detect the tab closed), and we would then disconnect and reconnect using a DB connection from another tab. This approach worked in normal scenarios, but proved to be very racy when tabs were opening and closing at extremely fast rates. The race conditions could lead to deadlocks, especially in Safari with OPFS, where the timing of tab lifecycle events could cause the sync worker to get stuck waiting for a connection that would never respond.

The Solution

This PR removes the direct need for reconnection in many circumstances when a tab closes. Instead of the sync implementation managing connections directly, we now use a distributed DB adapter (LockedAsyncDatabaseAdapter) in the shared sync worker. This distributed adapter serves as the DBAdapter API surface for the sync implementation and handles reconfiguring the internal connection from another tab if the current tab closes.

DB reconnection is now transparent to the sync implementation. If a reconnect happens during an idle period of sync, the sync implementation will be completely unaware and can proceed normally. If an operation is attempted while reconnecting, the sync manager will detect the error and retry after a delay, just as it does for all errors. This eliminates many of the race conditions that occurred during rapid tab open/close scenarios and provides better error handling with improved error serialization for connection closed scenarios.
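
To make the retry behaviour concrete, here is a minimal sketch (not the actual sync implementation code) of the pattern described above, assuming a hypothetical runSyncIteration function:

```ts
// Hypothetical illustration of the retry behaviour; `runSyncIteration` and
// `retryDelayMs` are placeholders, not real SDK APIs.
async function syncLoop(runSyncIteration: () => Promise<void>, retryDelayMs = 5000) {
  while (true) {
    try {
      await runSyncIteration();
    } catch (error) {
      // A connection-closed error surfaces here if an operation raced a closing tab.
      // By the next attempt the distributed adapter has typically re-opened a
      // connection from another tab, so a plain delayed retry is sufficient.
      console.warn('Sync iteration failed, retrying after delay', error);
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
}
```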

It's important to note that this PR does not yet address suspected frozen tab issues in Safari. While random connection selection helps avoid relying on potentially frozen tabs, additional work may be needed to fully handle Safari's tab freezing behavior.

Technical Implementation

The Distributed Database Adapter

At the heart of the solution is the LockedAsyncDatabaseAdapter, which was already in use and already supported pluggable logic for opening connections. We've now extended this adapter to automatically reopen the database connection on specific errors. The adapter detects ConnectionClosedError and NoModificationAllowedError during operations and re-opens the database connection from another available tab when needed. This provides a stable API surface that hides connection lifecycle details from the sync implementation, allowing it to operate as if it always has a reliable connection.
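
To illustrate the idea, here is a heavily simplified sketch of an adapter that drops and re-opens its connection on these errors. The helper names (openConnectionFromAnyTab, isConnectionClosedError) and the AsyncConnection shape are assumptions for the example, not the real adapter internals:

```ts
// Minimal sketch of the "re-open on connection closed" idea.
type AsyncConnection = {
  execute(sql: string, params?: any[]): Promise<any>;
  close(): Promise<void>;
};

function isConnectionClosedError(error: unknown): boolean {
  // The PR specifically watches for ConnectionClosedError and NoModificationAllowedError.
  const name = (error as any)?.name ?? '';
  return name === 'ConnectionClosedError' || name === 'NoModificationAllowedError';
}

class ReconnectingAdapter {
  private connection: AsyncConnection | null = null;

  constructor(private openConnectionFromAnyTab: () => Promise<AsyncConnection>) {}

  async execute(sql: string, params?: any[]): Promise<any> {
    try {
      const connection = await this.ensureConnection();
      return await connection.execute(sql, params);
    } catch (error) {
      if (isConnectionClosedError(error)) {
        // The backing tab went away: drop the stale connection so the next
        // operation (or a retry by the caller) opens one from another tab.
        this.connection = null;
      }
      throw error;
    }
  }

  private async ensureConnection(): Promise<AsyncConnection> {
    if (!this.connection) {
      this.connection = await this.openConnectionFromAnyTab();
    }
    return this.connection;
  }
}
```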

It's important to note that the distributed database adapter reconnect logic (reOpenOnConnectionClosed: true) is only used by the shared sync implementation. Single-tab implementations and other database adapters do not use this automatic reconnection feature, as they don't face the same multi-tab connection challenges.

Shared Sync Implementation

The shared sync implementation has been refactored to use LockedAsyncDatabaseAdapter instead of directly managing connections. The adapter is initialized with reOpenOnConnectionClosed: true to enable automatic reconnection, and connection selection is now randomized using getRandomWrappedPort() instead of always using the last port. This randomization matters because some browsers, notably Safari, can freeze tabs; random selection avoids repeatedly depending on a single, potentially frozen tab and improves reliability across different tab scenarios.
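
As an illustration, the random selection boils down to something like the following; the ports array is a stand-in for the worker's list of connected tabs, and everything apart from the reOpenOnConnectionClosed option name and getRandomWrappedPort is an assumption:

```ts
// Sketch: pick a random tab's port rather than always the most recently connected one.
function getRandomWrappedPort<T>(ports: T[]): T {
  if (ports.length === 0) {
    throw new Error('No connected tabs available to provide a database connection');
  }
  return ports[Math.floor(Math.random() * ports.length)];
}

// The adapter is then configured to re-open automatically when its connection closes,
// e.g. (hypothetical factory, not the real constructor signature):
// const adapter = createAdapter({
//   reOpenOnConnectionClosed: true,
//   openConnection: () => openDatabaseOnPort(getRandomWrappedPort(ports))
// });
```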

We've also tightened the setup flow for the shared sync worker. The sync manager now only uses tabs once the navigator lock-based close detection has been configured. This ensures that tabs which are opened and closed very quickly (closed before the lock could have been configured) will never be used by the sync implementation, since we can't reliably detect if they died. The client is only added to the SharedSyncImplementation after the navigator lock has been successfully registered with the shared worker, creating a safety gate that prevents dead tabs from entering the system.
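
Conceptually, the tab-side registration now follows a "lock first, register second" ordering, sketched below; sharedWorkerRpc and registerClient are assumed names, and only the ordering reflects the PR:

```ts
// Sketch of the registration gate. The tab-close lock is requested before the
// client is registered, so the shared worker never sees a tab it cannot watch.
async function registerTabWithSharedWorker(sharedWorkerRpc: { registerClient(id: string): Promise<void> }) {
  const tabId = crypto.randomUUID();

  // 1. Hold a navigator lock for the lifetime of this tab. The shared worker
  //    requests the same lock; it is only granted once this tab closes, which
  //    acts as the close signal.
  await new Promise<void>((resolve) => {
    navigator.locks.request(`tab-close-signal-${tabId}`, async () => {
      resolve();
      // Keep the lock held until the tab is closed.
      await new Promise<never>(() => {});
    });
  });

  // 2. Only now does this tab become usable by the shared sync implementation.
  await sharedWorkerRpc.registerClient(tabId);
}
```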

Connection Management Improvements

The WorkerWrappedAsyncDatabaseConnection now properly handles unexpected remote closures. markRemoteClosed() already existed to immediately abort pending operations when a tab closes; we now also detect when the tab closes while we are still configuring or opening the database. This prevents operations from hanging indefinitely during the critical database initialization phase. We've also added a 10-second timeout for database connection operations and improved abort controller management for pending operations, so resources are properly cleaned up.
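
The timeout portion of this amounts to racing the remote call against a timer, roughly as follows; the real connection wrapper also aborts pending calls when markRemoteClosed() fires, which this sketch omits:

```ts
// Sketch of a 10-second guard around a remote open/configure call.
async function withTimeout<T>(operation: Promise<T>, timeoutMs = 10_000, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([operation, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Usage (openRemoteDatabase is a placeholder for the proxied worker call):
// await withTimeout(openRemoteDatabase(), 10_000, 'open database');
```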

Additionally, DB open operations now run under the same navigator lock used by the read and write operations. This prevents write conflicts when opening a database while another tab might be writing, ensuring that database initialization and operations are properly serialized across tabs.
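
In other words, opening the database and writing to it contend for the same exclusive lock, along these lines; the lock name and the openDatabase/write callbacks are illustrative:

```ts
// Sketch: open and write both run under one exclusive navigator lock, so an open
// can never race another tab's write.
const DB_LOCK_NAME = 'powersync-db-example.db'; // assumed name for illustration

async function openDatabaseSerialized(openDatabase: () => Promise<void>) {
  await navigator.locks.request(DB_LOCK_NAME, { mode: 'exclusive' }, async () => {
    await openDatabase();
  });
}

async function runWrite(write: () => Promise<void>) {
  await navigator.locks.request(DB_LOCK_NAME, { mode: 'exclusive' }, async () => {
    await write();
  });
}
```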

We've also created a temporary fork of the OPFS Coop VFS. It appears that in some circumstances the initialization logic could grab hold of a navigator lock and not release it, which could contribute to deadlock scenarios. The fork addresses this potential lock retention issue.
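
The shape of that fix is simply to release the held lock when initialization fails, roughly as below; acquireFileLock and createAccessHandles are illustrative placeholders rather than the VFS's real functions:

```ts
// Sketch: if creating the sync access handles fails, release the lock that was
// acquired for this file so another attempt (or another tab) can retry.
async function initializeFile(
  file: { path: string },
  acquireFileLock: () => Promise<() => void>,
  createAccessHandles: () => Promise<void>
) {
  const releaseLock = await acquireFileLock();
  try {
    await createAccessHandles();
  } catch (e) {
    // Without this release, the lock stays held and other tabs can never open the file.
    releaseLock();
    throw e;
  }
}
```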

Error Handling Enhancements

Error handling has been significantly improved throughout the system. Previously, some errors such as DOM exceptions were not serializable and were not reported from the shared sync worker to main tabs. We now handle these cases better when doing JSON serialization of errors in sync statuses, ensuring all errors can be properly transmitted across worker boundaries. Upload and download errors are now serialized for SyncStatus events, and we've implemented better error propagation and retry logic. Connection closed errors are now properly caught and trigger reconnection automatically.
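
Because a raw DOMException may fail to cross a MessagePort, the idea is to convert errors into a plain, structured-clone-friendly shape before they are attached to a SyncStatus. A minimal sketch (field names are assumptions, not the actual SyncStatus schema):

```ts
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
}

function serializeError(error: unknown): SerializedError | undefined {
  if (error == null) {
    return undefined;
  }
  if (error instanceof Error) {
    // In current browsers DOMException instances also pass this instanceof check.
    return { name: error.name, message: error.message, stack: error.stack };
  }
  // Non-Error throwables (strings, plain objects) still get a readable message.
  return { name: 'Error', message: String(error) };
}
```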

Key Files Modified

The main changes are spread across several key files:

  • packages/web/src/worker/sync/SharedSyncImplementation.ts - Main sync implementation using distributed adapter
  • packages/web/src/db/adapters/LockedAsyncDatabaseAdapter.ts - Distributed adapter with auto-reconnection
  • packages/web/src/db/adapters/WorkerWrappedAsyncDatabaseConnection.ts - Improved remote closure handling
  • packages/web/src/worker/db/opfs.ts - Temporary fork of OPFS Coop VFS to fix potential navigator lock retention issues
  • packages/common/src/db/crud/SyncStatus.ts - Error serialization improvements
  • packages/common/src/client/sync/stream/AbstractStreamingSyncImplementation.ts - Retry logic improvements

Testing

This PR includes comprehensive test coverage to ensure the solution works correctly. We've added a multiple tabs test (multiple_tabs_iframe.test.ts) that tests multi-tab scenarios with rapid open/close operations. We've also created new mock sync service tests with testing utilities for isolated sync service testing, and added error serialization tests to ensure errors are properly serialized across worker boundaries. The existing multi-instance tests have been updated to use the new mock sync service.

changeset-bot bot commented Nov 26, 2025

🦋 Changeset detected

Latest commit: 445ec69

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 8 packages

Name                                Type
@powersync/web                      Patch
@powersync/common                   Minor
@powersync/adapter-sql-js           Patch
@powersync/diagnostics-app          Patch
@powersync/node                     Patch
@powersync/op-sqlite                Patch
@powersync/react-native             Patch
@powersync/tanstack-react-query     Patch


  async () => fn(this.generateDBHelpers({ execute: this._execute, executeRaw: this._executeRaw })),
  {
-   timeoutMs: options?.timeoutMs
+   timeoutMs: options?.timeoutMs ?? this.options.defaultLockTimeoutMs
Collaborator:

The timeouts here are meant to help expose potentially held locks. E.g. in the past we've seen that some DBAdapters created in the shared sync worker, which were not completely closed, could hold on to a lock indefinitely.
Without a timeout it just seems like the whole process is stuck. With a timeout, we'll at least get some indication of this.
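
A minimal illustration of the idea, using an AbortSignal-based timeout so a permanently held lock surfaces as an error instead of a silent hang (names are illustrative):

```ts
// Sketch: acquire a Web Lock, but fail loudly if it cannot be obtained in time,
// which usually points at a connection elsewhere that never released it.
async function requestLockWithTimeout<T>(
  name: string,
  timeoutMs: number,
  callback: () => Promise<T>
): Promise<T> {
  return navigator.locks.request(name, { signal: AbortSignal.timeout(timeoutMs) }, callback) as Promise<T>;
}
```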

// will also attempt to acquire it. Since the lock is returned when the tab is closed, this allows the shared worker
// to free resources associated with this tab.
// We take hold of this lock as soon as possible in order to cater for potentially closed tabs.
getNavigatorLocks().request(`tab-close-signal-${crypto.randomUUID()}`, async (lock) => {
Collaborator:

The initialisation flow here ensures the shared sync worker only uses clients that have navigator-lock based close detection established. We don't want to use clients which might have been registered, but closed before the lock was configured.

};
}

protected serializeError(error?: Error) {
Collaborator:

Not all errors are serialisable over a MessagePort. E.g. some DOMExceptions fail to be passed across workers. This now explicitly serialises errors in the SyncStatus.


return super.close({
  // Don't disconnect by default if multiple tabs are enabled
  disconnect: options.disconnect ?? !this.resolvedFlags.enableMultiTabs
Collaborator:

The default param above actually causes the opposite of this to occur. A close would perform a disconnect on web if no argument was supplied. Removing the default fixes this.

} catch (e) {
  this.log?.(`failed to create access handles for ${file.path}`, e);
  // Release the lock; if we failed here, we'd need to obtain it later in order to retry
  file.persistentFile.handleLockReleaser();
Collaborator:

This is the only change in this file. When frequently refreshing a page, we noticed:

[PowerSyncDatabase [example.db]] Error: Failed to execute 'createSyncAccessHandle' on 'FileSystemFileHandle': Access Handles cannot be created if there is another open Access Handle or Writable stream associated with the same file.

It seems like if this error happened during page load (not entirely sure why this error did happen at this stage), the lock would never be released - which would prevent other tabs from using the DB.

},
"pnpm": {
  "overrides": {
    "@journeyapps/wa-sqlite": "0.0.0-dev-20251201120934"
Collaborator:

TODO remove this
