@jspanchu
Contributor

  • Closes #24609 (using emdawnwebgpu port shows file lock error).
  • The WindowsFileLock class already has a suppression in place around os.remove. This commit adds a similar suppression for UnixFileLock.
  • These errors are typically raised when another instance of emcc has already released the lock file. Running many emcc instances in parallel is common in the build systems of large projects, e.g. ninja -j32.

@jspanchu
Contributor Author

@kripken @sbc100 please review

@jspanchu force-pushed the workaround-race-filelock-unix branch from d313296 to 7ef387b on November 11, 2025 at 11:36
with suppress(FileNotFoundError):
    os.unlink(self._lock_file)
with suppress(OSError):
    fcntl.flock(fd, fcntl.LOCK_UN)
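
For context, here is a minimal, self-contained sketch of how a release path with these two suppressions might behave. `DemoUnixLock` and its methods are hypothetical names used only for illustration; this is not Emscripten's actual `UnixFileLock`, and it assumes a Unix system where `fcntl` is available.

```python
import fcntl
import os
from contextlib import suppress


class DemoUnixLock:
    """Hypothetical flock-based lock, for illustration only."""

    def __init__(self, lock_file):
        self._lock_file = lock_file
        self._fd = None

    def acquire(self):
        # Create the lock file if needed and take an exclusive lock on it.
        fd = os.open(self._lock_file, os.O_RDWR | os.O_CREAT)
        fcntl.flock(fd, fcntl.LOCK_EX)
        self._fd = fd

    def release(self):
        fd, self._fd = self._fd, None
        # The path may already be gone if something else removed it.
        with suppress(FileNotFoundError):
            os.unlink(self._lock_file)
        # Unlocking is not expected to fail; suppressed as in the diff.
        with suppress(OSError):
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

With this shape, calling `release()` after the lock file has been removed by someone else completes quietly instead of raising `FileNotFoundError`.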
Collaborator

Are both of these needed to fix your issue? Or is just one or the other enough?

It seems like neither should be necessary, since if we are holding the lock we should always be able to unlock it, right? It should be impossible for another process to acquire or delete the lock file until LOCK_UN, no?

Contributor Author

The os.unlink call is what I observed to cause trouble (also seen in the issue linked in the PR description). The other one should not be necessary.

> It seems like neither should be necessary, since if we are holding the lock we should always be able to unlock it, right? It should be impossible for another process to acquire or delete the lock file until LOCK_UN, no?

I would think so, but in practice, os.unlink gets executed after someone else deletes it. Could it be an issue with the port, and not filelock.py?

Collaborator

> I would think so, but in practice, os.unlink gets executed after someone else deletes it.

I'm struggling to see how anyone else could be deleting this file. But clearly it is happening; I just want to know how and why.
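
One interleaving that can produce this with flock-based locks in general (a hypothesis, not something confirmed in this thread): `flock` locks the open file, not the path. If a second process opens the lock file before the first holder unlinks it, the second process can later acquire the lock on a file that no longer has a directory entry, and its own `os.unlink` then fails. The sketch below replays that interleaving inside a single process using two independent file descriptors. The function name and the unlink-then-unlock release order are assumptions taken from the diff above, and it requires a Unix system.

```python
import fcntl
import os


def simulate_stale_unlink(lock_path):
    """Replay two lock holders in one process.

    Returns (b_was_blocked, path_missing, b_unlink_failed).
    """
    flags = os.O_RDWR | os.O_CREAT

    # Holder A opens the lock file and takes the exclusive lock.
    fd_a = os.open(lock_path, flags)
    fcntl.flock(fd_a, fcntl.LOCK_EX | fcntl.LOCK_NB)

    # Holder B opens the same path (same inode) while A still holds it.
    fd_b = os.open(lock_path, flags)
    try:
        fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
        b_was_blocked = False
    except OSError:
        b_was_blocked = True

    # A releases: remove the path first, then unlock and close.
    os.unlink(lock_path)
    fcntl.flock(fd_a, fcntl.LOCK_UN)
    os.close(fd_a)

    # B now gets the lock, but on a file with no directory entry.
    fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
    path_missing = not os.path.exists(lock_path)

    # B releases the same way; its unlink finds nothing to remove.
    try:
        os.unlink(lock_path)
        b_unlink_failed = False
    except FileNotFoundError:
        b_unlink_failed = True
    fcntl.flock(fd_b, fcntl.LOCK_UN)
    os.close(fd_b)

    return b_was_blocked, path_missing, b_unlink_failed
```

If a third process recreated the path between A's unlink and B's release, B's unlink would instead remove that new file, which is a related hazard of unlinking lock files on release.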

@sbc100
Collaborator

sbc100 commented Nov 11, 2025

We build all of our libraries with a lot of parallelism, but I don't think I've run into this issue locally or in our CI.

For example, I have 128 cores and often rebuild the library cache from scratch, and I've not run into this. I will try to build a reproducer.

@sbc100
Collaborator

sbc100 commented Nov 11, 2025

I wrote a test to try to repro this, but so far it's not reproducing: #25772
