
Conversation

@brandonpayton
Member

Motivation for the change, related issues

In order to fix native file locking in Windows, we are moving native file locking into workers.

More details coming...

Implementation details

TBD

Testing Instructions (or ideally a Blueprint)

TBD

@brandonpayton
Member Author

We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows, falling back to the FileLockManagerForNode when the native locking API is not available.

I've been working on the implementations for both. The main things that require care are:

  • Treat zero-length ranges as extending to the end of the file
  • Release locks when a related file descriptor is closed
  • Release locks when the process exits (or in this case, when a PHP request is completed)
  • Implement fcntl() semantics
    • Release fcntl() locks when any file descriptor for the locked file is closed by the locking process.
    • Locked ranges can be unlocked or merged piece by piece. In contrast, Windows locking requires that an unlock range corresponds exactly to the locked range.
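As a minimal sketch of the first bullet (TypeScript, with hypothetical names, not the actual Playground code), a zero-length request normalizes to a range reaching the end of the file:

```typescript
// Hypothetical helper: ranges are [start, end) byte offsets.
interface ByteRange {
    start: number;
    end: number;
}

function normalizeLockRange(
    start: number,
    length: number,
    fileSize: number
): ByteRange {
    // Per the first bullet: a zero-length range extends to the end of the file.
    return length === 0
        ? { start, end: Math.max(fileSize, start) }
        : { start, end: start + length };
}
```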

For POSIX, we can keep fcntl() handling fairly simple by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released.
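A hedged sketch of that POSIX bookkeeping (class and method names are hypothetical, not the actual Playground code):

```typescript
// Hypothetical tracker: remember which paths each process has locked via
// fcntl(), then report which whole files to unlock on release.
class PosixFcntlLockTracker {
    private lockedPathsByPid = new Map<number, Set<string>>();

    recordLock(pid: number, path: string): void {
        let paths = this.lockedPathsByPid.get(pid);
        if (!paths) {
            paths = new Set();
            this.lockedPathsByPid.set(pid, paths);
        }
        paths.add(path);
    }

    // Returns the paths whose entire range should now be unlocked, e.g.
    // via fcntl(F_SETLK, F_UNLCK) with start 0 and length 0 (to EOF).
    releaseAll(pid: number): string[] {
        const paths = [...(this.lockedPathsByPid.get(pid) ?? [])];
        this.lockedPathsByPid.delete(pid);
        return paths;
    }
}
```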

For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of locked ranges per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then re-obtain locks on the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive locks, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (Based on a quick search about whether Windows allows overlapping exclusive locks by the same process.)
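The range math behind that unlock-then-re-lock flow can be sketched as a pure function (hypothetical names; ranges are [start, end) byte offsets):

```typescript
// Hypothetical helper: given a locked range and a requested unlock range,
// compute the sub-ranges that must be re-locked after unlocking the whole
// original range on Windows.
interface LockRange {
    start: number;
    end: number;
}

function remainingLockedRanges(
    locked: LockRange,
    unlock: LockRange
): LockRange[] {
    const remaining: LockRange[] = [];
    // Portion of the locked range before the unlock range, if any.
    if (unlock.start > locked.start) {
        remaining.push({
            start: locked.start,
            end: Math.min(unlock.start, locked.end),
        });
    }
    // Portion of the locked range after the unlock range, if any.
    if (unlock.end < locked.end) {
        remaining.push({
            start: Math.max(unlock.end, locked.start),
            end: locked.end,
        });
    }
    return remaining;
}
```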

The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different.

cc @adamziel

@brandonpayton force-pushed the playground-cli/move-native-locking-into-workers branch from 60a6f0b to 6023b7d on December 9, 2025 18:10
@brandonpayton
Member Author

I roughed out native FileLockManagers for POSIX and Windows, but they are as yet untested. Tomorrow, I plan to start by adapting the native locking tests and testing these new classes.

@adamziel
Collaborator

The prerequisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but it can be tuned down by adding maxPhpInstances: 1 to every bootWordPress* call in the worker-thread-v*.ts files – which we could do in this PR.

In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests.

@brandonpayton
Member Author

@adamziel, all the FileLockManager tests are passing for the Windows version. I explicitly disabled the tests that assert partial lock/unlock and overlapping locked ranges. Those things aren't supported on Windows, and I'm not sure we'll be able to support them safely there. Regardless, I don't think we'll need them to support SQLite's locking scheme.

I think we might be able to test Playground CLI with the Windows lock integration before moving to separate worker processes, so I'm planning to try that tonight. It may give us a little more info.

Either way, these Windows tests make me think it's worth trying the multi-process approach here.

@brandonpayton
Member Author

brandonpayton commented Dec 17, 2025

@adamziel I was able to get Playground CLI booting with native file locking in Windows. With a very simple test it appears to be working well.

Tomorrow, I plan to prototype moving Playground CLI from multi-worker to multi-process. This is needed if we want to rely exclusively on native OS file locking because locks are per-fd and per-process. Otherwise the native OS will see locks held by php-wasm workers as locks held by the same process.

@adamziel
Collaborator

Amazing @brandonpayton!

@adamziel
Collaborator

Btw, I thought Playground CLI was already multi-process and we just needed to cap the number of workers to 1?

The purpose of this is to eliminate worker args that
are not serializable as JSON so we can move from workers
to separate processes which communicate via JSON-serialized
messages rather than via transferred or cloned objects.
@brandonpayton
Member Author

brandonpayton commented Dec 18, 2025

Btw, I thought Playground CLI was already multi-process and we just needed to cap the number of workers to 1?

@adamziel we've just been multi-worker-thread up to now. Last week, I realized that I'd thought we were multi-process already but was incorrect (mentioned here). But I think we are close to moving to multiple processes. I see how it can work using the Node.js cluster package and am working out the details (I think we're very close). The cool thing is that we won't need to pass the response from the worker process back to the main process... the worker will get the connection's socket and can respond directly.

@brandonpayton
Member Author

brandonpayton commented Dec 18, 2025

I have what I think should be a working Playground CLI process cluster, but I still need to wire up spawning another PHP worker from a child process (for proc_open(), I think). For that case, we probably just want to child_process.fork() a process rather than cluster.fork() one within the cluster. We'll see.

@adamziel
Collaborator

adamziel commented Dec 18, 2025

You know all that, but I saw fork() and just wanted to say it out loud: fork() in Node.js !== fork() in unix. Also, it will give you a new Node.js process rather than a worker.

@brandonpayton
Member Author

You know all that, but I saw fork() and just wanted to say it out loud: fork() in Node.js !== fork() in unix. Also, it will give you a new Node.js process rather than a worker.

Thanks for stating this explicitly!

😄 I just discovered cluster.fork() is more like POSIX fork() than child_process.fork(). By default, cluster.fork() creates a child that runs the main-thread/primary script, unless you explicitly override the child script path with cluster.setupPrimary({ exec: 'path-to-child-script' }) before calling fork().

In that case, it's like you're saying "actually this other script is the 'primary' you should fork() from". But it's not actually the primary script. It's just the script you want to run in a cluster. 🙃
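For illustration, a minimal sketch of that override (the exec path is a hypothetical placeholder):

```typescript
import cluster from 'node:cluster';

if (cluster.isPrimary) {
    // Without this, cluster.fork() re-runs the primary script in the child.
    // The exec path below is a hypothetical placeholder.
    cluster.setupPrimary({ exec: 'path/to/worker-script.js' });
    // cluster.fork() would now spawn worker-script.js as the "worker".
}
```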

@brandonpayton
Member Author

brandonpayton commented Dec 18, 2025

@adamziel @mho22 I pushed some WIP changes for running Playground CLI php-wasm workers as separate processes. Some notes:

  • It uses the Node.js cluster package to create worker processes.
    • Each worker initiates listening on the Playground CLI web server port.
    • cluster allows multiple cluster workers to listen on the same port.
    • On non-Windows, Node.js attempts to load balance with a round-robin approach.
    • On Windows, the first process that accepts a request services that request. According to Node.js cluster docs, this may change once libuv is able to share IOCP (I/O Completion Ports) handles efficiently on Windows.
    • express is no longer used.
    • The main Playground CLI process no longer services HTTP requests. Requests are completely handled by worker processes.
  • It currently works for Blueprints v1 only, though supporting v2 should be straightforward. In the meantime, Blueprints v2 is broken within this PR.
  • When spawning a worker for things like proc_open(), we use child_process.fork() because those processes are not part of the web server process cluster. They run PHP but do not handle incoming HTTP requests.
  • There is an IPC error printed to the console after we kill the initial worker used for WordPress setup. We just need to release the comlink handle more gracefully.
  • You still need to pass --experimental-multi-worker to get multiple workers.
  • My subjective user experience: For an instant, it feels like the server is slow to start responding, and then everything is lightning fast.
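The shared-port pattern from the notes above can be sketched roughly like this (port and worker count are hypothetical; this is a self-contained illustration, not the Playground code):

```typescript
import cluster from 'node:cluster';
import http from 'node:http';

const PORT = 8123; // hypothetical port

if (cluster.isPrimary) {
    // Fork one worker here for brevity; Playground CLI would fork several.
    const worker = cluster.fork();
    cluster.on('listening', async () => {
        const res = await fetch(`http://127.0.0.1:${PORT}/`);
        console.log(await res.text());
        worker.kill();
    });
} else {
    // Each worker listens on the SAME port; the cluster module hands
    // incoming connections to workers (round-robin on non-Windows).
    http.createServer((_req, res) => {
        res.end(`handled by worker ${process.pid}`);
    }).listen(PORT);
}
```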

This is all that is coming to me now, but I'll likely follow up and share more soon. :)

In the meantime, I'm working on:

  • Trying this in Windows
  • Tons of cleanup and reflection
  • Running more of the Playground CLI automated tests
  • Fixing type and lint errors

@brandonpayton
Member Author

Actually, the push failed. Fixing...

@brandonpayton
Member Author

The multi-process Playground CLI changes are now pushed.

@brandonpayton
Member Author

It's kinda wild, but multi-process Playground CLI worked with no modifications on Windows. 🎉 That's great (and unusual)!

I instrumented FileLockManagerForWindows to log when locking file byte ranges, and it is being used to lock the SQLite DB.

@brandonpayton
Member Author

Currently, IPC is JSON-based, but I just learned there is an "advanced" serialization option based on the HTML structured clone algorithm. It may not matter either way, but being able to pass instances of built-in objects is a blessing.
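A hedged sketch of the difference (hypothetical round-trip; with the default JSON serialization, the Map below would arrive flattened to a plain empty object):

```typescript
import { fork } from 'node:child_process';

if (process.send) {
    // Child: with "advanced" (structured clone) serialization, built-ins
    // like Map arrive intact instead of being flattened by JSON.
    process.on('message', (msg: Map<string, number>) => {
        process.send!({ isMap: msg instanceof Map, size: msg.size });
        process.exit(0);
    });
} else {
    // Fork this same script as the child, using advanced serialization.
    const child = fork(process.argv[1], { serialization: 'advanced' });
    child.on('message', (reply) => console.log(reply)); // → { isMap: true, size: 2 }
    child.send(new Map([['a', 1], ['b', 2]]));
}
```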

@brandonpayton
Member Author

Also, IPC via comlink isn't used much after startup. After that, the workers are just servicing HTTP requests directly.

@brandonpayton
Member Author

The Playground CLI file locking tests are passing. There are still other Playground CLI test failures, and I'm looking at those.

@brandonpayton
Member Author

The main Playground run-cli tests for Blueprints v1 are passing except for:

   ❯ other run-cli behaviors (2) 2842ms
     ❯ auto-login (1) 1074ms
       × should clear old auto-login cookie 1074ms
     ❯ error handling (1) 1768ms
       × should return 500 when the request handler throws an error 1768ms

The biggest cause of failures was that runCLI() was being used with a single worker process, which now has only a single PHP instance. Some tests needed more than a single instance. Switching to explicit multi-worker-process mode fixed them.

Ultimately, to land this PR, we'll need to always run multiple php-wasm worker processes.

I'm pleased with how well things appear to be going. Tomorrow, I'll continue cleaning this up and fixing tests.

@brandonpayton
Member Author

One note:
The auto-login cookie cleanup test is failing because that cookie cleanup isn't happening at the moment. That needs to be fixed.

It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

@adamziel
Collaborator

adamziel commented Dec 19, 2025

Ultimately, to land this PR, we'll need to always run multiple php-wasm worker processes.

Sounds good and makes sense. Should we have a lower bound on the number of worker processes?

It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

@brandonpayton
Member Author

Should we have a lower bound on the number of worker processes?

Yes! In single-worker mode, we have a default maximum of 5 php-wasm instances at a time.

Let's start our lower bound at 5 instances and see how it goes.

@brandonpayton
Member Author

Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

The previous logic for clearing the autologin-has-happened cookie was in start-server.ts, but now there is no single place where it runs:

Every php-wasm worker process in the cluster calls startServer() to listen on the same port. That's how the cluster works: each member listens on the same port, and the cluster module coordinates which member gets the next request.

Maybe we can do something with a lock file. Will see :)
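One possible shape of the lock-file idea, sketched under the assumption that an exclusive-create ('wx', i.e. O_CREAT | O_EXCL) is atomic across the worker processes (marker path and function name are hypothetical):

```typescript
import { writeFileSync } from 'node:fs';

// Hypothetical helper: the first worker to create the marker file wins
// and knows the current request is the first of this CLI session.
function isFirstRequest(markerPath: string): boolean {
    try {
        // 'wx' fails with EEXIST if the file already exists.
        writeFileSync(markerPath, String(process.pid), { flag: 'wx' });
        return true;
    } catch (err: any) {
        if (err.code === 'EEXIST') return false;
        throw err;
    }
}
```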
