ParallelQueue.Drain() deadlock in EpicGames.Build/System/Parallel2.cs

## Description

We found an intermittent deadlock in `ParallelQueue.Drain()` (`Engine/Source/Programs/Shared/EpicGames.Build/System/Parallel2.cs`) that causes UnrealBuildTool to hang indefinitely on macOS. The deadlock occurs in `Rules.PrefetchRulesFilesInternal()` while it discovers `.Build.cs` and `.Target.cs` rule files.

The bug is a race condition in the coordination between worker threads and `Enqueue()` calls made from action callbacks. When an `action()` callback (e.g., `FindAllRulesFilesRecursively`) calls `Enqueue()` to add subdirectories, the following interleaving is possible:

1. Worker A finishes its action, calls `Interlocked.Decrement(ref _outstanding)` which returns 0

2. Worker B (on another thread, inside its own `action()`) calls `Enqueue()`, which calls `Interlocked.Increment(ref _outstanding)` back to 1

3. Worker A sets `_accepting = 0` (line 305)

4. Worker A reads `_outstanding` — sees 1, does NOT set `_done`, does NOT release workers

5. Worker B’s enqueued item is processed, `_outstanding` goes back to 0

6. But now `_accepting = 0`, and the done-signal path has already been skipped

7. All workers block on `_available.Wait()` forever — deadlock

The core issue is that the `Decrement` returning 0, the `_accepting = 0` assignment, and the `_outstanding == 0` re-check (lines 300-310) are not atomic with respect to concurrent `Enqueue()` calls from action callbacks.
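
To make the window concrete, the seven steps can be replayed deterministically on a single thread. This is not the code from `Parallel2.cs`: `outstanding`, `accepting`, and `done` are hypothetical stand-ins for the fields named above, and the sketch assumes that only the thread that flips `accepting` from 1 to 0 gets to run the done-check.

```csharp
using System;
using System.Threading;

// Hypothetical stand-ins for _outstanding, _accepting, and _done.
int outstanding = 1; // one item in flight, owned by Worker A
int accepting = 1;
bool done = false;

// Assumption: the done-check runs only for the thread that closes the queue.
void CloseAndCheck()
{
    bool closedByMe = Interlocked.Exchange(ref accepting, 0) == 1;
    if (closedByMe && Volatile.Read(ref outstanding) == 0)
        done = true;
}

// Step 1: Worker A finishes its action; the decrement returns 0.
bool workerASawZero = Interlocked.Decrement(ref outstanding) == 0;

// Step 2: Worker B enqueues from inside its own action before A goes further.
Interlocked.Increment(ref outstanding);

// Steps 3-4: A closes the queue, re-reads the counter, sees 1, skips the done signal.
if (workerASawZero) CloseAndCheck();

// Step 5: B's enqueued item is processed and the counter returns to 0.
bool lastSawZero = Interlocked.Decrement(ref outstanding) == 0;

// Step 6: the queue is already closed, so the done-check is not repeated.
if (lastSawZero) CloseAndCheck();

// Step 7: nothing is outstanding, yet done was never set.
Console.WriteLine($"outstanding={outstanding}, accepting={accepting}, done={done}");
// prints: outstanding=0, accepting=0, done=False
```

The final state (no work left, `done` still false) is the condition under which every worker would wait forever.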

## Workaround

Calling `queue.Drain(helperCount: 0)` in `Rules.PrefetchRulesFilesInternal()` forces single-threaded execution, eliminating the concurrent race. File enumeration is I/O-bound, so the performance impact is minimal.

## Proposed Fix

The race in `Drain()` lines 300-310 needs the `_outstanding` decrement and the done-check to be atomic with respect to `Enqueue()`. Options:

- Lock-based: wrap both `Enqueue`'s increment and the worker’s decrement+done-check in the same lock

- Restructure: use `CountdownEvent` or a different signaling mechanism that handles concurrent `AddCount`/`Signal` atomically
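
A minimal sketch of the lock-based option follows. It is a standalone toy queue, not a patch to `Parallel2.cs`: every name in it (`gate`, `pending`, `outstanding`, `done`, and the local `Enqueue`/`Work`/`Drain` functions) is hypothetical, and it has not been tested inside UnrealBuildTool. The point it illustrates is that the increment in `Enqueue` and the decrement plus done-check in the worker all run under the same monitor.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins; none of these names come from Parallel2.cs.
object gate = new object();
var pending = new Queue<Action>();
int outstanding = 0; // queued + running items, only touched under gate
bool done = false;

void Enqueue(Action action)
{
    lock (gate)
    {
        if (done) throw new InvalidOperationException("Queue has already drained.");
        outstanding++;
        pending.Enqueue(action);
        Monitor.Pulse(gate); // wake one waiting worker
    }
}

void Work()
{
    while (true)
    {
        Action action;
        lock (gate)
        {
            while (pending.Count == 0 && !done) Monitor.Wait(gate);
            if (pending.Count == 0) return; // done and nothing left to run
            action = pending.Dequeue();
        }
        try { action(); }
        finally
        {
            lock (gate)
            {
                // Decrement and done-check share the lock with Enqueue's increment.
                if (--outstanding == 0) { done = true; Monitor.PulseAll(gate); }
            }
        }
    }
}

void Drain(int helperCount)
{
    lock (gate)
    {
        if (outstanding == 0) { done = true; Monitor.PulseAll(gate); } // empty queue
    }
    var helpers = new Thread[helperCount];
    for (int i = 0; i < helperCount; i++) { helpers[i] = new Thread(Work); helpers[i].Start(); }
    Work(); // the calling thread participates too
    foreach (Thread helper in helpers) helper.Join();
}

// Demo: a binary tree of depth 3, enqueued recursively from inside actions.
int visited = 0;
void Scan(int depth)
{
    Interlocked.Increment(ref visited);
    if (depth < 3) { Enqueue(() => Scan(depth + 1)); Enqueue(() => Scan(depth + 1)); }
}
Enqueue(() => Scan(0));
Drain(helperCount: 4);
Console.WriteLine($"visited={visited}"); // 1 + 2 + 4 + 8 = 15 nodes
```

Because an item keeps its count until its action returns, any `Enqueue` made from inside an action happens while `outstanding` is at least 1, so the counter cannot reach zero early. The cost is a lock acquisition per item, which may or may not matter for this workload.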

We attempted several lock-free and lock-based fixes, but the race has multiple interleaving paths that are difficult to close without restructuring the `Drain()` method. We’d appreciate Epic’s take on the correct fix for `Parallel2.cs`.
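
For reference, the `CountdownEvent` option could look roughly like the sketch below. It is a standalone illustration under assumptions, not a proposed replacement for `ParallelQueue`: the names `items`, `remaining`, `Release`, `Work`, and the local `Enqueue`/`Drain` are hypothetical, pairing the counter with a `BlockingCollection<Action>` is our own choice, and disposal and exception handling are omitted for brevity. It relies on `CountdownEvent.TryAddCount()` returning `false` once the count has reached zero and on `Signal()` returning `true` only for the call that reaches zero.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-ins; none of these names come from Parallel2.cs.
var items = new BlockingCollection<Action>();
var remaining = new CountdownEvent(1); // the initial 1 is a "root" count held until Drain

void Enqueue(Action action)
{
    // Fails atomically if the count has already reached zero.
    if (!remaining.TryAddCount())
        throw new InvalidOperationException("Queue has already drained.");
    items.Add(action);
}

void Release()
{
    // Exactly one Signal() call observes the transition to zero.
    if (remaining.Signal()) items.CompleteAdding();
}

void Work()
{
    // GetConsumingEnumerable blocks until an item arrives or adding is completed.
    foreach (Action action in items.GetConsumingEnumerable())
    {
        try { action(); }
        finally { Release(); } // the item's count is dropped only after it has run
    }
}

void Drain(int helperCount)
{
    Release(); // drop the root count; from here on only real items keep the queue open
    var helpers = new Thread[helperCount];
    for (int i = 0; i < helperCount; i++) { helpers[i] = new Thread(Work); helpers[i].Start(); }
    Work(); // the calling thread participates too
    foreach (Thread helper in helpers) helper.Join();
}

// Demo: a binary tree of depth 3, enqueued recursively from inside actions.
int visited = 0;
void Scan(int depth)
{
    Interlocked.Increment(ref visited);
    if (depth < 3) { Enqueue(() => Scan(depth + 1)); Enqueue(() => Scan(depth + 1)); }
}
Enqueue(() => Scan(0));
Drain(helperCount: 4);
Console.WriteLine($"visited={visited}"); // 1 + 2 + 4 + 8 = 15 nodes
```

The design choice here is that there is a single counter and a single zero transition: once the count hits zero it cannot be raised again, so there is no separate "accepting" flag to get out of step with it.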

## Environment

- macOS 15.6 (Sequoia), ARM64 (Mac M-series EC2 instances)

- .NET 8.0.412

- UE 5.7

- FASTBuild executor with ParallelExecutor for local actions

- Xcode 16.4 + Xcode 26 beta installed (Xcode 26 install increased repro rate, likely due to changed file enumeration timing from additional SDK directories)

## Related

- EPS 0D5QP00001XJH7Y0AX (ImmediateActionQueue race — different bug, same codebase area)

- Epic PR #13917 (ImmediateActionQueue fix — does not help this bug)


## Steps to Reproduce

We wrote a standalone C# repro that simulates the `PrefetchRulesFilesInternal` pattern (recursive directory enumeration with `Enqueue` from action callbacks). The original `ParallelQueue` code deadlocks ~5-13% of the time with default `helperCount` on a 10-core machine.

```csharp
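// Repro: simulates recursive Enqueue from action callbacks.
// Assumes ParallelQueue from EpicGames.Build (Parallel2.cs) is in scope.
using var q = new ParallelQueue();

void Scan()
{
    // Random.Shared is thread-safe; a single shared `new Random()` instance is not.
    Thread.SpinWait(Random.Shared.Next(10, 50000));

    // The loop bound is re-rolled on every iteration, so each Scan enqueues 0-2 children.
    for (int j = 0; j < Random.Shared.Next(0, 3); j++)
    {
        try { q.Enqueue(Scan); } catch { break; }
    }
}

for (int j = 0; j < 20; j++) q.Enqueue(Scan);

q.Drain(); // hangs ~5-13% of the time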
```
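
Since a hang never returns, measuring the failure rate needs a watchdog around each trial. The helper below is a minimal sketch of one way to do that; `CompletesWithin` is a hypothetical name, and the two bodies passed to it are stand-ins for a repro iteration (one that returns and one that blocks forever), not the `ParallelQueue` code itself.

```csharp
using System;
using System.Threading;

// Runs body on a background thread and reports whether it finished in time.
// A hung body is simply left behind; as a background thread it does not block process exit.
bool CompletesWithin(Action body, TimeSpan timeout)
{
    var thread = new Thread(() => body()) { IsBackground = true };
    thread.Start();
    return thread.Join(timeout);
}

// Stand-ins for one repro iteration.
var never = new ManualResetEventSlim(false); // never signalled, so Wait() blocks forever
bool finished = CompletesWithin(() => Thread.SpinWait(1000), TimeSpan.FromSeconds(5));
bool hung = !CompletesWithin(() => never.Wait(), TimeSpan.FromMilliseconds(200));

Console.WriteLine($"finished={finished}, hung={hung}");
```

In a real measurement, the body would build a fresh queue, enqueue the initial items, and call `Drain()`, and the harness would count how many of N trials return `false`. The timeout has to be comfortably longer than a healthy run, otherwise slow trials are miscounted as deadlocks.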

[Attachment Removed]

Thank you for the detailed write-up and the repro project.

I’ve put together a reimplementation of `ParallelQueue` using `CountdownEvent`, as you suggested.

However, I was unable to reproduce the deadlock on my hardware with the original implementation and the repro project.

Before sharing this with the engine team for further review, could you please confirm that the fix resolves the deadlocks on your hardware?
