We found some intermittent failures in UBT where it would appear to just stop suddenly after completing an action and end with exit code 6.
We tracked it down to what seems to be a race condition in ImmediateActionQueue.TryStartOneAction(). Between the first part of the method where the action is attempted to be started, and the end where it checks for a stall, the number of active actions may decrease if an action completes. This can falsely indicate a stall, if we are waiting on actions to unblock the next action, but they complete during execution of this method and the number of active actions ends up at 0 by the time we do the stall check.
We have a repro case here with some extra debugging to make the sequence of events clearer. The actual behavioral changes are to add a sleep to TryStartOneAction(), and to force using the ParallelExecutor as it creates a single ImmediateActionQueue runner of Automated type. https://github.com/EpicGames/UnrealEngine/pull/13918
The race condition itself can occur even with other executors, but in cases like UGA where there’s a manual runner as well, the stall check will always pass on the manual runner, so it doesnt really matter. For us, in our FASTBuild executor we end up with some actions we pass off to a ParallelExecutor, which is why we saw this to begin with.
Ideally for us, we could get a fix in for the bug itself (even if it doesn’t manifest in any default Engine usage), so we can mitigate it on our end without incurring more divergence, until we’re ready to fully swap over to UBA. I have a proposed fix here, which is to just pull the stall check up into action lock scope https://github.com/EpicGames/UnrealEngine/pull/13917