For the past two weeks (first observed with the repo at CL45902105), we have been seeing the following build error with UE5Main.
...
Waiting for 'git status' command to complete
Unhandled exception: NullReferenceException: Object reference not set to an instance of an object.
at System.Diagnostics.Process.Close()
at System.Diagnostics.Process.Dispose(Boolean disposing)
at UnrealBuildTool.GitSourceFileWorkingSet.WaitForBackgroundProcess() in E:\UE5\Engine\Source\Programs\UnrealBuildTool\System\SourceFileWorkingSet.cs:line 190
at UnrealBuildTool.GitSourceFileWorkingSet.Contains(FileItem File) in E:\UE5\Engine\Source\Programs\UnrealBuildTool\System\SourceFileWorkingSet.cs:line 218
... (snip) ...
at UnrealBuildTool.BuildMode.CreateMakefiles(BuildConfiguration buildConfiguration, List`1 descriptors, ISourceFileWorkingSet workingSet, ILogger logger, Boolean runInitScripts, Boolean skipPreBuildTargets, List`1 outTasks, CppDependencyCache cppDependencyCache) in E:\UE5\Engine\Source\Programs\UnrealBuildTool\Modes\BuildMode.cs:line 1198
at UnrealBuildTool.BuildMode.ExecuteAsync(CommandLineArguments Arguments, ILogger Logger) in E:\UE5\Engine\Source\Programs\UnrealBuildTool\Modes\BuildMode.cs:line 259
at UnrealBuildTool.UnrealBuildTool.Main(String[] ArgumentsArray) in E:\UE5\Engine\Source\Programs\UnrealBuildTool\UnrealBuildTool.cs:line 692
Please note that we use git overlaid on P4-acquired assets and codebase, so UBT detects this as a git environment. The error reproduces more readily with a dirty repository, presumably because the “git status” command takes longer to complete. It can happen when building the Editor or a uproject, via VS and UnrealEditor or UAT.
A quick look at the UBT code suggests that the newly implemented parallel makefile generation (first introduced at CL45805775; there was some back-and-forth with CL45883996+CL45883954+CL45916324+CL45953575+CL45967448, but the crux of the problem remained) has significant oversights for the GitSourceFileWorkingSet environment.
UnrealBuildTool.BuildMode.ExecuteAsync() at CL45902105 has the following code:
// Create the working set provider per group.
using (ISourceFileWorkingSet WorkingSet = SourceFileWorkingSet.Create(Unreal.RootDirectory, ProjectDirs, Logger))
{
List<(TargetDescriptor, TargetMakefile)> Targets = CreateMakefiles(BuildConfiguration, TargetDescriptors, WorkingSet, Logger, true, true, tasks, cppDependencyCache);
await BuildAsync(Targets, BuildConfiguration, Options, WriteOutdatedActionsFile, cppDependencyCache, Logger, ActionTypeFilter, executor);
}
Functions called downstream of CreateMakefiles() may call Contains() on the same GitSourceFileWorkingSet instance. That function invokes GitSourceFileWorkingSet.WaitForBackgroundProcess(), which not only waits for the process to complete with Process.WaitForExit(), but also calls Process.Dispose(). We confirmed that errors like the one above result from multiple threads entering Process.Dispose() simultaneously.
A basic check using the null-conditional operator (?.) was added recently at CL45798257, but it is powerless against the race condition we are facing: the null check and the subsequent call are not atomic, so two threads can both see a non-null process and both proceed to dispose it. If the current implementation is to be preserved, protecting it with a lock is likely the way to go. The code in question in SourceFileWorkingSet.cs had existed and worked fine for many years, but it was never reentrant, so things broke down when async usage was added.
It may be worth noting that it is not obvious why GitSourceFileWorkingSet.WaitForBackgroundProcess(), called by the seemingly harmless GitSourceFileWorkingSet.Contains(), needs to call Process.Dispose() immediately. It may be safer to simply let GitSourceFileWorkingSet.Dispose() handle the cleanup, since it should be called at the end of the “using” block, but this may not apply to other usages that we have not looked at.
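To illustrate the lock-based option, here is a minimal sketch of what we have in mind. This is not the actual UBT code: the class name GuardedBackgroundProcess and the DisposeCount counter are hypothetical and exist only for illustration. The idea is that the wait, the dispose, and the reset to null all happen under one lock, so concurrent callers can never dispose the same Process twice.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Hypothetical stand-in for the relevant part of GitSourceFileWorkingSet.
// Names are illustrative assumptions, not taken from UBT.
sealed class GuardedBackgroundProcess : IDisposable
{
    private readonly object _sync = new object();
    private Process? _process;

    // Illustration only: counts how many times the process was actually disposed.
    public int DisposeCount { get; private set; }

    public GuardedBackgroundProcess(Process? process)
    {
        _process = process;
    }

    // Safe to call from many threads. The first caller waits for the process
    // and disposes it; later callers block on the lock until that is done,
    // then see null and return immediately.
    public void WaitForBackgroundProcess()
    {
        lock (_sync)
        {
            if (_process == null)
            {
                return;
            }

            _process.WaitForExit();
            _process.Dispose();
            _process = null;
            DisposeCount++;
        }
    }

    // In this sketch, Dispose() goes through the same guarded path,
    // so it cannot race with callers of WaitForBackgroundProcess().
    public void Dispose()
    {
        WaitForBackgroundProcess();
    }
}
```

With this shape, a caller such as Contains() could keep calling WaitForBackgroundProcess() unconditionally. Holding the lock during WaitForExit() is intentional here, since every caller needs the “git status” result before it can proceed anyway.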
Additionally, we were surprised to find that this parallelization behavior cannot be fully controlled by the newly added BuildConfiguration.bParallelMakefileGeneration. Even with bParallelMakefileGeneration = false, there seems to be a code path that runs GitSourceFileWorkingSet processing in parallel and ends up with the same error and an identical stack trace.
Hoping this issue can be addressed soon.
Regards,