Static analysis intermittently fails when UBA cache is in use

Running static analysis in Horde intermittently fails when the UBA cache is in use, due to what I assume is a failure to materialize some PCH object files:

---- Starting trace: 251015_102924 ----
UbaCache - Connected to unreal-horde.skyboxlabs.com:1347 (0.42s)
UbaSessionServer - Disable remote execution (remote sessions will finish current processes)
------ Building 15 action(s) started ------
[1/15] (Wall: 0.00s) Resource Default.rc2 [Cache]
[2/15] (Wall: 0.00s) Analyze [x64] SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.cpp [Cache]
... snip ...
[10/15] (Wall: 0.00s) Analyze [x64] Module.LyraGame.2.cpp [Cache]
[11/15] (Wall: 0.00s) Analyze [x64] Module.LyraGame.3.cpp [Cache]
[12/15] (Wall: 14.42s CPU: 0.06s) Analyze [x64] Module.Sparkle.cpp
D:\h\sparkle\Sync\Projects\Sparkle\Intermediate\Build\Win64\x64\UnrealEditorSA\Development\Sparkle\Module.Sparkle.cpp: fatal error C1083: Cannot open precompiled header file: '../Intermediate/Build/Win64/x64/UnrealEditorSA/Development/UnrealEd/SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.h.pch': Permission denied
[13/15] (Wall: 14.43s CPU: 0.39s) Analyze [x64] Module.SblMarsGameFeatureActions.cpp
D:\h\sparkle\Sync\Projects\Sparkle\Plugins\SblMarsGameFeatureActions\Intermediate\Build\Win64\x64\UnrealEditorSA\Development\SblMarsGameFeatureActions\Module.SblMarsGameFeatureActions.cpp: fatal error C1083: Cannot open precompiled header file: '../Intermediate/Build/Win64/x64/UnrealEditorSA/Development/UnrealEd/SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.h.pch': Permission denied
[14/15] (Wall: 14.43s CPU: 0.06s) Analyze [x64] Module.SblMarsModule.cpp
D:\h\sparkle\Sync\Projects\Sparkle\Plugins\SBLPlugins\SBLCore\SblMars\Intermediate\Build\Win64\x64\UnrealEditorSA\Development\SblMarsModule\Module.SblMarsModule.cpp: fatal error C1083: Cannot open precompiled header file: '../Intermediate/Build/Win64/x64/UnrealEditorSA/Development/UnrealEd/SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.h.pch': Permission denied
[15/15] (Wall: 14.43s CPU: 0.06s) Analyze [x64] Module.SblMarsExtensions.cpp
D:\h\sparkle\Sync\Projects\Sparkle\Plugins\SBLPlugins\SBLCore\SblMarsExtensions\Intermediate\Build\Win64\x64\UnrealEditorSA\Development\SblMarsExtensions\Module.SblMarsExtensions.cpp: fatal error C1083: Cannot open precompiled header file: '../Intermediate/Build/Win64/x64/UnrealEditorSA/Development/UnrealEd/SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.h.pch': Permission denied


It shows up as “Permission denied” errors in the log, but inspecting the workspace reveals that the problematic files simply do not exist.

We have tried various combinations of UbaCacheServer and other related UBA binaries, from the 5.6.0 release up to the latest from //UE5/Release-5.7.

For now, the workaround is to modify UBT’s code to skip loading cache providers when SA mode is detected. There is no way to affect the BuildMachineCacheProviders setting from the command line or an environment variable.

Hi Yang,

I have not seen this before; I need to take a look. Is this with the PVSStudio target?

Ok, I suspect it is VisualCpp or clang, since you have PCH files enabled. I just tested it myself and it seems to work for me.

Question: are you using VFS?

I’ve tried reproing this with the cache server on clang, visualcpp, and pvsstudio, both with and without VFS. Unfortunately I can’t find any repro.

I see this step in your log above. You’re saying it does not download the actual PCH file?

[2/15] (Wall: 0.00s) Analyze [x64] SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.cpp [Cache]

If there is a bug somewhere I would assume it would be in the UBT code.

In VCToolChain.cs there is some code that looks like this:

	if (Target.WindowsPlatform.Compiler.IsMSVC() && !CompileEnvironment.bUseHeaderUnitsForPch && Target.StaticAnalyzer == StaticAnalyzer.Default && !CompileEnvironment.bDisableStaticAnalysis)
	{
		CompileAction.AdditionalProducedItems.Add(FileItem.GetItemByFileReference(FileReference.Combine(OutputDir, SourceFile.Location.GetFileName() + PchExtension + "ast")));
	}

That is the code that says that this action produces this .pchast file.

This is also the information UBA uses to know what to upload to the cache server. If this file is not added here, the cache entry will not contain it, and downloading the cached result of this action will miss it.

(Also, if this is the bug and you fix it, you would need to wipe your cache server, or touch some file included in the PCH, to make sure you get new entries.)

We’re only using MSVC for static analysis at the moment. These are Windows agents with `-Analyzer=Default`.

It’s unclear whether the PCH files are not being downloaded, or whether something is causing them to be deleted or cleaned up unexpectedly. They are definitely being uploaded correctly, as the same build step usually succeeds. I don’t think I’ve ever seen it fail on a retry, even on the same machine.

I’ve attached partial Horde logs for a bad and good SA run on the same CL. These were done days apart.

It has the stink of a race condition, but I haven’t dug into it much further. I have also been unsuccessful in intentionally reproducing it.

Hmm, that is very strange. I wonder if it is possible for the cache client to fail to write a file silently or something. I have never heard of a similar issue before, and the Epic farm is doing these things thousands of times per day.

Unfortunately I think this is one of those things where we will need to add a bunch of custom logging to be able to narrow down where this could go wrong. Let’s break it down.

We see “SharedPCH.UnrealEd.Project.ValApi.ValExpApi.Cpp20.cpp [Cache]” in the log. That means at least that the code went through the UBA CacheClient. Do you think you can open the .uba file in the visualizer, hover over the PCH action, and check the stats there? It should show the number of files downloaded and the total bytes. A PCH like this seems to contain a ton of files :slight_smile: In my case there are six of them: a .obj, a .pch, a .pchast, a .json, and two .sarif files (those .pchast files are huge!). In that failed build, do you have any of the other files?

The function CacheClient::FetchFile in UbaCacheClient.cpp is the interesting one. I guess we will need to add logging in it to see what is being requested. Looking at the code, I realize we downgrade the logging (errors and warnings become info) because we don’t want a failed cache step to fail the build; maybe there is something hidden in the logs that could reveal something. The partial logs — are those the actual log files, or copy-pasted from the screen? The logs on disk (attached in Horde) contain more information than the ones shown via “view log” in the Horde interface.

Here’s what it looks like when I’m downloading one of those from the cache: [Image Removed]

I’ve been looking through the code to see if there is any way it could return true (a successful cache hit) without actually succeeding, but things look correct. So the things to narrow down are:

  1. Does the cache entry on the cache server have all the files? Checking that the .uba visualizer shows six files is a good step.
  2. Check the attached log in Horde for clues (not the one shown when pressing “view log”).
  3. See if any of the other files from that cache entry exist on disk.
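For step 3, a quick on-disk check could be scripted along these lines. This is a hypothetical helper, not part of UBA; the extension list is an assumption based on the six files observed in a cache entry (the two .sarif files are left out here since their exact names are not shown in the thread):

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of UBA): given the ".h.pch" path from the
// C1083 error, report which sibling outputs of the PCH action are missing on
// disk. The extensions are assumptions for illustration.
std::vector<std::string> MissingPchSiblings(const fs::path& pchPath)
{
    const char* extensions[] = { ".obj", ".pch", ".pchast", ".json" };
    std::vector<std::string> missing;
    fs::path base = pchPath;
    base.replace_extension(); // strip the trailing ".pch", keeping the ".h"
    for (const char* ext : extensions)
    {
        fs::path candidate = base;
        candidate += ext; // concatenate, e.g. "SharedPCH....h" + ".pchast"
        if (!fs::exists(candidate))
            missing.push_back(candidate.filename().string());
    }
    return missing;
}
```

Running this against the failing workspace right after a bad build would show whether only the .pch is missing or the whole cache entry never landed.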

Attaching the most recent failing Trace.uba I have.

I’ve reproduced the headless build agents as best I can locally, but haven’t figured out what code changes trigger that particular pattern of analysis jobs.

Holy crap, 5 GB of data for that cache entry. I wonder if any virtual alloc reserve sizes are too small.

Did you check on disk if any of the six files are there?

(Unrelated: you should disable NTFS last-access tracking on this machine and others to gain some performance; you can see in the visualizer view that it is enabled.)

I’ve been trying to repro your setup without success :-/

I think you will need to add log entries to the code so we can gather more information.

Search for this in UbaCacheClient:

			DowngradedLogger logger(m_connected, m_logger.m_writer, TC("UbaCacheClientDownload"));
			bool writeCompressed = m_session.ShouldStoreIntermediateFilesCompressed() && g_globalRules.FileCanBeCompressed(path);
			if (!fetcher.RetrieveFile(logger, m_client, casKey, path.data, writeCompressed, nullptr, attributes))
				return logger.Debug(TC("Failed to download cache output %s for %s"), path.data, info.GetDescription()).ToFalse();

You can add:

			m_logger.Info(TC("DOWNLOADED FILE %s for %s"), path.data, info.GetDescription());

and maybe some more info if you want. Also change that logger.Debug to m_logger.Warning or something.

I haven’t had much slack to look into this more deeply. I’ll likely revisit it if it’s still occurring frequently after 5.7 officially releases.

I think this turned out to be a race with our anti-malware software after all.

Some unrelated change made the issue appear frequently against regular compilation builds, which prompted us to tweak the anti-malware settings. I’ve not seen the issue crop up since.

Ok, good to hear. UBA actually has a couple of retries on permission denied because of similar issues with what seems to be antivirus. Maybe I should try to find the code path for this one and verify that it can retry.
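The kind of retry-on-transient-failure loop described here might be sketched like this. This is an illustration only, with made-up names and timings, not UBA’s actual implementation:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Sketch only: retry a file operation a few times when it fails with a
// transient sharing/permission error, which can happen while an antivirus
// scanner briefly holds a handle on a freshly written file. Names, attempt
// count, and delay are illustrative assumptions.
bool RetryOnTransientDenied(const std::function<bool()>& op,
                            int maxAttempts = 4,
                            std::chrono::milliseconds delay = std::chrono::milliseconds(50))
{
    for (int attempt = 1; attempt <= maxAttempts; ++attempt)
    {
        if (op())
            return true; // the operation eventually succeeded
        if (attempt < maxAttempts)
            std::this_thread::sleep_for(delay); // give the scanner time to release the handle
    }
    return false; // still failing after all attempts; surface the real error
}
```

The key design point is that only errors believed to be transient (such as a sharing violation) should be retried; a genuine missing file, as in this thread, would keep failing on every attempt and should be reported.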

And the associated log.