As of 5.7, UBA-dispatched ShaderCompileWorker.exe on remote machines fails to load dxcompiler.dll

The smallest change I can make to have things function again is to revert to the dxcompiler.dll from 5.7-preview. That single DLL change makes things function.

Further investigation performed:

  • Manually loaded the dll with a custom program on the remote agent machine - this works fine
  • Compared the dependencies of the dll; they are identical between 5.7-preview (the working version) and 5.7 (.0).
  • Intercepted syscalls using procmon, none of the LoadModule calls fail.

Because of all of this, I’m currently suspecting that the detours version of LoadLibrary is what is causing the problem. (Invoked via FDllHandle::FDllHandle in DXCWrapper.cpp, via FWindowsPlatformProcess::LoadLibraryWithSearchPaths.)

I have looked at https://dev.epicgames.com/community/learning/knowledge-base/jB32/unreal-engine-practical-debugging-tips-for-unrealbuildaccelerator and the UBA code, and spent some time trying to debug this. Unfortunately that page focuses on UBT/code compiles rather than shader compiles.

There is clearly logging code in the UbaDetours module, but so far I’ve not been successful in getting it to actually work. I compiled this and other modules in debug mode to ensure logging is compiled in, enabled various cvars for log relaying, etc., but no luck. There is one bit of (to me) dubious code in FUbaJobProcessor::RunTaskWithUba where the “LogFile” gets set to the input file, but changing this to a different filename does not seem to help create a log file either.

Has anyone else seen this? Some help getting detours logging working properly, or insights into other ways of debugging this, would be wonderful. At this point what I’m really after is the remote agent’s output from DEBUG_LOG_DETOURED, to get a better sense of what’s going on internally there.

I do see some errors like this in our log files, but barring more detailed logging I’m not sure if this is a red herring:

LogUbaHordeAgent: Response [ExecuteOutput]: UbaSessionClient - DetourCreateProcessWithDllEx failed: "D:\P4Workspace\Engine\Binaries\Win64\ShaderCompileWorker.exe"  <removed shader params>  -Multiprocess  (Working dir: D:\Horde\Work\Saved\Uba\empty\). Exit code: 8 - Not enough memory resources are available to process this command.

[Attachment Removed]

Steps to Reproduce
For us, on 5.7 (final, not preview), try to load EngineTest while having configured Horde (5.6 deploy) for UBA. When it starts dispatching remote shader compilation, it asserts due to failing to load dxcompiler.dll.

Exception:
Assertion failed: Handle [File:Z:\UEVFS\Root\Engine\Source\Developer\ShaderCompilerCommon\Private\DXCWrapper.cpp] [Line: 80] 
Failed to load module: ../../../Engine/Binaries/ThirdParty/ShaderConductor/Win64/dxcompiler.dll


Callstack:
0x00007ffd1ca0bd08 ShaderCompileWorker-Core.dll!UnknownFunction []
0x00007ffd23a49b38 ShaderCompileWorker-ShaderCompilerCommon.dll!UnknownFunction []
0x00007ffd23a49c7f ShaderCompileWorker-ShaderCompilerCommon.dll!UnknownFunction []
0x00007ffd2411af01 ShaderCompileWorker-ShaderFormatD3D.dll!UnknownFunction []
0x00007ffd1cb27a40 ShaderCompileWorker-Core.dll!UnknownFunction []
0x00007ffd1cb26d43 ShaderCompileWorker-Core.dll!UnknownFunction []
0x00007ff6a4663c7c ShaderCompileWorker.exe!UnknownFunction []
0x00007ff6a46644fb ShaderCompileWorker.exe!UnknownFunction []
0x00007ff6a4664b4d ShaderCompileWorker.exe!UnknownFunction []
0x00007ff6a4671bf8 ShaderCompileWorker.exe!UnknownFunction []
0x00007ff6a46730ac ShaderCompileWorker.exe!UnknownFunction []
0x00007ffd3d2de8d7 KERNEL32.DLL!UnknownFunction []
0x00007ffd3eb48d9c ntdll.dll!UnknownFunction []



[Attachment Removed]

Hello Arnout,

I presume you already found CVar `r.UbaController.LogVerbosity`, but if not, please set it to 2 to forward all UBA logs to UE_LOG. Then you should also run with a debug build of UbaAgent as this enables detoured logging. From what I can see in the code, the log file should have the same path as the input file (indeed not obvious why) plus `.log` extension and optionally some other suffixes for child processes.

Kind regards

Laura

[Attachment Removed]

Hi, we found a bug in UBA where it does not properly copy files from the system32 folder if they exist locally. UBA has a “known system files” list containing files that should not be copied, but the logic had a bug that made it skip other files in system32 as well (vcruntime is not a known system file). I’m doing a fix in the code right now, which will need new binaries.

You can do the fix locally if you want to test it out. Open UbaSessionClient.cpp

Search for

StringBuffer<> localSystemModule;
localSystemModule.Append(m_systemPath).Append(moduleFile.data + serverSystemPathLen);
if (FileExists(m_logger, localSystemModule.data) && !localSystemModule.EndsWith(TCV(".exe")))
	continue;
moduleFile.Clear().Append(localSystemModule);

and just comment out the entire thing. If you are working on Mac or Linux, you should instead add #if !PLATFORM_WINDOWS around that code.

Then compile UbaAgent target (“UbaAgent win64 shipping” if you work on windows)

Tell me how it goes

[Attachment Removed]

Yeah I already set that to 2, hardcoded it to max actually to rule out any config issues. No log files from what I can tell.

[Attachment Removed]

Hi there,

in case that helps finding the culprit code, this also occurs when compiling shaders distributed with SN-DBS without Horde (with the exact same assertion & log).

[Attachment Removed]

We don’t have an SN-DBS setup anymore to compare this to, unfortunately.

[Attachment Removed]

As a workaround, revert dxcompiler.dll & dxcompiler.pdb in Engine/Binaries/ThirdParty/ShaderConductor/Win64 to the one found in 5.6 release.

I did this and had 0 error since.

Regards,

[Attachment Removed]

Yeah that’s our workaround, but I’d like to find the root cause of the issue to actually use the latest (and I suspect Epic needs to fix this for others too).

[Attachment Removed]

FWIW: I just recalled that our dxcompiler.dll as well as ShaderConductor.dll are now compiled with Clang by default instead of MSVC. Are you able to rebuild dxcompiler.dll from UE5.7 source? If so, you can verify whether this is causing the issue by running this command:

Engine/Source/ThirdParty/ShaderConductor/Build_ShaderConductor_Win64.bat -msvc

You’ll need CMake 3.17+ (not compatible with CMake 4 at the moment unfortunately), Python 3, and Visual Studio Active Template Library (ATL).

For A/B comparison, run this command without the `-msvc` argument to compile with Clang compiler.

[Attachment Removed]

Tried compiling with -msvc and it doesn’t seem to make a difference. Didn’t really expect there to be any because I already verified all dependencies of the dll and they all checked out.

FWIW I’ve added debug code to LoadLibraryWithSearchPaths that reads the dll’s imports and verifies they are all valid on the target machine. This doesn’t result in any missing imports.

[Attachment Removed]

[mention removed] can you confirm you can have detour logging redirected and working for yourself? Just trying to rule out if it is me doing something wrong or if it doesn’t work on your side either.

[Attachment Removed]

I built UbaAgent, UbaCli, UbaDetours, and UbaHost modules in Debug mode (maybe only UbaAgent is necessary, but I haven’t double checked). I also enabled CVar `r.UbaController.ProcessLogEnabled=true` and now I can see the UBA agent logs show up under %LOCALAPPDATA%\Temp\UbaControllerStorageDir\0\sessions\<SESSION>\log. These log files have (unintuitively) the same filename as the agent input files, e.g. 0.uba.j-4.in (“j-4” meaning there were 4 shader compiler jobs encoded). This file looks something like this:

Detached: true
RPC_MESSAGE Init
ProcessId: 1
CmdLine: "<REDACTED>\Engine\Binaries\Win64\ShaderCompileWorker.exe" "C:/Users/<USER>/AppData/Local/Temp/UbaControllerWorkingDir/<GUID-OR-HASH>/0/" 45780 0 "0.uba.j-4.in" "0.uba.j-4.out"  -Multiprocess 
WorkingDir: <REDACTED>\Engine\Binaries\Win64\
ExeDir: <REDACTED>\Engine\Binaries\Win64\
ExeDir (actual): <REDACTED>\Engine\Binaries\Win64\
SystemTemp: C:\Users\<USER>\AppData\Local\Temp\UbaControllerStorageDir\0\sessions\<SESSION>\temp
Rules: 27 (171384111)
 
RPC_MESSAGE GetSharedMemory
T      GetEnvironmentVariableA OANOCACHE -> NOTFOUND
T      GetEnvironmentVariableA OAPERUSERTLIBREG -> NOTFOUND
T      GetEnvironmentVariableA OACACHEPARAMS -> NOTFOUND
T      RegOpenKeyExA 0 (SOFTWARE\Microsoft\OLEAUT) -> Error
D      GetModuleFileNameW 0  260 (<REDACTED>\Engine\Binaries\Win64\ShaderCompileWorker.exe) -> 62
T      NtQueryInformationProcess (class 37) 18446744073709551615 -> Success
T      CreateMutexEx Local\SM0:24200:304:WilStaging_02
T      NtClose 516 (UNKNOWN) -> Success
T      NtClose 512 (UNKNOWN) -> Success
T      NtClose 508 (UNKNOWN) -> Success
...
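
To narrow a log like the one above down to calls that did not succeed, a small filter along these lines might help. This is only a sketch: the line layout (a category letter, the function name, arguments, then ` -> ` and a result) and the result strings `Error` and `NOTFOUND` are inferred from the excerpt above, not from the UBA source, and `DetourLogEntry`, `ParseDetourLine`, and `FindFailedCalls` are made-up names.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one detoured call, as inferred from the log excerpt.
struct DetourLogEntry
{
    std::string function; // e.g. "RegOpenKeyExA"
    std::string result;   // text after the last " -> ", e.g. "Error"
};

// Parses one line of the assumed form "<letter> <function> <args...> -> <result>".
// Returns false for lines that do not match (headers, RPC_MESSAGE lines, "...").
bool ParseDetourLine(const std::string& line, DetourLogEntry& out)
{
    const std::string arrow = " -> ";
    const std::size_t arrowPos = line.rfind(arrow);
    if (arrowPos == std::string::npos)
        return false;

    // The part before the arrow must start with a one-letter category and a function name.
    std::istringstream head(line.substr(0, arrowPos));
    std::string category;
    if (!(head >> category >> out.function) || category.size() != 1)
        return false;

    // Keep the result text, dropping trailing whitespace such as a stray '\r'.
    out.result = line.substr(arrowPos + arrow.size());
    while (!out.result.empty() && std::isspace(static_cast<unsigned char>(out.result.back())))
        out.result.pop_back();
    return true;
}

// Collects entries whose result is "Error" or "NOTFOUND" (strings taken from the excerpt only).
std::vector<DetourLogEntry> FindFailedCalls(const std::string& logText)
{
    std::vector<DetourLogEntry> failed;
    std::istringstream lines(logText);
    std::string line;
    while (std::getline(lines, line))
    {
        DetourLogEntry entry;
        if (ParseDetourLine(line, entry) && (entry.result == "Error" || entry.result == "NOTFOUND"))
            failed.push_back(entry);
    }
    return failed;
}
```

Many NOTFOUND results (such as the environment variable lookups above) are normal, so this only shortens the list of lines worth reading; it does not point at the failing call by itself.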

I hope that works for you. Please note that we’ll be on winter break and I’m OOO starting tomorrow.

Best regards

Laura

[Attachment Removed]

Thanks Laura.

I tracked down the issue and I do think it has to do with the newly built dxcompiler.dll. The newer C++ runtime dlls provide exports the old ones do not, and it looks like the newer dxcompiler.dll makes use of these.

The newly compiled dxcompiler.dll in 5.7 requires VC_redist.x64.exe of 17.8 or newer. This is not a missing DLL; instead, something goes wrong at init time with older runtimes.

UE5 (5.7) itself (ShaderCompileWorker.exe, etc.) and the dxcompiler.dll from 5.6 all happily work with the latest redist I can download for VS2015+ (v17.2).

For more detailed version numbers, 14.36.32532 (17.6) does not work for me but 14.38.33142 (17.8) does work.
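
For checking agents against this, a version comparison could look roughly like the sketch below. The threshold 14.38.33142 is simply the oldest version reported working above, so the real minimum may lie somewhere between 14.36.32532 and 14.38.33142. `ParseVersion` and `IsRuntimeNewEnough` are hypothetical helper names, and how the installed runtime version gets read on the agent is deliberately left out.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Parses "major.minor.build" into three integers.
// Returns false if the text is malformed or has trailing characters.
bool ParseVersion(const std::string& text, std::array<int, 3>& out)
{
    int consumed = 0;
    if (std::sscanf(text.c_str(), "%d.%d.%d%n", &out[0], &out[1], &out[2], &consumed) != 3)
        return false;
    return consumed == static_cast<int>(text.size());
}

// True when 'installed' is at least 'required'.
// std::array compares lexicographically, i.e. major first, then minor, then build.
// Unparseable input is treated as "not new enough".
bool IsRuntimeNewEnough(const std::string& installed, const std::string& required)
{
    std::array<int, 3> a{};
    std::array<int, 3> b{};
    if (!ParseVersion(installed, a) || !ParseVersion(required, b))
        return false;
    return a >= b;
}
```

With the numbers from this thread, `IsRuntimeNewEnough("14.36.32532", "14.38.33142")` is false and `IsRuntimeNewEnough("14.38.33142", "14.38.33142")` is true.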

Does this also mean that UBA doesn’t actually detour the c++ runtimes? Meaning that the local executing environment isn’t entirely mirrored?

[Attachment Removed]

FWIW: Someone else I had a discussion with had similar issues with UBA and pinpointed it to an outdated msvcp140.dll file on the remote machines.

[Attachment Removed]

Hey, yes - as per my last post of the 18th of December that’s what we ran into as well, including some of the surprising findings about how UBA handles this. I assume for performance reasons UBA doesn’t detour the VC runtimes, but maybe something can be done to make error handling around this more robust? I guess Epic manually updates the VC runtimes on their UBA build farm as part of software management processes on the agents (e.g., ansible or the like).

[Attachment Removed]