When a Horde agent is assigned a build task and then reaches out to another Horde agent for UBA, I sometimes get errors about receiving output files from the UBA worker. Example:
UbaStorageServer - Expecting to be able to decompress to 8387636 bytes but got 0 (2d0f1f58aaa9b37deed12ebc61f7f11a83e3d701 -> C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj)
UbaSessionServer - Failed to copy cas from 2d0f1f58aaa9b37deed12ebc61f7f11a83e3d701 to C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj (Module.CoreUObject.6.cpp (Compile [x64]))
That’s what I see in the Horde step log.
If I go to the agent running the task, I can get a log with a little more information:
UbaStorageServer - Expecting to be able to decompress to 8387636 bytes but got 0 (2d0f1f58aaa9b37deed12ebc61f7f11a83e3d701 -> C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj)
UbaSessionServer - Failed to copy cas from 2d0f1f58aaa9b37deed12ebc61f7f11a83e3d701 to C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj (Module.CoreUObject.6.cpp (Compile [x64]))
UbaSessionServer - Client REDACTED11 returned process 1399 to queue (Failed to send output files to host)
** For CrashReportClientEditor-Win64-Shipping **
[1075/1171] (Wall: 45.70s CPU: 34.61s) Compile [x64] Module.CoreUObject.13.cpp [RemoteExecutor: REDACTED11]
[Worker0] UbaSessionClient - Server failed to receive file E:\HordeAgent\Sandbox\Saved\Uba\sessions\250519_115009\output\850a643fa5f119eedec779c097a17508 (C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj)
[Worker0] UbaSessionClient - Failed to send output files to host
I confirmed that Worker0 is, in fact, REDACTED11.
I also looked for logs on REDACTED11, but found nothing useful in E:\HordeAgent\Sandbox\Saved\Uba\sessions\250519_115009\log
The cas ID was not present anywhere in E:\HordeAgent\Sandbox\Saved\Uba
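For anyone who wants to repeat that check, here is a minimal sketch of the kind of search I mean. It assumes the CAS entry, if it existed, would have the hash somewhere in its file or directory name; I don't know how UBA actually lays out its CAS store, so treat that as a guess. `find_by_name` is just a helper name for this post.

```python
import os

def find_by_name(root, needle):
    """Return paths under root whose file or directory name contains needle.

    Assumption: a CAS entry would carry its hash in its name. This does not
    look inside files, so it cannot find a hash stored in an index or table.
    """
    needle = needle.lower()
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if needle in name.lower():
                hits.append(os.path.join(dirpath, name))
    return hits

if __name__ == "__main__":
    # Path and hash taken from the log above.
    root = r"E:\HordeAgent\Sandbox\Saved\Uba"
    cas_id = "2d0f1f58aaa9b37deed12ebc61f7f11a83e3d701"
    for path in find_by_name(root, cas_id):
        print(path)
```

An empty result only rules out a name match; it says nothing about whether the entry existed earlier and was cleaned up.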
The obj file does eventually show up at C:\HordeAgent\Sandbox\Demo-Inc-Full\Sync\Engine\Intermediate\Build\Win64\x64\CrashReportClient\Shipping\CoreUObject\Module.CoreUObject.6.cpp.obj
but I think that’s because the local machine compiles it right at the end:
[1169/1171] (Wall: 14.53s CPU: 14.50s) Compile [x64] Module.CoreUObject.6.cpp
The error can appear with many different cpp files across many build steps (Linux server, tools, editor, etc.).
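To see whether the failures cluster on one remote machine, the agent log can be tallied by client. This is a rough sketch that assumes the "returned process ... to queue" line always has the shape shown in the log above; other UBA versions may word it differently. `tally_returned` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted above (an assumption, not a
# documented format): "Client <name> returned process <id> to queue (<reason>)"
RETURNED = re.compile(r"Client (\S+) returned process (\d+) to queue \(([^)]*)\)")

def tally_returned(log_text):
    """Count 'returned process to queue' events per (client, reason)."""
    counts = Counter()
    for match in RETURNED.finditer(log_text):
        client, _process_id, reason = match.groups()
        counts[(client, reason)] += 1
    return counts

if __name__ == "__main__":
    sample = (
        "UbaSessionServer - Client REDACTED11 returned process 1399 to queue "
        "(Failed to send output files to host)\n"
    )
    for (client, reason), n in tally_returned(sample).items():
        print(f"{client}: {n} x {reason}")
```

If one client dominates the tally, that points at the machine (disk, network, antivirus) rather than at UBA in general.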
It seems like the task does finish by virtue of retries, but it still gets marked as an error and fails the whole graph.
My setup:
- Local dev machine running the Horde solution under the debugger
- REDACTED08 as the only machine in the pool that can be assigned tasks
- REDACTED11, REDACTED12, and REDACTED13 are pure compute nodes with the Horde Agent installed, but not configured to be in a pool that will be assigned Horde tasks
Two questions:
- Any hints as to what could be causing this? It seems like it would require a pretty deep and challenging dive without some direction.
- Why is this causing the whole graph to fail? If we retry the compute task, should it really mark the build step as having errored? Is there a way to have this error not be fatal, since the step does actually complete?