UBA no longer working with Horde since 5.6 Preview

Hi Epic,

I have the following setup :

* Horde Server - Compiled using /Engine/Build/BatchFiles/RunUAT.bat BuildGraph -Script="Engine/Source/Programs/Horde/BuildHorde.xml" -Target="Build Bundled Docker Image"

* Horde Agents connected using Horde Auth Type

* Fresh MongoDB

----------------------------

Agents run jobs using “-UBA” for Incremental builds to leverage agents as compute.

This used to work on 5.5. However, since the upgrade I am noticing the following:

`Using Unreal Build Accelerator executor to run 1200 action(s)
Horde URL: https://mywebsite.ddns.net:13343/, Pool: Win-UE5, Cluster: (none), Condition: (none), Connection: (none), HordeEncryption: (none)
Storage capacity 40Gb
---- Starting trace: 250531_182002 ----
UbaServer - Listening on 0.0.0.0:1345
------ Building 1200 action(s) started ------
Horde server: 5.6.0-0, agent: 5.6.0-0
Created tool bundle with locator 94e5ed35a49943a184f8c0bb3ad78d6f_1#pkt=0,1537&exp=1
Horde cluster resolved as ‘default’
POST https://mywebsite.ddns.net:13343/api/v2/compute/default failed (null). Delaying for 1000ms (attempt #1).
POST https://mywebsite.ddns.net:13343/api/v2/compute/default failed (null). Delaying for 1000ms (attempt #1).
POST https://mywebsite.ddns.net:13343/api/v2/compute/default failed (null). Delaying for 1000ms (attempt #1).

^ Endlessly stalls with this … jobs never finish.`

Is this a known issue? If not, can I get some recommendations for troubleshooting it?

FYI: [How to add “AddComputeTasks” permission for a specified [Content removed] -> I verified this and it now works correctly: when one of my other users builds, the build connects to Horde, obtains a token, and leverages the available agents.

However, I now see an issue when Horde itself runs jobs. Is there any additional setup I need to do so that jobs scheduled by the Horde Server authenticate automatically and get the AddComputeTasks permission? The job log shows:

`UbaServer - Listening on 0.0.0.0:1345
------ Building 5480 action(s) started ------
Horde server: 5.6.0-0, agent: 5.6.0-0
Created tool bundle with locator 492c1ca2628d4f2983f90252e54b4a52_1#pkt=0,1536&exp=1
Horde cluster resolved as 'default'
Unable to get worker: EpicGames.Horde.Compute.ComputeClientException: User does not have AddComputeTasks permission for cluster default (HTTP status Forbidden)
at EpicGames.Horde.Compute.Clients.ServerComputeClient.ConnectAsync(Nullable1 clusterId, Requirements requirements, String requestId, ConnectionMetadataRequest connection, ILogger workerLogger, CancellationToken cancellationToken)+MoveNext() in D:\HAS\AG-Inc\Sync\Engine\Source\Programs\Shared\EpicGames.Horde\Compute\Clients\ServerComputeClient.cs:line 321
at EpicGames.Horde.Compute.Clients.ServerComputeClient.ConnectAsync(Nullable1 clusterId, Requirements requirements, String requestId, ConnectionMetadataRequest connection, ILogger workerLogger, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
at EpicGames.Horde.Compute.Clients.ServerComputeClient.TryAssignWorkerAsync(Nullable1 clusterId, Requirements requirements, String requestId, ConnectionMetadataRequest connection, ILogger logger, CancellationToken cancellationToken) in D:\HAS\AG-Inc\Sync\Engine\Source\Programs\Shared\EpicGames.Horde\Compute\Clients\ServerComputeClient.cs:line 249
at UnrealBuildTool.UBAHordeSession.AddWorkerAsync(Requirements requirements, UnrealBuildAcceleratorHordeConfig hordeConfig, CancellationToken cancellationToken, Int32 activeCores) in D:\HAS\AG-Inc\Sync\Engine\Source\Programs\UnrealBuildTool\Executors\UnrealBuildAccelerator\UBAAgentCoordinatorHorde.cs:line 283
at UnrealBuildTool.UBAHordeSession.AddWorkerAsync(Requirements requirements, UnrealBuildAcceleratorHordeConfig hordeConfig, CancellationToken cancellationToken, Int32 activeCores) in D:\HAS\AG-Inc\Sync\Engine\Source\Programs\UnrealBuildTool\Executors\UnrealBuildAccelerator\UBAAgentCoordinatorHorde.cs:line 376`
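The two logs above show different failure signatures: repeated `failed (null)` retries in one, and an explicit `Forbidden` permission error in the other. When triaging longer UBT logs, it can help to separate the two mechanically. The following is a rough sketch, assuming the log wording matches the excerpts in this thread; the function name and the `request_failed` / `forbidden` labels are made up for illustration and are not part of UBT or Horde.

```python
import re

# Patterns are derived from the log excerpts in this thread (an assumption:
# other UBT/Horde versions may word these messages differently).
RETRY_RE = re.compile(r"\b(GET|POST) (\S+) failed \(+null\)+\. Delaying for \d+ms")
FORBIDDEN_RE = re.compile(
    r"User does not have (\w+) permission for cluster (\S+) \(HTTP status Forbidden\)"
)


def classify_uba_log(text):
    """Return (kind, detail) tuples for Horde-related failures found in a UBT log."""
    findings = []
    for line in text.splitlines():
        retry = RETRY_RE.search(line)
        if retry:
            # The request never completed; the log gives no HTTP status.
            findings.append(("request_failed", f"{retry.group(1)} {retry.group(2)}"))
            continue
        forbidden = FORBIDDEN_RE.search(line)
        if forbidden:
            # The server answered, but the token lacks the named permission.
            findings.append(
                ("forbidden", f"{forbidden.group(1)} on cluster {forbidden.group(2)}")
            )
    return findings
```

A `request_failed` entry only says that the HTTP call did not complete, so it does not by itself distinguish an offline server from some other transport problem. A `forbidden` entry means the server was reached and rejected the token.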

Thank you.

Kind Regards,

Abhishek Sagi

For the first issue, I am also seeing odd behavior when compiling code locally on my machines, which seems to hint at what could be happening when agents run jobs:

Behavior observed: whenever I trigger a compile locally, it stalls at this step: Horde cluster resolved as ‘default’

There seems to be a preference to use Horde agents for compilation instead of the local machine. As a result, it waits endlessly for agents to free up so it can schedule the compute job.

I do not have the ForceCompileRemoteOnly flag set to true, so it seems odd that this is happening.

Hey there Abhishek,

I just want to confirm that your first issue is still a problem - wasn’t sure if the FYI comment was that it had since been fixed. It does look like an auth issue, but it’s unclear and does seem a tad odd - I’d expect local actions to also be kicking off but I’d need to double check the code to see what’s going on. We haven’t heard of/seen any such hangs like that.

Regarding your second issue, there are NO extra steps that should be required here to get a build farm agent authenticated and working properly. Fundamentally, this works because the Horde Server will mint a new token and inject it into the agent to be used. The following UDNs cover this in great detail (I’m synthesizing some new documentation around debugging these types of auth issues).

  • [Content removed]
  • [Content removed]
  • [Content removed]
    • This one is pretty relevant

The key details being:

Potential follow-up is: “who creates it (the token)” - the server, with the code here. This should be trying to use the ACLService to mint a new token.

Do you happen to have your auth configuration handy (is it OIDC, or Horde auth)? Also, if you temporarily disable auth, do things work as expected? Again, regarding the UE_HORDE_TOKEN injection path above, it should be a newly minted token for the user and it should be passed through to the agent.

Julian

Hi Julian,

Sorry for the confusion caused by mixing multiple issues. Let's focus on getting compilation working with UBA locally when Horde is unavailable. Both issues remain, though (auth and compute).

I have a pretty easy repro. If the Horde Server is offline and I trigger a compile, all I see is:

  1. UbaServer - Listening on 0.0.0.0:1345
  2. ------ Building 5480 action(s) started ------
  3. Horde server: 5.6.0-0, agent: 5.6.0-0
  4. Created tool bundle with locator 492c1ca2628d4f2983f90252e54b4a52_1#pkt=0,1536&exp=1
  5. Horde cluster resolved as ‘default’
  6. GET https://mywebsite.ddns.net:13343/api/v1/server/auth failed ((null)). Delaying for 1000ms (attempt #1).

Previously, the local machine would start compiling units while looking for remote agents or establishing authentication with Horde.

Now it seems that the auth request or the request for compute completely gates the build, preventing local compilation from running.

This seems like quite a rigid approach, as connectivity and agent availability cannot be guaranteed 100% of the time for a myriad of reasons. A fallback that does not require manually editing the config to disable UbaController would be preferable.
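To make the requested behavior concrete, here is a minimal sketch of a non-gating fallback: local work starts immediately while remote acquisition is retried in the background, and a remote failure never blocks the local queue. This is only an illustration of the pattern, not how UBT or UBA is implemented; `connect_remote`, `build`, and their parameters are hypothetical names.

```python
import asyncio


async def connect_remote(attempts, delay_s):
    """Hypothetical stand-in for requesting Horde compute; here it always fails."""
    for _ in range(attempts):
        await asyncio.sleep(delay_s)  # back off after each failed request
    return None  # no remote worker was obtained


async def build(actions, attempts=3, delay_s=0.01):
    """Process every action locally while remote acquisition retries in the background."""
    remote_task = asyncio.create_task(connect_remote(attempts, delay_s))
    done = []
    for action in actions:
        await asyncio.sleep(0)  # yield so the background retry loop can progress
        done.append(action)     # stand-in for compiling the action locally
    # Local work is finished; stop looking for helpers and swallow the cancellation.
    remote_task.cancel()
    await asyncio.gather(remote_task, return_exceptions=True)
    return done
```

For example, `asyncio.run(build(["A.cpp", "B.cpp"]))` completes both actions even though the simulated remote connection never succeeds.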

Thank you.

Abhishek Sagi

Hey Abhishek,

Yeah makes sense! I’ll dig locally to get this working.

Julian

Hey Abhishek,

Just a quick note here. I cannot reproduce this on ue5-main, or 5.6-dev.

For posterity:

I’ve been testing with a local horde server (that I’ve turned offline && diverged to throw intermittent errors for clusters), and the local actions still indeed go through. To confirm your repro:

  • BuildConfiguration.xml is updated to point to a horde server for UBA
  • Halt the Horde server service
    • I’ve even tried diverging a local horde server to throw an exception on the compute/{ClusterID} endpoint to emulate a server issue
  • LOCAL machine initiates a UBT invocation (dotnet UnrealBuildTool.dll UnrealEditor Win64 Development)
  • Note hang
    • Can you provide your callstack of UBT if you invoke this from visual studio or the like? ParallelStacks could also be useful here to help track down where the hang is coming from.
    • What’s going on here is that we are passing all of the executing actions into the C++ side via

I’ll be syncing 5.6 next. Just reviewing the code, the AutomaticRunner should be kicked off once the queue starts (with the remote actions being run separately) within the UBAExecutor.cs.

Can you provide your buildconfiguration.xml for Horde? Just want to make sure I’m not assuming anything here.

Let me know if I’ve misunderstood anything.

Julian

Hey there Abhishek,

Yes that’s correct. We fixed a long-existing bug where project scope build configurations were never picked up before. It’s good to hear that you addressed this.

I’ll investigate the -UBA path on my 5.6 build because they should still be executing local actions while attempting to source helper agents.

Julian

Hi Julian,

I did some troubleshooting, and it turns out I had a copy of BuildConfiguration.xml within /Project/Saved/UnrealBuildTool/.. that previously had no effect, because UBT used to pick up BuildConfiguration.xml only from /Engine/Saved/UnrealBuildTool/..

Since the new recommended way is to specify these settings within config, I deleted both BuildConfiguration.xml files, and it now executes locally as expected even when the remote is not available for auth/compute.
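For anyone checking their own setup for a stray file like this, a small script can list every BuildConfiguration.xml under the Saved/UnrealBuildTool folders of the project and engine. This is a rough sketch: the function name is made up, the roots are whatever paths you pass in, and it only covers the two locations mentioned in this thread (UBT may read other locations that are not searched here).

```python
from pathlib import Path


def find_build_configs(roots):
    """Return every BuildConfiguration.xml found under <root>/Saved/UnrealBuildTool."""
    found = []
    for root in roots:
        base = Path(root) / "Saved" / "UnrealBuildTool"
        if base.is_dir():
            # Search recursively, since the exact subfolder is not spelled out above.
            found.extend(sorted(base.rglob("BuildConfiguration.xml")))
    return found
```

Calling it with the project root and the Engine folder (for example `find_build_configs(["D:/MyProject", "D:/UE/Engine"])`, both placeholder paths) shows which files could be contributing settings.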

This, however, seems to be an issue for BuildGraph nodes that use -UBA. I have been working around it by disabling UBA inside BuildGraph.

Thank you.

Kind Regards,

Abhishek