We are currently experimenting with UnrealBuildAccelerator (UBA) after successfully deploying a Horde server and installing multiple Horde agents.
Distributing compilation with UBA works as expected. However, we noticed in the Horde web dashboard that the local host, which initiated the compilation, still shows Status = Ready instead of “Compute task” like the others. It is the one in the 4th row of the attached screenshot.
This causes a problem: the host is already busy with a lot of queued local compilations (using almost 100% of its CPU), but because Horde reports it as “Ready”, another host’s UBA can still distribute compute tasks to it. What can we do to prevent this?
Horde has two mechanisms to prevent assigning extra work to the initiator machine itself:
A) When a compute resource (agent) is requested, Horde will not assign work to an agent matching the requesting user’s IP.
B) An agent can mark itself as busy when heavy workloads run locally on the machine (outside of work assigned by Horde/UBA).
B) is more experimental, as we haven’t rolled out workstation-based agents internally at Epic. The Horde agent tray app helps flag this, but it is also being reworked.
A), however, should help you.
Do you run your Horde server behind a reverse proxy like Nginx or similar?
Does the initiating machine (running UBT and a Horde agent) have multiple IPs?
Check which IP is reported as “ComputeIp” in the agent’s properties in the Horde web UI.
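Mechanism A depends on the server comparing the requester’s IP with each agent’s ComputeIp, which is why the questions about a reverse proxy and multiple IPs matter. The following is a minimal Python sketch of that idea, assuming a plain equality check; `Agent`, `effective_client_ip` and `eligible_agents` are hypothetical names and do not mirror Horde’s code. It also assumes that a proxy may forward the original client address in an `X-Forwarded-For` header; whether and how Horde uses such a header is not confirmed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    compute_ip: str      # the address shown as "ComputeIp" in the Horde web UI
    busy: bool = False


def effective_client_ip(peer_ip: str,
                        forwarded_for: Optional[str] = None,
                        trust_proxy: bool = False) -> str:
    """Return the IP the server would attribute to the requester (hypothetical).

    Behind a load balancer, peer_ip is the balancer's address. The original
    client IP is only recoverable if the proxy forwards it and the server is
    configured to trust that header.
    """
    if trust_proxy and forwarded_for:
        # X-Forwarded-For lists the originating client first.
        return forwarded_for.split(",")[0].strip()
    return peer_ip


def eligible_agents(agents: List[Agent], requester_ip: str) -> List[Agent]:
    """Hypothetical filter: skip busy agents and any agent at the requester's IP."""
    return [a for a in agents if not a.busy and a.compute_ip != requester_ip]


agents = [
    Agent("Agent-A", "192.168.1.10"),
    Agent("Agent-B", "192.168.1.11"),
    Agent("Agent-C", "192.168.1.12"),
]
```

In this model, if Agent-A’s request reaches the server through a balancer at, say, 172.31.5.20 and the forwarded address is not used, `eligible_agents(agents, effective_client_ip("172.31.5.20"))` still includes Agent-A, so the initiator is not excluded.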
This is unfortunately still experimental; we haven’t tested it internally, but it’s definitely on our roadmap.
If you run Horde from source (or possibly just the Horde agent) and are willing to experiment or modify the code yourself, you may be able to get something working.
The Unreal Toolbox app is key here, as it runs as the current desktop user and detects whether the user is idle and which critical processes are running. It can signal the agent to mark itself as busy or not, which prevents any leases, such as UBA tasks, from being assigned to it.
HordeAgentPlugin.cs and HordeAgentSettings.cs are two good starting points.
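The decision the Unreal Toolbox app passes to the agent can be pictured as a small policy function. This is a hypothetical Python sketch: the thresholds, the process names and `should_mark_busy` itself are illustrative assumptions, not the logic in HordeAgentPlugin.cs or HordeAgentSettings.cs.

```python
from typing import FrozenSet, Iterable

# Illustrative defaults only; these are not Horde or Unreal Toolbox settings.
IDLE_CPU_THRESHOLD = 0.85    # user away: only a very heavy local load blocks leases
ACTIVE_CPU_THRESHOLD = 0.50  # user at the desk: be more conservative
CRITICAL_PROCESSES: FrozenSet[str] = frozenset(
    {"mygame.exe", "mygame-win64-shipping.exe"}  # hypothetical game executables
)


def should_mark_busy(cpu_load: float,
                     running_processes: Iterable[str],
                     user_idle: bool,
                     critical: FrozenSet[str] = CRITICAL_PROCESSES) -> bool:
    """Decide whether a workstation agent should refuse new leases.

    cpu_load is local CPU utilisation in [0, 1], assumed to exclude work that
    Horde itself leased to this agent (how to measure that is left open).
    """
    names = {p.lower() for p in running_processes}
    if names & critical:
        # A critical process such as the game is running: protect frame rate.
        return True
    threshold = IDLE_CPU_THRESHOLD if user_idle else ACTIVE_CPU_THRESHOLD
    # Heavy local work, for example a local compile, also blocks new leases.
    return cpu_load >= threshold
```

Checking critical processes first means a running game always marks the machine busy, regardless of how much CPU it happens to use at that moment.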
Thanks for the response. Here is more information on our setup.
AWS Infrastructure
We’re using the `unreal/horde` Terraform module provided by Amazon Cloud Game Development Toolkit to run Horde in AWS, so there is a load balancer sitting in front of the Horde server ECS task.
Besides that, we also have a site-to-site VPN between the office LAN and the AWS VPC. In short, LAN (192.168.x.x) and VPC (172.31.x.x) machines can communicate with each other directly via private IP addresses.
On the Horde website, all workstation agents show their 192.168.x.x IPs as `ComputeIp`.
Here is a simple way to reproduce the issue with three workstation agents; let’s call them Agent-A, Agent-B and Agent-C. All of them have `bForceBuildAllRemote=false` and are in the office LAN (192.168.x.x).
Step 1
Agent-A starts a big game build to trigger distribution.
UBA logs in to Horde and starts sending compute tasks to Agent-B and Agent-C (we can see all of this in the Visual Studio output).
Go to the Horde website and confirm that Agent-B and Agent-C have Status = “Compute Task”, but Agent-A remains “Ready”. (At this point, Agent-A is also doing a lot of heavy compilation itself.)
Step 2
Start another big build on Agent-B to force distribution. (Agent-B is also still working on a remote task at this moment.)
Notice that the build is distributed to Agent-A, because it was “Ready”.
Refresh the list of agents on the Horde website; Agent-A now has Status = “Compute task” too.
Double-check in Agent-A’s Task Manager that multiple `cl.exe` processes have spawned to work on the remote task.
As my original post mentioned, we don’t think Agent-A should take on any remote compute tasks, because it is still in the middle of a big compilation itself. I think the easiest “fix” would be to give Agent-A the “Compute task” status as well while it is the initiator of a compilation, but I have not been able to find the code needed to make this change ourselves.
We would also be very interested in (B), even if it’s experimental. For example, when developers are playing the game, their machines should be marked as busy so that nothing causes the frame rate to drop.
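The repro and the proposed fix can be modelled as a toy scheduler. This Python sketch is not Horde’s scheduler; `Pool`, `distribute` and the "Busy (initiator)" status are hypothetical, and it assumes, as observed above, that an agent already running a remote task is not leased again. It shows that treating an agent with an active outgoing UBA session as unavailable stops Step 2 from landing on Agent-A.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Pool:
    """Toy model of the repro; not Horde's scheduler."""
    agents: List[str]
    exclude_initiators: bool = True
    leased: Set[str] = field(default_factory=set)      # agents running remote tasks
    initiators: Set[str] = field(default_factory=set)  # agents with an outgoing UBA session

    def status(self, agent: str) -> str:
        if agent in self.leased:
            return "Compute task"
        if self.exclude_initiators and agent in self.initiators:
            return "Busy (initiator)"  # hypothetical status for the proposed fix
        return "Ready"

    def distribute(self, initiator: str) -> List[str]:
        """Start a distributed build on `initiator` and lease every Ready agent to it."""
        self.initiators.add(initiator)
        helpers = [a for a in self.agents
                   if a != initiator and self.status(a) == "Ready"]
        self.leased.update(helpers)
        return helpers
```

With `exclude_initiators=False`, Step 1 leases Agent-B and Agent-C, and Step 2 from Agent-B then leases Agent-A, matching the behaviour described above. With the flag enabled, Step 2 finds no Ready helper. The model never ends a session; a real implementation would clear the initiator flag when the build finishes.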