We are using AWS autoscaling to add extra Horde Agents to assist in remote compilation.
We are using the AwsAsg FleetManagerType to scale a dedicated AWS Pool in and out, based on the LeaseUtilization strategy.
Configuration snippet:
"pools": [
    {
        "name": "Win-UE5-AWS",
        "condition": "EC2 == 1",
        "color": "Blue",
        "workspaces": [],
        "enableAutoscaling": true,
        "sizeStrategies": [
            {
                "type": "LeaseUtilization",
                "condition": "true",
                "config": "{\"minAgents\": 0, \"numReserveAgents\": 0 }",
                "extraAgentCount": 0
            }
        ],
        "fleetManagers": [{ "type": "AwsAsg", "config": "{\"name\":\"horde-agent-asg\"}" }]
    }
]
This all works fine, except if we allow the SizeStrategy configuration to scale the ASG to 0 instances when there’s no work.
In this situation, the ASG is scaled to 0, but is never scaled out again.
The problem comes from the TickLeaderAsync function in FleetService.cs, which checks on an interval whether any scaling is needed. But instead of iterating over all Pools, it only checks Pools that already have agents, using GetPoolsWithAgentsAsync.
As long as our Pool has 1 or more agents, things work fine, and the FleetService will scale the ASG up and down as needed. But when we allow it to scale down to 0, the next scaling check skips the Pool because it has no agents. As a result, the Pool is never scaled again unless we manually add an agent to it.
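To illustrate what I think is happening, here is a minimal Python sketch of the behavior. This is not the actual Horde code: the Pool class, the pending_leases field and the one-agent-per-lease rule are all hypothetical stand-ins, and only the "skip pools without agents" filter is modeled on what I see in FleetService.cs.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    agent_count: int
    pending_leases: int  # hypothetical stand-in for demand on the pool


def pools_with_agents(pools):
    """Mimics the described effect of GetPoolsWithAgentsAsync: empty pools are dropped."""
    return [p for p in pools if p.agent_count > 0]


def tick(pools):
    """One simplified scaling pass; only pools returned by the filter are evaluated."""
    evaluated = []
    for pool in pools_with_agents(pools):
        # Hypothetical strategy: want exactly one agent per pending lease.
        pool.agent_count = pool.pending_leases
        evaluated.append(pool.name)
    return evaluated


pool = Pool("Win-UE5-AWS", agent_count=2, pending_leases=0)
first = tick([pool])     # pool is evaluated and scaled in to 0 agents
pool.pending_leases = 5  # new work arrives
second = tick([pool])    # pool is skipped because it has no agents
```

After the first pass the pool has 0 agents, so the second pass never looks at it, even though work is waiting.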
Is this intended functionality? It seems to be unchanged code for ~3 years.
I would really like to be able to scale down to 0 EC2 instances when there is no activity, but the code seems to explicitly not support this.
I understand that it might cause unnecessary work to evaluate all pools on every tick, but then I would prefer another property on the Pool that indicates if it should be considered for scaling - whether it has active agents or not.
Arguably, the enableAutoscaling property on the Pool could already be used for this purpose.
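As a rough Python sketch of the selection I have in mind (again hypothetical names, not the Horde API): pick pools by their autoscaling flag instead of by their current agent count.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    agent_count: int
    enable_autoscaling: bool  # stands in for the enableAutoscaling pool property


def pools_to_evaluate(pools):
    """Select pools for a scaling pass by flag, regardless of how many agents they have."""
    return [p for p in pools if p.enable_autoscaling]


pools = [
    Pool("Win-UE5-AWS", agent_count=0, enable_autoscaling=True),
    Pool("Win-UE5-OnPrem", agent_count=4, enable_autoscaling=False),  # hypothetical second pool
]
selected = [p.name for p in pools_to_evaluate(pools)]
```

With this filter the empty AWS pool is still considered on every tick, while pools that never autoscale add no extra work.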
Any workarounds that don’t involve building our own modified version of the Horde Server, or paying for a Horde Agent EC2 instance 24/7, would be much appreciated.