Horde ASG/Fleetmanager

Hi, would it be possible to get some clarity on best practices, or general help, for autoscaling in AWS?

https://github.com/EpicGames/UnrealEngine/blob/5.5/Engine/Source/Programs/Horde/Docs/Config/Schema/Globals.md#poolconfig

I don’t fully understand the fleet manager options. It seems like you were handling scaling in code, but the option also exists to use an ASG in AWS natively. My assumption is that Horde will expose the required metrics to CloudWatch so the ASG can scale. My use case at present is just UBA, and my ideal is to use Spot Instances for this pool, or most of it.

I started on what I think the config might look like, and there are two main questions:

1) What is "config" for the external ASG fleet manager?

2) Are these bits relevant for an external ASG?

AwsFleetManager.cs:

```csharp
public const string PoolTagName = "Horde_Autoscale_Pool";

private const string AwsTagPropertyName = "aws-tag";
```

Thanks

```json
{
    "compute": [
        {
            "id": "default"
            // "condition": "pool == 'Linux-UE5' && osfamily == 'Linux'"
        }
    ],
    "pools": [
        {
            "name": "Linux-UE5",
            "condition": "Platform == 'Linux'",
            "color": "Orange",
            "enableAutoscaling": true, // turn on scaling
            "computeQueueAwsMetricSettings": {
                "computeClusterId": "default", // compute cluster ID from above
                "namespace": "Horde/QueueManagement" // seems sensible?
            },
            "fleetManagers": [
                {
                    "type": "AwsAsg", // I want to use an existing ASG
                    "condition": "true", // enable it
                    "config": "" // not sure what goes here
                }
            ],
            "sizeStrategy": "LeaseUtilizationAwsMetric" // assume we use sizeStrategy and not sizeStrategies
        }
    ]
}
```

Hi,

For our internal use case, we do use AWS itself for the autoscaling of our UBA helpers.

Your example config is close to what we have, I’ll provide some snippets and some further context on how we do it.

Our server compute entries look something like this (ignoring the ACL configuration). Some of this is 5.6-centric, with the configuration changes that allow ini entries to define providers, but most of it should be usable with 5.5:

```json
{
    "id": "buildfarmm",
    "namespaceid": "horde.compute",
    "condition": "(pool == 'UBAMac' && osfamily == 'MacOS') || pool == 'UBALinux'"
}
```

We divide up our helper pools between the farm and end users, so we have a lot of compute entries with unique IDs. We also use the compute cluster info to map CIDR blocks to compute clusters. That’s the reasoning behind having a lot of compute entries.

Our pool definitions are pretty bare bones:

```json
{
    "id": "ubalinux",
    "name": "UBALinux",
    "colorValue": "#58c85c",
    "enableAutoscaling": true,
    "sizeStrategies": [
        {
            "type": "LeaseUtilizationAwsMetric",
            "config": {},
            "extraAgentCount": 0
        }
    ],
    "fleetManagers": [],
    "sizeStrategy": "LeaseUtilization",
    "workspaces": [],
    "properties": { "Color": "148" }
}
```

LeaseUtilizationAwsMetric is key to pushing data to AWS CloudWatch, as you mention. From there, our ASG looks at the CloudWatch data to scale up/down based on lease utilization over sliding windows. We are aggressive for scale-up (a 1-minute window) and less aggressive for scale-down (a 5-minute window).
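As a rough illustration of those alarm windows on the ASG side, the Terraform might look something like the sketch below. The metric name, dimensions, thresholds, and resource names here are all assumptions, not our actual config; verify what LeaseUtilizationAwsMetric publishes in your account first (e.g. with `aws cloudwatch list-metrics --namespace Horde`).

```hcl
# Sketch only: metric_name/dimensions/thresholds are assumptions, and
# aws_autoscaling_group.uba is assumed to be defined elsewhere.

resource "aws_autoscaling_policy" "uba_scale_up" {
  name                   = "uba-scale-up"
  autoscaling_group_name = aws_autoscaling_group.uba.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 2
  cooldown               = 60
}

resource "aws_autoscaling_policy" "uba_scale_down" {
  name                   = "uba-scale-down"
  autoscaling_group_name = aws_autoscaling_group.uba.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "uba_high_utilization" {
  alarm_name          = "uba-lease-utilization-high"
  namespace           = "Horde"             # default namespace, per this thread
  metric_name         = "LeaseUtilization"  # assumed name
  dimensions          = { Pool = "UBALinux" } # assumed dimension
  statistic           = "Average"
  period              = 60                  # aggressive 1-minute window for scale-up
  evaluation_periods  = 1
  threshold           = 80
  comparison_operator = "GreaterThanThreshold"
  alarm_actions       = [aws_autoscaling_policy.uba_scale_up.arn]
}

resource "aws_cloudwatch_metric_alarm" "uba_low_utilization" {
  alarm_name          = "uba-lease-utilization-low"
  namespace           = "Horde"
  metric_name         = "LeaseUtilization"  # assumed name
  dimensions          = { Pool = "UBALinux" } # assumed dimension
  statistic           = "Average"
  period              = 300                 # gentler 5-minute window for scale-down
  evaluation_periods  = 1
  threshold           = 30
  comparison_operator = "LessThanThreshold"
  alarm_actions       = [aws_autoscaling_policy.uba_scale_down.arn]
}
```

The asymmetric periods (60s up, 300s down) are what give the "aggressive up, gentle down" behavior described above.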

Some of the next bits are more about how we have our internal automation in place for ASG / agent deployment to AWS. We leverage Terraform + Ansible for our automation and configuration.

Our internal AMI/automation for the deployment expands out a templatized userdata.sh script that sets up the systemd service and installs HordeAgent, which eventually sets the env var Horde__Properties__RequestedPools=<POOLSTOJOIN>. So on boot it joins the correct pool.

So we don’t use the Horde_Autoscale_Pool tag, since we don’t let Horde scale up/down. We also enable the lifecycle management flag (I believe that’s what it’s called) on the ASG to be alerted to termination; HordeAgent and UBA will pay attention to that info and stop helping when the termination warning signal is set.
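For reference, that termination notice corresponds to an ASG lifecycle hook on the terminating transition. A minimal Terraform sketch (names and the timeout are placeholders, and the ASG resource is assumed to exist elsewhere):

```hcl
# Sketch: gives instances a grace period on scale-in so HordeAgent/UBA can
# observe the termination warning and stop taking new work before shutdown.
resource "aws_autoscaling_lifecycle_hook" "uba_drain" {
  name                   = "uba-drain-on-terminate"
  autoscaling_group_name = aws_autoscaling_group.uba.name  # assumed to exist
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  heartbeat_timeout      = 300        # seconds allowed for draining
  default_result         = "CONTINUE" # proceed with termination after timeout
}
```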

Hopefully that provides some helpful information!

-Ryan

Hi Ryan, thanks for the info, it’s very useful. I’ve updated my config and think it should be set up to use the AWS ASG natively, but I’ve currently got two problems:

1) Agents deployed by the ASG aren’t being auto-enrolled. I think the only requirement is enableNewAgentsByDefault in server.json or the equivalent env var. Maybe there is another approach to joining the new ASG agents, e.g. a pre-populated enrolment token?
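In server.json terms, I assume that would just be (inferring the key path from the env var’s `__` convention):

```json
{
    "Horde": {
        "enableNewAgentsByDefault": true
    }
}
```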

https://server/api/v1/debug/environment

Horde__enableNewAgentsByDefault=true

2) I don’t see any metrics in CloudWatch. Config as below:

https://server/api/v1/pools/linux-UE5

```json
{
    "id": "linux-ue5",
    "name": "Linux-UE5",
    "condition": "Platform == 'Linux'",
    "colorValue": "#ff5a00",
    "enableAutoscaling": true,
    "sizeStrategies": [
        {
            "type": "LeaseUtilizationAwsMetric",
            "config": "",
            "extraAgentCount": 0
        }
    ],
    "fleetManagers": [],
    "sizeStrategy": "LeaseUtilization",
    "computeQueueAwsMetricSettings": {
        "computeClusterId": "default",
        "namespace": "Horde"
    },
    "workspaces": [],
    "autoSdkConfig": { "view": [] },
    "properties": {}
}
```

Horde itself is setting the namespace to "Horde", but I can’t find it in CloudWatch. I see nothing in the Horde logs to indicate it’s failing to do something, and I’ve yet to get CloudTrail to log CloudWatch events to rule out permissions that way. I think I could do with some pointers on where to troubleshoot both. Thanks

Hey Chris,

1) We start UBA-based agents with the following:

```ini
Environment="Horde__Name=$INSTANCE_ID"
Environment="Horde__PerforceExecutor__RunConform=false"
Environment="Horde__ShareMountingEnabled=false"
Environment="Horde__WorkingDir=$WORKING_DIR"
Environment="Horde__Properties__RequestedPools=$POOLS_TO_JOIN"
Environment="Horde__Ephemeral=true"
Environment="Horde__EnableAwsEc2Support=true"
ExecStart=$HORDE_AGENT_DIR/HordeAgent service run -Server=<PROFILE>
```

<PROFILE> is a server profile existing in the appsettings.json file the Horde agent loads.

appsettings.json example with token:

```json
{
    "Horde": {
        "EnableAwsEc2Support": true,
        "ServerProfiles": [
            {
                "Name": "MyEnv",
                "Environment": "myenv",
                "Url": "https://horde-server.mystudio.com",
                "Token": "JWT TOKEN"
            }
        ]
    }
}
```

2)

computeQueueAwsMetricSettings set on the pool itself is not in use any more, but it has unfortunately not been removed. I just addressed that, thanks for pointing that out.

Instead, it exists as a size strategy that feeds AWS the details needed, similar to what Ryan posted.

For lease utilization, there’s usually no extra configuration needed. It uses the “Horde” namespace inside CloudWatch by default.

```json
...
"sizeStrategies": [
    {
        "type": "LeaseUtilizationAwsMetric",
        "config": {},
        "extraAgentCount": 0
    }
],

"fleetManagers": [{ "type": "NoOp", "config": {} }],
...
```

But do remember to enable AWS for the Horde server if you haven’t already.

If using env vars, use:

Horde__Plugins__Compute__WithAws = true

For JSON-based configuration, use:

```json
{ "Horde": { "Plugins": { "Compute": { "WithAws": true } } } }
```

See if you can use that to help connect UBA and the lease utilization metric. If not, let me know.

Hi Karl, thanks for the info.

I hadn’t set

Horde__Plugins__Compute__WithAws = true

Got caught by

Engine\Source\Programs\Horde\Plugins\Compute\HordeServer.Compute\Aws\AwsCloudWatchMultiplexer.cs

“At least one client must be specified”

Then set

Horde__Plugins__Compute__AwsRegions__0

All good now
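For anyone else hitting this, the two settings together as env vars (the region value here is just an example; substitute your own):

```shell
# Enable the AWS side of the compute plugin...
Horde__Plugins__Compute__WithAws=true
# ...and give it at least one region (eu-west-1 is an example value).
Horde__Plugins__Compute__AwsRegions__0=eu-west-1
```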

I rather naively just assumed that Horde would know it was running on AWS through the availability of default credentials. I’ve got metrics now, so I should be able to get scaling working properly.

Thanks for your help

Great! We require manual activation as AWS-related code is activated early during startup if enabled.