Horde ComputeQueueAwsMetricStrategy and Scaling UBA Pools

anonymous-edc · May 5, 2025, 2:26pm

Hi!

We are looking for recommendations around the autoscaling of UBA compute pools.

We are currently using the LeaseUtilizationAwsMetric, but that metric requires at least one compute agent to be left running in a pool at all times. With no agents in the pool, utilization is always zero, so the metric never signals that the pool should scale up.
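To make the problem concrete, here is a minimal sketch of a utilization-style metric. It is an illustration only, not Horde's actual LeaseUtilizationAwsMetric code; the function names and the 0.7 threshold are hypothetical.

```python
def lease_utilization(busy_agents: int, total_agents: int) -> float:
    """Fraction of agents currently holding a lease (hypothetical helper).

    An empty pool has nothing to measure, so this sketch reports 0.0.
    """
    if total_agents == 0:
        return 0.0
    return busy_agents / total_agents


def wants_scale_out(utilization: float, threshold: float = 0.7) -> bool:
    """Scale out only when utilization exceeds the (assumed) threshold."""
    return utilization > threshold


# With zero agents the metric stays at 0.0 no matter how much work is waiting,
# so a utilization-based alarm never fires and the pool stays empty.
empty_pool = lease_utilization(busy_agents=0, total_agents=0)
busy_pool = lease_utilization(busy_agents=4, total_agents=5)
```

In this sketch `wants_scale_out(empty_pool)` is false regardless of pending demand, which is why we have to keep one agent alive.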

It looks like the ComputeQueueAwsMetric may have been intended to help with this, however the core business logic throws a NotImplementedException.

Our goal is to have zero agents running in our UBA compute pools until UBA compute is requested.

Do you have any suggestions for other approaches to explore, or do you know whether a complete implementation of ComputeQueueAwsMetric is planned?

Thanks!

cgbystrom · May 6, 2025, 1:34pm

Hi Lucian,

For UBA we primarily do reactive scaling of an auto-scaling group containing our agents, based on CPU utilization, and we never let those pools scale to zero. For normal, job-based agents we do scale to zero, but that uses the JobQueueStrategy, which is proactive. We want to make compute/UBA more proactive, for example by having it look at a compute queue, like ComputeQueueAwsMetric does. Our goal is not to have logic for starting/stopping EC2 instances, so that strategy deliberately does not implement that. Instead, it emits metrics for AWS (or any other cloud provider) to act on.

Short answer: the utilization we get from reactive CPU scaling is okay. It’s possible that optimizing things at a scheduling level within Unreal Build Tool or Unreal Build Accelerator would yield better returns than pure proactive scaling at an EC2 level.
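As a rough illustration of the "emit a metric and let the cloud provider act" idea, here is a minimal sketch of a queue-based metric. It is not the Horde ComputeQueueAwsMetric implementation; the function names, the metric name, the `Pool` dimension, and the slots-per-agent parameter are all assumptions. The dictionary follows the general shape of a CloudWatch metric datum, but nothing is actually published here.

```python
import math


def backlog_agents(pending_requests: int, slots_per_agent: int) -> int:
    """Agents needed to drain the queue, independent of current pool size."""
    if slots_per_agent <= 0:
        raise ValueError("slots_per_agent must be positive")
    return math.ceil(pending_requests / slots_per_agent)


def to_metric_datum(pool_id: str, value: int) -> dict:
    """Build a CloudWatch-style datum (metric and dimension names are made up)."""
    return {
        "MetricName": "ComputeQueueBacklogAgents",
        "Dimensions": [{"Name": "Pool", "Value": pool_id}],
        "Value": float(value),
        "Unit": "Count",
    }


# Nine queued requests at four slots per agent means three agents are wanted,
# even if the pool currently has zero agents running.
datum = to_metric_datum("uba-pool", backlog_agents(9, 4))
```

Because the value depends only on queued work, it is non-zero for an empty pool as soon as a request arrives, so a scaling policy can react to it without a standby agent.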

anonymous-edc · May 6, 2025, 1:41pm

Hi Carl,

We are already using AWS metrics to handle autoscaling. We’d love to use ComputeQueueAwsMetric, but as I mentioned it throws a NotImplementedException, so it never emits any metrics.

Are there any plans to offer a fully-implemented ComputeQueueAwsMetric?

Lucian

cgbystrom · June 11, 2025, 3:41pm

No, not at the moment. But if you are interested in reviving the implementation, I’d be happy to review a pull request.