Horde Agents deployed using an AWS ASG always appear as "Offline - Unexpected" when scaled by the Fleet Service

Hi Team!

We are using Horde 5.5 with our Agents deployed in Autoscaling groups, using the AwsAsgFleetManager class.

On the AWS side, the ASG is configured with a Lambda that calls the Horde API endpoint api/v1/aws/asg-termination-policy to determine which instances are eligible for shutdown.

The issue is that, on the Agents status page in the main Horde UI, all of the agents report “Offline - Unexpected” as their status after being scaled in by Horde.

We expected the agents to have a status of “Offline - Autoscaler”.

Is there some configuration that we are missing that’s required to set this up?

Hey there,

Thanks for your question. This is a system that I’m not entirely familiar with. Would it be possible to get a more complete log from the server (and, ideally, from one of the autoscaled agents)? From reviewing the TryRequestShutdownAsync code, the logs should give us a little more context as to what’s going on (either retry exhaustion or the agent no longer existing), as this is the call site that seems to set the Offline - Autoscaler state.

On further reading of TryCreateSessionAsync, it looks like Unexpected is the default state whenever a new session is created, so I wonder if these agents are not being shrunk properly.

Kind regards,

Julian

Thank you for the additional details above. I’ve spoken briefly with the SME in this area about this, and this was his initial take:

“… IIRC you need to configure the Horde server to get events from the ASG through SQS. This calls InitiateTerminationAsync in AwsAsgFleetManager to gracefully request the agents in question to shut down. I’m not sure if we did something with the ASG to prevent it from eagerly trying to terminate, and simply wait for them to terminate themselves. If you just let the ASG kill your instances, you most likely get the Offline - Unexpected.”
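In case it helps while we wait, here is a rough sketch of how you could check whether your ASG is set up to send termination events to SQS at all. It assumes the events reach SQS via an Auto Scaling lifecycle hook on the terminating transition, which I have not confirmed is what Horde expects. The helper names are hypothetical and boto3 is only needed for the live lookup.

```python
# Sketch only: assumes ASG termination events are delivered to SQS through
# a lifecycle hook. Whether Horde relies on this exact setup is unconfirmed.

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def termination_hooks_with_sqs(hooks):
    """Return names of termination hooks whose notification target is SQS.

    `hooks` is the "LifecycleHooks" list returned by the Auto Scaling
    DescribeLifecycleHooks API.
    """
    names = []
    for hook in hooks:
        is_terminating = hook.get("LifecycleTransition") == TERMINATING
        # ARN layout: arn:partition:service:region:account:resource
        arn_parts = hook.get("NotificationTargetARN", "").split(":")
        targets_sqs = len(arn_parts) > 2 and arn_parts[2] == "sqs"
        if is_terminating and targets_sqs:
            names.append(hook["LifecycleHookName"])
    return names


def check_group(asg_name):
    """Look up the hooks for one ASG (hypothetical helper; needs boto3)."""
    import boto3  # third-party, imported lazily so the filter works without it

    client = boto3.client("autoscaling")
    response = client.describe_lifecycle_hooks(AutoScalingGroupName=asg_name)
    return termination_hooks_with_sqs(response["LifecycleHooks"])
```

Calling check_group with your ASG name and getting back an empty list would suggest that nothing is forwarding termination events to a queue, which would fit the behaviour you are seeing. A hook that notifies through EventBridge rather than a direct SQS target would not show up in this check.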

The SME will be out of office until next week, so I will put this in pending and assign over to them in the interim.

Kind regards,

Julian

Hey Julian,

Here are the logs you requested.

And the logs from an agent that was in that state before being woken up.