Horde job timeout

Hello,

Is there something like a timeout for Horde jobs? Sometimes, when there are issues, jobs keep running on agents indefinitely and we may only notice days later. I understand that the underlying scripts often have their own timeouts (Gauntlet, for example), but it would be nice to also have a safety net in Horde itself.

Thank you!

Hey there!

Thanks for the question; it's a good real-world use case for sure. This has been under internal discussion for a while, as there are many edge cases in how it integrates with some other subsystems. Whilst I can't speak to when or how this would be implemented, if you'd like to make a local divergence in your Horde server, it should be relatively straightforward to do so.

In JobExecutor.cs, you can link a timeout into the cancellation token that is passed to each step:

```csharp
using CancellationTokenSource combined = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, stepCancellationToken);
try
{
    // SOME_LONG_DURATION is a placeholder: pick a limit (in seconds) well above your longest legitimate step.
    combined.CancelAfter(TimeSpan.FromSeconds(SOME_LONG_DURATION));
    JobStepOutcome stepOutcome = await RunAsync(step, stepLogger, combined.Token);
    return (stepOutcome, JobStepState.Completed);
}
// ...the catch/finally handling from the original method follows here...
```

There are, of course, serious limitations to this, in that you'll need to decide what a meaningful duration is.

At least with this, you should be able to cancel a step if it hangs for an extended period.
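For anyone who wants to try the pattern in isolation before touching Horde, here is a minimal standalone sketch of the same linked-token-plus-CancelAfter technique. `StepTimeoutSketch`, `RunWithTimeoutAsync` and `StepResult` are hypothetical names made up for illustration; they are not Horde APIs, and `StepResult` merely stands in for Horde's own step outcome types. It additionally shows one way to tell a timeout apart from an ordinary cancellation of the outer token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type for this sketch only (not part of Horde).
public enum StepResult { Completed, TimedOut, Cancelled }

public static class StepTimeoutSketch
{
    // Runs 'step' with a token that fires when either the caller's token is
    // cancelled or 'timeout' elapses, whichever happens first.
    public static async Task<StepResult> RunWithTimeoutAsync(
        Func<CancellationToken, Task> step,
        TimeSpan timeout,
        CancellationToken cancellationToken)
    {
        using CancellationTokenSource combined =
            CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        combined.CancelAfter(timeout);

        try
        {
            await step(combined.Token);
            return StepResult.Completed;
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The outer token was never cancelled, so the timeout must have fired.
            return StepResult.TimedOut;
        }
        catch (OperationCanceledException)
        {
            // The caller cancelled the job (or agent shutdown) as usual.
            return StepResult.Cancelled;
        }
    }
}
```

Keep in mind that .NET cancellation is cooperative: the token firing only stops a step whose code actually observes it, so whether a real hung step terminates depends on how that step (and any child process it launches) responds to cancellation.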

Kind regards,

Julian

Thanks a lot, Julian, for considering the request and for providing a possible solution. I will look into implementing it.