Horde Analytics: total job duration

Hi!

We are working on an implementation of Studio Telemetry and have created some custom metric graphs within the Horde Analytics dashboards. For Horde job telemetry, we can currently track each step’s duration within a job, which has been very useful. However, I was looking for a way to track the full job duration in Horde and couldn’t find anything; I think it would be very useful for a higher-level overview of job health. I have thought of summing all the step durations, but some steps may run in parallel, and delays caused by resource contention and the like would not be accounted for, so the sum won’t give a very accurate total job duration.
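To illustrate the gap I mean, here is a minimal sketch with made-up step timings. The `(name, start, finish)` tuple layout, the step names, and both helper functions are hypothetical and not a Horde data format or API:

```python
from datetime import datetime, timedelta

# Hypothetical step records: (name, start, finish). Not a Horde format.
t0 = datetime(2024, 1, 1, 12, 0, 0)
steps = [
    ("Compile", t0, t0 + timedelta(minutes=10)),
    ("Cook", t0 + timedelta(minutes=10), t0 + timedelta(minutes=30)),
    # Runs in parallel with Cook, and only starts after a 2 minute wait.
    ("Test", t0 + timedelta(minutes=12), t0 + timedelta(minutes=25)),
]


def summed_step_time(steps):
    """Add up every step's own duration, ignoring overlap and waits."""
    return sum((finish - start for _, start, finish in steps), timedelta())


def wall_time(steps):
    """Span from the earliest step start to the latest step finish."""
    earliest = min(start for _, start, _ in steps)
    latest = max(finish for _, _, finish in steps)
    return latest - earliest


print("summed:", summed_step_time(steps))
print("wall:  ", wall_time(steps))
```

With these sample values the two numbers clearly diverge, and the wall time computed this way still misses any time the job spent queued before its first step started.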

Is there a way to track this total job duration / are there plans on implementing something like this in the future for Horde telemetry?

Hey there,

There is no built-in way to do this at the moment. Being able to see the wall time of a Horde job has come up several times, and as you’ve indicated, there are challenges in determining what that would be. This is certainly something I could look at, but I can’t speak to when we would be able to offer a timeline on it.

One thing I’d probably poke at is: what’s the goal here? While it is difficult to account for end-to-end time (mainly due to resource contention), I think there is still valuable data to be learned from summed step time, since it represents the agent compute time used for a job. Likewise, an end-to-end time inclusive of resource contention waits is still insightful, as it shows the “user perceptible impact”. With these two metrics, I think you can generally answer a couple of key questions like:

  • Is a job agent-expensive from a summed step perspective?
  • Is a job taking a long time end to end when looking at a P95 metric?
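
As a rough illustration of the second question, here is a minimal sketch of a nearest-rank percentile over end-to-end job durations. It assumes you already have one duration per job run from somewhere; the sample numbers and the `percentile_nearest_rank` helper are made up for this example and are not something Horde provides:

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of values."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank method: the smallest value with at least pct% of the
    # samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical end-to-end durations in minutes for ten runs of one job.
durations_min = [28, 30, 31, 29, 33, 30, 32, 95, 31, 30]

print("P50:", percentile_nearest_rank(durations_min, 50))
print("P95:", percentile_nearest_rank(durations_min, 95))
```

In this made-up sample the median looks healthy while the P95 is dominated by the single slow run, which is the kind of "user perceptible impact" an average would hide.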

If these types of metrics sound reasonable, I’m happy to add them to my backlog or review a PR.

Kind regards,

Julian

Hey there,

Just circling back to this after some unrelated research on my end regarding our Debug APIs:

  • /api/v1/debug/job-timings

This could be helpful should we choose to push such data up as a relatively unified approach.

Kind regards,

Julian

Hi!

Yes, these types of metrics sound reasonable. We are interested in having them at some point down the line, so it would be great if you could add them to the backlog! We might look into it ourselves if we find we need them sooner, but I’m not sure when we would get to it.

Thanks for the info and support!

Happy to help :) For posterity, I’ve added both of these metrics to my backlog.

Kind regards,

Julian