Horde dependency build chains?

anonymous-edc · May 9, 2025, 5:09pm

Greetings!

Does the Horde build system have a mechanism for a dependency build chain? For example:

Say I have job Z. Before job Z can start, it needs jobs A and B to run and complete. Jobs A and B can run in parallel, and once complete, job Z would then start, possibly at a changelist higher than jobs A and B if those jobs submitted things that job Z needs to sync.

So ideally job Z would trigger on a schedule, then it would check which job dependencies it has, trigger those jobs (A and B), and then wait for both of them to complete before starting.

I see that TemplateRefConfig has chainedJobs, but this seems suited only to chaining things serially and wouldn’t cover the scenario I mentioned above. Jobs A and B (running in parallel) would both have job Z in their chainedJobs, so job Z would run as soon as one of them completes rather than both, and would likely even run twice.
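To make the concern concrete, here is a minimal sketch contrasting per-job chaining with a fan-in (join). It is a toy model, not Horde code: the class names `ChainedTrigger` and `JoinTrigger` and their fields are hypothetical and only mimic the behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class ChainedTrigger:
    """Toy model of per-job chaining: each finished job starts its downstream jobs."""
    chained_jobs: dict[str, list[str]]
    started: list[str] = field(default_factory=list)

    def on_job_complete(self, job: str) -> None:
        # Every upstream completion fires independently; nothing waits for siblings.
        self.started.extend(self.chained_jobs.get(job, []))


@dataclass
class JoinTrigger:
    """Toy model of a fan-in: a job starts once, after all its dependencies finish."""
    requires: dict[str, set[str]]
    finished: set[str] = field(default_factory=set)
    started: list[str] = field(default_factory=list)

    def on_job_complete(self, job: str) -> None:
        self.finished.add(job)
        for target, deps in self.requires.items():
            # Start the target only when every dependency is done, and only once.
            if target not in self.started and deps <= self.finished:
                self.started.append(target)
```

With A and B both chaining to Z, the first model starts Z on each completion, while the second starts Z a single time after both A and B have finished.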

BuildGraph kind of has this dependency chain concept with the Requires tag, which ensures another Node runs and completes before that node runs. The issues here seem to be:

- I’m not sure if you can have one Node run at a higher CL than the others.
- Using BuildGraph loses the modularity we have with the Horde templates, as a new BuildGraph would need to be created for every combination of chained dependencies we may want.

The next feature I came across that seemed promising is ScheduleConfig’s ScheduleGateConfig. At first glance it seemed like this would do what I wanted (though it is currently limited to only 1 templateID). It seemed like it would wait for the job specified in the ScheduleGateConfig to complete and, if successful, would then run the scheduled job. However, upon digging deeper, it seems like it doesn’t trigger or wait for the gate job, but rather just searches for the last successful CL of the gate job and then runs the scheduled job on that matching CL?

As of now it seems like my best approach might be to write custom functionality similar to ScheduleGateConfig that will meet the criteria I listed above in my sample scenario, but before going down that path I wanted to double check here to see if there was a feature I might’ve missed that already handles what we need. Thanks for your time!

JulianGamble · May 12, 2025, 3:47pm

Hey there,

As you’ve highlighted, BuildGraph manages this through tags, but only from an intra-changelist perspective.

To handle this from an inter-changelist, cross-job perspective, there are two different mechanisms that you can explore:

1. Writing a custom BuildGraph task to issue a request for a new job (we do this internally)
   - This question has come up in the past, and you’d need to write a BuildGraph task that will poke the jobs API in Horde to kick off the new job.
2. Chained jobs (config details, upstream reference)
   - This one could work for you, as you can specifically indicate to use the latest changelist.

To a certain extent, you will likely need to write some custom tasks for BuildGraph, so I’m just leaving the relevant document link for posterity.

Let me know if you have any further questions, always happy to help/collaborate.

Kind regards,

Julian

JulianGamble · May 12, 2025, 11:05pm

Hey there Joshua,

Apologies, I must have missed the constraint that you don’t want serial execution; my mistake. I’d suggest the following possible change to make this work, though it would break down if you have a lot of combinations to manage:

- Model A & B within the same job template.
  - Make sure they’re in separate <agent> tags; Horde will then run them in parallel.
- Have job C (the dependent job, job Z in your example) be chained with the first.

Now, should this still not be feasible, then yes, manual invocation via the REST API is an option. To your point, the node will “fire and forget”. You could write your task so that it polls the resulting job for completion, but you’d effectively be holding that agent until a result arrives (and depending on the duration of that job, that may not be advisable!).
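As a rough illustration of the polling option, here is a minimal sketch of a wait loop. Everything in it is an assumption: `get_state` stands in for whatever call your task would make to the jobs API, and the state names in `TERMINAL_STATES` are hypothetical, not values Horde is known to return.

```python
import time
from typing import Callable

# Hypothetical state names; a real task would map whatever the server reports.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # The calling agent is idle but still occupied for this whole wait.
        sleep(poll_seconds)
```

The timeout is there because of the cost mentioned above: without it, a stuck downstream job would hold the agent indefinitely.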

Let me know what you’re thinking on this - happy to keep brainstorming.

Kind regards,

Julian

JulianGamble · May 13, 2025, 10:11pm

Hey there,

> So correct me if I’m mistaken, but it looks like ScheduleGateConfig works by checking if the job specified in the Schedule Gate has run successfully. If so, it finds the most recent success, and then the scheduled job that defined the ScheduleGateConfig runs with the matching CL of that gated job?

That is correct - by my reading of the source and looking at our own schedule gate config.

> So what I’m thinking for custom functionality is I’ll have an option…

I’m a bit confused, mostly due to the labelling hah!
When we say Schedule Gate job’s recent success - do we mean the dependency jobs A & B? With job Z being the dependent?
The schedule gate logic is simply a part of the schedule service, and all it’s doing is querying the job listed in the Gate to find its changelist (if it’s successful):
- https://github.com/EpicGames/UnrealEngine/blob/5.5/Engine/Source/Programs/Horde/Plugins/Build/HordeServer.Build/Jobs/Schedules/ScheduleService.cs#L419
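For readers following along, the gate behaviour as described in this thread can be modelled roughly as below. This is a simplified sketch and not the ScheduleService code: `GateRun`, `find_gate_change`, and `min_change` are hypothetical names, and the real implementation may differ in how it queries jobs and targets.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GateRun:
    """One past run of the gate job (hypothetical record shape)."""
    change: int
    succeeded: bool


def find_gate_change(runs: Iterable[GateRun], min_change: int) -> Optional[int]:
    """Return the newest change at or above min_change where the gate succeeded.

    Nothing is triggered and nothing is waited on: only existing runs are inspected.
    Returns None when no qualifying run exists, in which case the schedule is skipped.
    """
    candidates = [r.change for r in runs if r.succeeded and r.change >= min_change]
    return max(candidates, default=None)
```

The key point is that the lookup is passive: if the gate job has not already succeeded at a suitable change, the scheduled job simply does not run.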
Regarding checking for related job completion
- IMO things can get out of hand quickly here if you’re wanting to stitch together results (such that A & B potentially link to the Z job?)
- This is the one aspect of the idea that I’d personally be a bit cautious around, as it adds a whole level of complexity
  - Should the dependencies be aware of the dependent? Should the dependencies go RED if the dependent fails?

Other than the last item, I can see the changes being minor to make this work.

Kind regards,

Julian

JulianGamble · May 14, 2025, 8:39pm

> Otherwise we will trigger it on the change the schedule originally triggered on.

I think this is the only real departure from the existing behaviour, wherein if we can’t find a changelist of at least X, we just abandon the schedule. You may want to consider maintaining that behaviour unless there’s a specific reason to change it.

Otherwise, I think this makes sense to me. Once you’ve got a change going, feel free to share it back here or consider forming a PR, as it could be something the team may want to integrate back as a first-class feature (I can certainly think of at least the N gate requirement being something they’d review).

Let me know if you need a hand with anything.

Kind regards, and good luck!

Julian

anonymous-edc · May 12, 2025, 5:19pm

Thanks for the reply Julian!

> Chained jobs (config details, upstream reference)

> This one could work for you, as you can specifically indicate to use the latest changelist

The chained job useDefaultChangeForTemplate bool is one piece I’m looking for in our scenario, but unfortunately chained jobs won’t work here for the reasons mentioned in my original post. Thinking about it more, we’d also only want this dependency chain to run when the job is run on a schedule (like ScheduleGate does), not any time a job is manually kicked off.

> Writing a custom buildgraph task to issue a request for a new job (we do this internally)

> This question has come up in the past, and you’d need to write a buildgraph task that will poke the jobs API in horde to kick off the new job

This does raise an interesting idea: creating a generic BuildGraph script where you pass in all the jobs that need to be kicked off, which jobs each one depends on, etc., and it dynamically builds the nodes. A Horde job template would still need to be created for every dependency chain you want to build, but at least it still takes advantage of the modularity of each job’s already-defined template, and you wouldn’t have to create a new bespoke BuildGraph script each time.
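To sketch what “dynamically builds the nodes” could mean, here is a small example that turns a dependency map into ordered stages, where every job in a stage can run in parallel. It uses Python’s standard `graphlib` rather than BuildGraph itself, and `plan_stages` and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_stages(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into stages; each stage only depends on jobs in earlier stages.

    depends_on maps a job name to the set of jobs that must finish first.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages: list[list[str]] = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies satisfied.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For the scenario in this thread, `plan_stages({"Z": {"A", "B"}, "A": set(), "B": set()})` yields one stage containing A and B followed by a stage containing Z.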

Though my question would be: if you have a BuildGraph node that kicks off a new job using the Horde API, is there a way for that node to complete only after the kicked-off job has completed (and to ensure it was successful)? Or will the node always complete as soon as it has kicked off the job?

anonymous-edc · May 13, 2025, 9:14pm

Yeah, that’s a really good point about the agent being held hostage while waiting for the other job to complete, which wouldn’t be ideal. I suppose it would make more sense for the server to manage this, which brings me back to what I was originally thinking: adding some custom functionality that works similarly to ScheduleGateConfig.

So correct me if I’m mistaken, but it looks like ScheduleGateConfig works by checking if the job specified in the Schedule Gate has run successfully. If so, it finds the most recent success, and then the scheduled job that defined the ScheduleGateConfig runs with the matching CL of that gated job?

So what I’m thinking for custom functionality is an option where, instead of just checking for the Schedule Gate job’s most recent success, the schedule can kick off the Schedule Gate job(s) at the CL of the scheduled job. If the Schedule Gate logic isn’t already in a separate thread, I’ll move it to one, so it can wait there and check every so often whether that job has finished without blocking anything else on the server. Once the schedule gate job has finished, additional options would control what happens to the scheduled job. We can decide whether to run the scheduled job based on whether the gated job(s) succeed or fail. Similar to chained jobs’ useDefaultChangeForTemplate, we can also decide whether the scheduled job should run on the same CL as the gated job(s) or on the latest CL at that time, to pick up new changes the gated job(s) may have submitted.

Is there anything obviously wrong with that approach that you can think of?

And thanks again for the collaboration!

anonymous-edc · May 14, 2025, 6:07pm

Haha confusing terminology for sure! Let me see if I can break it down more clearly. So this is the current ScheduleGateConfig definition:

| Name | Type | Description |
| --- | --- | --- |
| templateId | string | The template containing the dependency |
| target | string | Target to wait for |

I’m thinking of changing it to something like the following:

| Name | Type | Description |
| --- | --- | --- |
| templateId | string | The template containing the dependency |
| target | string | Target to wait for |
| continueOnFail | bool | Whether to still run the scheduled job if the dependency fails |
| useLatestChange | boolean | Whether to use the latest change for the scheduled job, or the same change as the dependency job |

Then in ScheduleConfig, instead of having

| Name | Type | Description |
| --- | --- | --- |
| gate | [ScheduleGateConfig](#schedulegateconfig) | Gate allowing the schedule to trigger |

I would have

| Name | Type | Description |
| --- | --- | --- |
| gates | [ScheduleGateConfig](#schedulegateconfig)[] | Gates allowing the schedule to trigger |

so that we can have multiple gate dependencies.

So then in my example, we have job Z, which will have job A and job B as gates. Then in code logic, I’d do something like the following on the server when job Z is triggered by schedule:

1. For each gate in gates, trigger that gate’s job (jobs A and B).
2. Keep checking whether the gate jobs have all completed. If not, sleep and check again.
3. Once all jobs (A & B) are completed, check their outcomes, and compare each one to its gate’s continueOnFail parameter.
   - If any gate job fails and its gate’s continueOnFail is set to false, we fail job Z. Otherwise we will trigger job Z.
4. Check each gate’s useLatestChange. If any of them are true, we will trigger job Z on the latest change. Otherwise we will trigger it on the change the schedule originally triggered on.
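The decision part of those steps (everything after the gate jobs have finished) could look roughly like this. It is only a sketch of the proposal: `GateResult` and `resolve_scheduled_change` are hypothetical names, the fields mirror the proposed continueOnFail and useLatestChange options, and triggering and waiting for the gate jobs is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    """Outcome of one finished gate job plus that gate's proposed options."""
    template_id: str
    succeeded: bool
    continue_on_fail: bool
    use_latest_change: bool


def resolve_scheduled_change(
    results: Sequence[GateResult],
    trigger_change: int,
    latest_change: int,
) -> Optional[int]:
    """Return the change to run the scheduled job at, or None if it must not run."""
    for result in results:
        # A failed gate blocks the scheduled job unless that gate allows failure.
        if not result.succeeded and not result.continue_on_fail:
            return None
    # Any single gate asking for the latest change moves the whole job forward.
    if any(result.use_latest_change for result in results):
        return latest_change
    return trigger_change
```

Keeping this as a pure function, separate from the waiting loop, would make the option combinations easy to unit test.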

Hopefully my explanation of the potential changes makes more sense now.