Inconsistent BuildGraph execution on Horde vs locally in regards to errors and exceptions

Hey

We have noticed two confusing things about BuildGraph execution, one regarding exceptions in BuildGraph tasks and one regarding running them on Horde vs locally, and wanted to make sure we’re not missing anything.

For the exception, we noticed that if a BuildGraph task throws an exception, BgNodeExecutor will log it and re-throw, and UAT execution of the node will end. This seems intentional, as some tasks have a `_parameters.ErrorIfNotFound` parameter to decide whether they should log information or throw an error. This creates a problem, though, in cases where we want the task to log an error, so Horde can create issues and triage from it, without stopping execution of the node. To support this we had to make some changes; for example, on the CommandTask we added an ErrorLevel parameter so `RunUAT` will check Result.ExitCode <= ErrorLevel to decide whether to throw an exception or not.

You can see this in the example BuildGraph I provided, where node NoException will log an error but execute all tasks, while node Exception will stop execution at the exception.

But this brings us to the other oddity: due to Horde’s IssueHandlers, BuildGraph nodes that log errors get a Failed outcome, which prevents nodes that require them from executing, and that’s different behavior from running locally. So again in the example BuildGraph, if you run it locally, nodes 1, 2 and 3 will execute, with node 4 being skipped due to the failure by exception. But when running in Horde only nodes 1 and 3 will execute, with both 2 and 4 being skipped, as node 1 is also marked as a Failure by the IssueHandlers.
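For readers without the attachment, a graph with the shape described above might look roughly like the sketch below. It is a hypothetical reconstruction, not the contents of ExceptionHandling.xml: the names of nodes 2 and 4, the log messages, and the use of `cmd.exe /c exit 1` to force a throwing task are all assumptions made for illustration.

```xml
<!-- Hypothetical reconstruction of the four-node example; only the names
     NoException and Exception come from the post. The enclosing BuildGraph
     and Agent elements are omitted to keep the fragment short. -->

<!-- Node 1: prints an "Error:" line but exits with code 0, so no task throws. -->
<Node Name="NoException">
    <Spawn Exe="cmd.exe" Arguments="/c echo Error: This is an error message"/>
    <Log Message="Still runs, because the previous task did not throw"/>
</Node>

<!-- Node 2: runs locally; skipped on Horde once node 1 has a Failure outcome. -->
<Node Name="RequiresNoException" Requires="NoException">
    <Log Message="Node 2"/>
</Node>

<!-- Node 3: the non-zero exit code makes the Spawn task throw,
     so the Log task after it is not expected to run. -->
<Node Name="Exception">
    <Spawn Exe="cmd.exe" Arguments="/c exit 1"/>
    <Log Message="Not reached"/>
</Node>

<!-- Node 4: skipped both locally and on Horde, because node 3 failed. -->
<Node Name="RequiresException" Requires="Exception">
    <Log Message="Node 4"/>
</Node>
```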

So the issues we have are:

  • What would be the expected way to handle such dependencies when we expect tasks to log errors? We noticed the OptionalRequires option is coming, which sounds like it will fix this issue somewhat: while exceptions will still stop node execution, at least we’ll be able to split tasks into more nodes that will execute regardless of failure
  • Is there some way to better simulate Horde’s execution locally, so that local UAT running an entire BuildGraph respects outcome failures like Horde would?

Thanks in advance!

ExceptionHandling.xml(1.14 KB)

Steps to Reproduce
If you run the BuildGraph in the example locally, nodes 1, 2 and 3 will execute, with node 1 having a parsed error from the error structured log. Also, in node 1 all tasks execute, while in node 3 execution stops at the exception.

But if you run the same BuildGraph in Horde, only nodes 1 and 3 will execute, as both will have a failed outcome, breaking the dependency.

Hello,

By design in BG/UAT, tasks within a node execute sequentially and any exception stops all remaining tasks. The ErrorLevel parameter on Spawn tasks is the intended way to control whether non-zero exit codes throw.
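As a minimal sketch of that control, the fragment below raises ErrorLevel so a known non-zero exit code does not throw. It assumes exit codes at or above ErrorLevel are treated as failures, which is our reading of the Spawn task and worth confirming against the task source in your engine version. The node name, the exit code 3 and the threshold 4 are illustrative only.

```xml
<!-- Illustrative only: node name and exit codes are made up for this sketch. -->
<Node Name="TolerateExitCode">
    <!-- Assumption: only exit codes at or above ErrorLevel make the task throw,
         so an exit code of 3 with ErrorLevel="4" should not stop the node. -->
    <Spawn Exe="cmd.exe" Arguments="/c exit 3" ErrorLevel="4"/>
    <Log Message="Runs after the Spawn task if it did not throw"/>
</Node>
```

Note that this only affects the exception path. If the process also prints lines that Horde parses as errors, the step outcome on Horde is still affected by log parsing.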

The difference between Horde and local execution is caused by Horde’s log parsing pipeline:

  • When the Horde agent runs a step, all process output is piped through LogParser, which detects lines containing “Error:”
  • Horde sets the step outcome to Failure based on any Error-level log events
  • Dependent nodes check predecessor outcomes and skip if any required dependency has Failure outcome
  • This pipeline does not exist in local execution, where only exceptions/exit codes determine success

This means a spawned process that outputs text containing “Error:” will cause the Horde step to get a Failure outcome even if the exit code is 0 and no exception is thrown. Locally, the same scenario succeeds.

The OptionalRequires attribute can help with this:

  • Nodes expected to log errors can be declared as optional dependencies
  • On Horde: The node still gets a Failure outcome from log parsing, issues are still created, but dependent nodes using OptionalRequires still execute
  • On local: you can opt in via a command line switch to get similar behaviour, where optional-only dependency failures don’t stop the build. This won’t be an exact one-to-one mapping with Horde, because Horde and UAT differ in how they enumerate the graph and separate the execution of each node
  • The dependent node decides how to handle inputs that may be missing because execution was allowed to continue past errors

Example

  <Node Name="NoException">
      <Spawn Exe="cmd.exe" Arguments="/c echo Error: This is an error message"/>
  </Node>
  <Node Name="DependentOnNoException" OptionalRequires="NoException">
      <Log Message="This will execute in BOTH local and Horde"/>
  </Node>

What OptionalRequires doesn’t change

  • Exceptions still stop all remaining tasks within a node
  • The log-parsing difference between local and Horde still exists (local has no LogParser)
  • After dependencies still cause skipping on Failure outcome

You could also try suspending log parsing, which won’t set the job step to Failure. Output these markers as literal text in your process stdout:

  <-- logeventparser suspend GenericEventMatcher -->
  ... lines that contain "Error:" but shouldn't be treated as errors
  <-- logeventparser pop -->

Though this would also stop creating Issues for these lines in Horde.

Matthew

Hey Matthew, thanks for the response

Sounds good, then we’ll switch to OptionalRequires when that’s in.

For not throwing on non-zero exit codes, we’ve copied SpawnTask’s ErrorLevel parameter to other places like CommandTask in order to run commandlets and control execution depending on the exit code, but we can instead spawn an editor-cmd through the SpawnTask and handle all such cases that way. We might end up using SpawnTask a lot more in our BuildGraphs, as we often want this exit code control, but it should work.
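A rough sketch of what that could look like for a commandlet is below. The node name, the `ProjectPath` property, the commandlet name and the ErrorLevel value are hypothetical placeholders, and the editor path assumes a Win64 UnrealEditor-Cmd.exe under the engine root; adjust all of them to your setup.

```xml
<!-- Hypothetical sketch: ProjectPath, MyCommandlet and the ErrorLevel value
     are placeholders, not names taken from an actual project. -->
<Node Name="RunCommandletViaSpawn">
    <!-- Assumption: exit codes below ErrorLevel do not throw, so an exit code
         of 1 from the commandlet would let the node continue. -->
    <Spawn Exe="$(RootDir)/Engine/Binaries/Win64/UnrealEditor-Cmd.exe"
           Arguments="&quot;$(ProjectPath)&quot; -run=MyCommandlet"
           ErrorLevel="2"/>
    <Log Message="Runs as long as the commandlet exit code stays below ErrorLevel"/>
</Node>
```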

Regarding After, we also use it, for example, to run tests on a revision: either after a build finishes publishing, if a publish node is requested, or immediately if a publish node is not requested (so the build already exists). So having an OptionalAfter might also help, but for now we’ve started switching to chained jobs and gates to handle such cases, so I think we should be good with OptionalRequires for all our use cases.

Thanks for clearing things up!