Horde Artifacts transfer crashes on Internal Server Error

Hello,

our Horde builds fail a few times a day during artifact transfer (Unhandled exception: Unable to find file), and we are unsure why, because it works fine most of the time. Specifically, built game artifacts are being transferred to benchmark machines, where Gauntlet tests will run. We use Horde from the 5.5 release with the 5.4 engine.

Artifacts are stored locally on the same machine as Horde server.

globals json

"plugins": {
    "storage": {
        "backends": [
            {
                "id": "default-backend",
                "type": "FileSystem",
                "baseDir": "Storage" // Default base directory is C:\ProgramData\HordeServer
            },
            {
                "id": "memory-backend", // Used for automated tests
                "type": "Memory"
            }
        ],
        "namespaces": [
            {
                "id": "default",
                "backend": "default-backend"
            },
            {
                "id": "horde-artifacts",
                "prefix": "Artifacts/",
                "backend": "default-backend"
            },
            {
                "id": "horde-perforce",
                "prefix": "Perforce/",
                "backend": "default-backend"
            },
            // The 'horde-logs' namespace is used internally by Horde to store logs from the CI
            // and remote execution systems. It must be configured in order for Horde to function
            // correctly.
            {
                "id": "horde-logs",
                "prefix": "Logs/",
                "backend": "default-backend"
            },
            {
                "id": "horde-tools",
                "prefix": "Tools/",
                "backend": "default-backend"
            },
            {
                "id": "memory",
                "backend": "memory-backend",
                "enableAliases": true
            }
        ]
    },

stream json

"artifactTypes": [ { "type": "ugs-pcb", "keepCount": 3 }, { "type": "symbols", "keepDays": 273 } ],

Alternatively, a build is sometimes transferred successfully but then crashes on startup with a large number of asset serialization corruption errors, e.g.:

FCompression::UncompressMemory - Failed to uncompress memory (10143/65536) from address 0000022952AFA620 using format Oodle, this may indicate the asset is corrupt!

Signature error detected in container ‘GZW-WindowsClient’ at block index ‘134444’

StartBundleIoRequests: FailedRead: None (0xF9B7A0C4D4DC4D6B) None (0xF9B7A0C4D4DC4D6B) - Failed reading chunk for package: (Read Error)

… and many other similar errors from other systems indicating that most likely the artifact transfer somehow corrupted the build.
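One way to confirm or rule out corruption during transfer is to hash every file on the producing agent and again on the benchmark machine, then compare the two manifests. The following is only a minimal sketch of that idea: it is not part of Horde, and the function names (`hash_file`, `build_manifest`, `diff_manifests`) are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, forward slashes) to its digest."""
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(source: dict[str, str], received: dict[str, str]):
    """Return (missing, changed): files absent on the receiver, and files whose digest differs."""
    missing = sorted(set(source) - set(received))
    changed = sorted(
        name for name in source.keys() & received.keys()
        if source[name] != received[name]
    )
    return missing, changed
```

Running `build_manifest` on the staged build before upload and on the downloaded copy afterwards, then passing both results to `diff_manifests`, would show whether files arrive altered or not at all.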

Hey Ondrej,

This ticket must have fallen through the cracks post GDC week - thank you for pinging me on this and for your patience! One thing I will note here is that our 5.6 release is around the corner, and there have indeed been further improvements and bugfixes to Storage.

I am curious whether the job that is responsible for producing the blobs is failing to upload them? Do you have any of the agent logs associated with the producer?

Kind regards,

Julian

Hey there,

Ah yes, got ya. One thing I did notice on that createartifact is that: “Base path for artifact gzwclientsteam-win64-test-symbols does not exist (D:\{PATH_REDACTED}\WindowsClient)”. Reviewing the code here, it hints that the outputNode won’t have any of the files added, and as a result, it doesn’t appear to actually write anything in the subsequent WriteBlobAsync (and I also assume AddRefAsync would fail). There is a property for Artifact tags “BasePath” that you could provide, and then make sure that this is present on the device.

When you see the issue, is it typically consistent with that basepath warning being present in the log?

Julian

> This looks like a result of mixing UE 5.4 BuildGraph scripts with Horde 5.5. I see that newer scripts use CreateArtifact tasks which can actually be placed inside a node. I will modify it and see if it is resolved.

Ah this is a great call-out and distinction - I missed that detail! There are some complex couplings between 5.4’s build graph && storage service. I’ll see if I can dig up some more details, but there was a pretty substantial push in the artifact space between 5.4->5.5, and buildgraph changes had to be made to support the new blob storage system. I think backporting these from 5.5 -> 5.4 in the artifact space may be quite complex.

Sounds good - please do let me know should you encounter the issue again on 5.5! I’d be more than happy to dig in with ya and try to figure out what’s going on.

Kind regards,

Julian

I am not sure anymore if these are related or not.

I checked the server for the binary blobs, and the missing blob really does not exist. It is weird, because the job that failed today just builds the editor, transfers it to another agent using BuildGraph, and runs BuildCookRun.

xy_1.blob and xy_3.blob are present, but xy_2.blob is missing. I can also see in the logs that this artifact exists logically but has length -1:

[06:48:07 dbg] Length of blob 6822e34a74e3e87204429636 (step-output/ftw-release-0.3.0.0/122970/compile-gzweditor-win64-and-tools/6822e1db4c46c407ed0fa1ad/181f19cbccf84c48a5e3c1956f32fcf3_2.blob): -1
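To find other holes like the missing xy_2.blob, the storage directory can be scanned for gaps in the numeric suffix. This is a rough sketch only: it assumes blob files are named `<prefix>_<index>.blob` and that the indices for a given prefix are meant to be contiguous, which is inferred from the file names above rather than from Horde documentation. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "<prefix>_<index>.blob", e.g. "xy_2.blob".
BLOB_NAME = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)\.blob$")


def find_missing_indices(names):
    """Return {prefix: [missing indices]} for gaps between the lowest and highest index seen."""
    seen = defaultdict(set)
    for name in names:
        match = BLOB_NAME.match(name)
        if match:
            seen[match["prefix"]].add(int(match["index"]))

    gaps = {}
    for prefix, indices in seen.items():
        expected = set(range(min(indices), max(indices) + 1))
        missing = sorted(expected - indices)
        if missing:
            gaps[prefix] = missing
    return gaps


def scan_directory(directory: Path):
    """Apply find_missing_indices to the files directly inside one directory."""
    return find_missing_indices(
        entry.name for entry in directory.iterdir() if entry.is_file()
    )
```

Note that this only detects gaps between the lowest and highest index present; a missing first or last blob would go unnoticed.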

We redeployed Horde from scratch, but the issue keeps happening.

Pinging [mention removed]​ in case there is some known Horde 5.5.4 issue.

Hello, the second attached log is from the upload. There does not seem to be any error in it. We plan to look into the Horde 5.6 update after the official release, in case the preview is not stable yet.

Thanks a lot again! I don’t know how I missed that… I am looking into it and the path is a bit funky.

We have a node which produces tagged files which should be uploaded to Horde as artifacts. Right below the node, outside any node, we do

<Artifact Name="$(ClientTarget)-$(ClientPlatform)-$(TargetConfiguration)-Symbols" Type="symbols" BasePath="$(ArchivedClientBuildDir)" Tag="#$(ClientTarget) $(ClientPlatform) $(TargetConfiguration) Symbols"/>

It is outside any node because Artifact is not allowed inside a node (based on the schema). The Horde setup step runs on a different machine, whose workspace is on a different drive letter. I believe that because the Artifact element is not in a node, its $(ArchivedClientBuildDir) is evaluated on this setup machine, which leads to the path starting with D:/ instead of E:/.
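The drive-letter mismatch can be illustrated in isolation. The sketch below makes no claim about how BuildGraph expands properties; it only shows why a path resolved against one machine's workspace root is not valid under another's. All paths and function names in it are hypothetical examples.

```python
from pathlib import PureWindowsPath


def is_under_root(path: str, root: str) -> bool:
    """Return True if `path` lies inside `root` (same drive and same leading folders)."""
    try:
        PureWindowsPath(path).relative_to(PureWindowsPath(root))
        return True
    except ValueError:
        return False


def rebase(path: str, old_root: str, new_root: str) -> PureWindowsPath:
    """Re-express `path`, which lies under `old_root`, relative to `new_root`."""
    relative = PureWindowsPath(path).relative_to(PureWindowsPath(old_root))
    return PureWindowsPath(new_root) / relative


# Hypothetical workspace roots: the setup machine syncs to D:, the build agent to E:.
setup_root = r"D:\Horde\Sync"
agent_root = r"E:\Horde\Sync"
base_path = r"D:\Horde\Sync\Archive\WindowsClient"  # as evaluated on the setup machine

print(is_under_root(base_path, agent_root))      # False: wrong drive for the agent
print(rebase(base_path, setup_root, agent_root))  # E:\Horde\Sync\Archive\WindowsClient
```

A check like `is_under_root` against the agent's own workspace root would flag the case where the base path was resolved on a different machine.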

This looks like a result of mixing UE 5.4 BuildGraph scripts with Horde 5.5. I see that newer scripts use CreateArtifact tasks which can actually be placed inside a node. I will modify it and see if it is resolved.

This leads to a new error. At this point I think I will wait for our update to a newer engine version, which is also around the corner, to rule out any UE 5.4-specific issues, and possibly also move to Horde 5.6.

POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 1000ms (attempt #1).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 5000ms (attempt #2).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 10000ms (attempt #3).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 30000ms (attempt #4).
POST https://horde.madfinger.local/api/v2/artifacts timed out after 30s.
POST https://horde.madfinger.local/api/v2/artifacts retrying after 5s.
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 1000ms (attempt #1).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 5000ms (attempt #2).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 10000ms (attempt #3).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 30000ms (attempt #4).
POST https://horde.madfinger.local/api/v2/artifacts timed out after 30s.
POST https://horde.madfinger.local/api/v2/artifacts retrying after 10s.
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 1000ms (attempt #1).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 5000ms (attempt #2).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 10000ms (attempt #3).
POST https://horde.madfinger.local/api/v2/artifacts failed (InternalServerError). Delaying for 30000ms (attempt #4).
POST https://horde.madfinger.local/api/v2/artifacts timed out after 30s.
Polly.Timeout.TimeoutRejectedException: The delegate executed asynchronously through TimeoutPolicy did not complete within the timeout.
---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates1 shouldRetryResultPredicates, Func5 onRetryAsync, Int32 permittedRetryCount, IEnumerable1 sleepDurationsEnumerable, Func4 sleepDurationProvider, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy1.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendCoreAsync(HttpRequestMessage request, Context context, CancellationToken cancellationToken)
at Polly.Timeout.AsyncTimeoutEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, Func2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func5 onTimeoutAsync, Boolean continueOnCapturedContext)
--- End of inner exception stack trace ---
at Polly.Timeout.AsyncTimeoutEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, Func2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func5 onTimeoutAsync, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy1.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Polly.Wrap.AsyncPolicyWrapEngine.<>c__DisplayClass2_01.<<ImplementationAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Polly.Retry.AsyncRetryEngine.ImplementationAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates1 shouldRetryResultPredicates, Func5 onRetryAsync, Int32 permittedRetryCount, IEnumerable1 sleepDurationsEnumerable, Func4 sleepDurationProvider, Boolean continueOnCapturedContext)
at Polly.AsyncPolicy.ExecuteAsync[TResult](Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Polly.Wrap.AsyncPolicyWrapEngine.ImplementationAsync[TResult](Func3 func, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext, IAsyncPolicy outerPolicy, IAsyncPolicy1 innerPolicy)
at Polly.AsyncPolicy1.ExecuteAsync(Func3 action, Context context, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at EpicGames.Horde.HordeHttpAuthHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\Shared\EpicGames.Horde\HordeHttpAuthHandler.cs:line 54
at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
at EpicGames.Horde.HordeHttpClient.PostAsync[TRequest](HttpClient httpClient, String relativePath, TRequest request, CancellationToken cancellationToken) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\Shared\EpicGames.Horde\HordeHttpClient.cs:line 461
at EpicGames.Horde.HordeHttpClient.PostAsync[TResponse,TRequest](HttpClient httpClient, String relativePath, TRequest request, CancellationToken cancellationToken) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\Shared\EpicGames.Horde\HordeHttpClient.cs:line 476
at AutomationTool.Tasks.CreateArtifactTask.ExecuteAsync(JobContext Job, HashSet1 BuildProducts, Dictionary2 TagNameToFileSet) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\BuildGraph\Tasks\CreateArtifactTask.cs:line 126
at AutomationTool.Tasks.CreateArtifactTask.ExecuteAsync(JobContext Job, HashSet1 BuildProducts, Dictionary2 TagNameToFileSet) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\BuildGraph\Tasks\CreateArtifactTask.cs:line 149
at AutomationTool.BgScriptNodeExecutor.ExecuteAsync(JobContext Job, Dictionary2 TagNameToFileSet) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\BuildGraph\BgNodeExecutor.cs:line 380
at AutomationTool.BuildGraph.BuildNodeAsync(BgGraphDef Graph, BgNodeDef Node, Dictionary2 NodeToExecutor, TempStorage Storage, Boolean bWithBanner) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\BuildGraph\BuildGraph.cs:line 1076
at AutomationTool.BuildGraph.ExecuteAsync() in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\BuildGraph\BuildGraph.cs:line 689
at AutomationTool.Automation.ExecuteAsync(List1 CommandsToExecute, Dictionary2 Commands) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\AutomationUtils\Automation.cs:line 270
at AutomationTool.Automation.ProcessAsync(ParsedCommandLine AutomationToolCommandLine, StartupTraceListener StartupListener, HashSet1 ScriptModuleAssemblies) in E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Programs\AutomationTool\AutomationUtils\Automation.cs:line 164
while executing task
at E:\Horde\FTW_Rel03_Inc\Sync\Game\Build\Graph\V2\BuildProject.xml(537)
(see E:\Horde\FTW_Rel03_Inc\Sync\Engine\Programs\AutomationTool\Saved\Logs\Log.txt for full exception trace)
AutomationTool executed for 0h 2m 5s
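When comparing several failing jobs, it can help to condense the repeated POST retry lines in a log like this one into a count per endpoint and status. Below is a small sketch that assumes the line format is exactly as it appears in this log; `summarize_retries` is a hypothetical helper, not a Horde tool.

```python
import re
from collections import Counter

# Matches lines such as:
# POST <url> failed (InternalServerError). Delaying for 1000ms (attempt #1).
RETRY_LINE = re.compile(
    r"POST (?P<url>\S+) failed \((?P<status>\w+)\)\. "
    r"Delaying for (?P<delay_ms>\d+)ms \(attempt #(?P<attempt>\d+)\)"
)


def summarize_retries(log_text: str):
    """Return (failures per (url, status), total requested delay in milliseconds)."""
    failures = Counter()
    total_delay_ms = 0
    for match in RETRY_LINE.finditer(log_text):
        failures[(match["url"], match["status"])] += 1
        total_delay_ms += int(match["delay_ms"])
    return failures, total_delay_ms
```

Because it uses `finditer` over the whole text, it works whether the log entries are on separate lines or run together on one line.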

No problem, we can close this for now, and I will create a new ticket if the issue persists after migrating to UE 5.5. Thank you!