100% Failure when transferring Horde Artifact Blobs to Horde Server

Hi Epic,

So I had identical setup since 5.5 however upon upgrading Horde Server + Horde Agent to 5.6.0.0 I am having persistent failure when uploading blobs.

Copied 949 files (23073.7mb, 3.3mb/s) Copied 950 files (23240.0mb, 31.1mb/s) Exception while executing step: EpicGames.Horde.Storage.StorageException: PUT to <my server website here>/api/v1/storage/horde-artifacts/blobs/step-output/main/868/compile-editor/683c12d06854c93327cb652e/4a2001d63e6f447686bdfd79d8359b4e_2 failed (InternalServerError). Response: {"time":"2025-06-01T09:08:21","level":"Error","message":"Unhandled exception: Command findAndModify failed: Plan executor error during findAndModify :: caused by :: E11000 duplicate key error collection: Horde.Storage.Blobs index: ns_1_blob_1 dup key: { ns: \"horde-artifacts\", blob: \"step-output/main/868/compile-editor/683c12d06854c93327cb652e/4a2001d63e6f447686bdfd79d8359b4e_2\" }.","format":"Unhandled exception: {Message}","properties":..........uploaded full log.

Setup :

Horde Server 5.6.0 compiled from source and deployed via docker image

Authentication Type : Horde

1 Agent Enrolled using Horde Agent 5.6.0

All on-premises setup

Worked quite stably in 5.5.0 with same setup. It seems to be failing quite intermittently, on any node on build graph e.g. sometimes on compile tools / compile editor etc. But have 100% error rate

Thank you.

Kind Regards,

Abhishek Sagi

Hey there Abhishek,

Sorry to hear you’re having some challenges in upgrading to 5.6.

Slightly confused by this messaging:

  • It seems to be failing quite intermittently, on any node on build graph e.g. sometimes on compile tools / compile editor etc. But have 100% error rate
  • Is it intermittent, or is it a 100% error rate?

Is it possible to run a test where you provide a clean Mongo DB (startup configuration Horde__MongoDatabaseName: CleanTestDB)? Can you also send over a more complete log of the above? I am curious if there are earlier blob uploads that perhaps were deferred + retried?

Julian

Hey there Abhishek,

I’ve spoken with the team a bit on this issue, and have looked internally at some of our job history.

This issue has come up in the past, and has not been an issue in that it doesn’t appear to be a fatal outcome. We have some hypothesis (thanks for the sanity check [mention removed]​ ):

  1. An upload is retried? This call is not idempotent
  2. There’s truly a duplicated blob due to content itself
  3. There’s a bug in the step-output handling
  4. This outcome can happen by-design but just not handled

We feel as though it could be 1 and/or 4 as the most plausible outcome here.

One thing you could try here is the following divergence within AddBlobAsync (StorageService.cs):

`internal async Task AddBlobAsync(NamespaceId namespaceId, BlobLocator locator, IReadOnlyCollection imports, List? exports = null, CancellationToken cancellationToken = default)
{
List importIds = await FindOrAddShadowBlobsAsync(namespaceId, imports, cancellationToken);

FilterDefinition insertFilter =
Builders.Filter.Eq(x => x.NamespaceId, namespaceId) &
Builders.Filter.Eq(x => x.Path, locator.ToString()) &
Builders.Filter.Eq(x => x.Shadow, true);

UpdateDefinition insertUpdate = Builders.Update
.Unset(x => x.Shadow);

if (importIds.Count > 0)
{
insertUpdate = insertUpdate.Set(x => x.Imports, importIds);
}
if (exports != null && exports.Count > 0)
{
insertUpdate = insertUpdate.Set(x => x.Aliases, exports);
}

// JULIAN_DIVERGENCE_START
BlobInfo? blobInfo;
try
{
blobInfo = await _blobCollection.FindOneAndUpdateAsync(insertFilter, insertUpdate, new FindOneAndUpdateOptions<BlobInfo, BlobInfo> { IsUpsert = true, ReturnDocument = ReturnDocument.After }, cancellationToken);
}
catch (MongoWriteException ex) when (ex.WriteError.Category == ServerErrorCategory.DuplicateKey)
{
_logger.LogWarning(“Race condition on upsert for blob {Path}. Retrying as read-only lookup.”, locator.ToString());
blobInfo = await _blobCollection.Find(insertFilter).FirstOrDefaultAsync(cancellationToken);
}
// JULIAN_DIVERGENCE_END

_logger.LogDebug(“Created blob {BlobId} at {Path} ({NumImports} imports)”, blobInfo.Id, blobInfo.Path, imports.Count);
return blobInfo;
}`Since you have a very consistent occurrence of this, if we see nominal behaviour on your end this could be a potential fix we could adopt on our end. The aim of the above code is more or less to try and add, if not, we perform a find for the given filter and return the object.

Kind regards,

Julian

Hmm, I wonder if the exception was too narrow in the catch line, because I don’t see any such log emitted “race condition on upsert…”. Perhaps just try A general catch (Exception ex) for now to see if this at least gets us handling it.

Julian

Hey Abhishek,

Interesting. So I am going to assume this is the control flow where we hit a supposed duplicate, and then attempt to grab it instead. I think we need to widen the filter:

`internal async Task AddBlobAsync(NamespaceId namespaceId, BlobLocator locator, IReadOnlyCollection imports, List? exports = null, CancellationToken cancellationToken = default)
{
List importIds = await FindOrAddShadowBlobsAsync(namespaceId, imports, cancellationToken);

FilterDefinition insertFilter =
Builders.Filter.Eq(x => x.NamespaceId, namespaceId) &
Builders.Filter.Eq(x => x.Path, locator.ToString()) &
Builders.Filter.Eq(x => x.Shadow, true);

UpdateDefinition insertUpdate = Builders.Update
.Unset(x => x.Shadow);

if (importIds.Count > 0)
{
insertUpdate = insertUpdate.Set(x => x.Imports, importIds);
}
if (exports != null && exports.Count > 0)
{
insertUpdate = insertUpdate.Set(x => x.Aliases, exports);
}

// JULIAN_DIVERGENCE_START
BlobInfo? blobInfo;
try
{
blobInfo = await _blobCollection.FindOneAndUpdateAsync(insertFilter, insertUpdate, new FindOneAndUpdateOptions<BlobInfo, BlobInfo> { IsUpsert = true, ReturnDocument = ReturnDocument.After }, cancellationToken);
}
catch (Exception ex)
{
FilterDefinition existingFilter =
Builders.Filter.Eq(x => x.NamespaceId, namespaceId) &
Builders.Filter.Eq(x => x.Path, locator.ToString());

_logger.LogWarning(“Race condition on upsert for blob {Path}. Retrying as read-only lookup.”, locator.ToString());

blobInfo = await _blobCollection.Find(existingFilter).FirstOrDefaultAsync(cancellationToken);

if(blobInfo == null)
{
_logger.LogWarning(“We’ve hit the exception {Exception}, but have failed to obtain the duplicate entry.”, ex.Message);
}
}

_logger.LogDebug(“Created blob {BlobId} at {Path} ({NumImports} imports)”, blobInfo?.Id, blobInfo?.Path, imports?.Count);
// JULIAN_DIVERGENCE_END

return blobInfo;
}`I appreciate the collaboration on this by the way. And for posterity, the keys are constructed JIT when we initially setup the collection (StorageService.cs):

List<MongoIndex<BlobInfo>> blobIndexes = new List<MongoIndex<BlobInfo>>(); blobIndexes.Add(keys => keys.Ascending(x => x.Imports)); blobIndexes.Add(keys => keys.Ascending(x => x.NamespaceId).Ascending(s_blobAliasField)); blobIndexes.Add(keys => keys.Ascending(x => x.NamespaceId).Ascending(x => x.Path), unique: true); _blobCollection = mongoService.GetCollection<BlobInfo>("Storage.Blobs", blobIndexes);

Kind regards,

Julian

Hey Abhishek,

No worries at all - happy to help. I’ll schedule a fix on our end to get this in. I’m still genuinely curious why this is happening on your setup with such frequency, as we don’t see this with such predictability internally. Ideally I narrow the exception a bit more as well in order to prevent this fallback behaviour on too many other cases.

Kind regards,

Julian

Hi Julian,

Thanks for looking into this.

It fails 100%, I meant intermittent in the sense that it doesn’t predictably fail at the same point when uploading artifacts.

I did a fresh run I.e empty / new mongodb and the issue continues to occur.

Uploading a full log for reference.

Abhishek Sagi

Log

I tried your suggested change with exception handling added. However, results in same issue.

Good shout, I tried it with broader exception handling and this is what I landed on :

Error writing step-output/main/871/compile-editor/684294893c70b729afedb759/b2ea35a9de7e49799893a92c6132c5f7_2: PUT to https://awesomegames.ddns.net:13343/api/v1/storage/horde-artifacts/blobs/step-output/main/871/compile-editor/684294893c70b729afedb759/b2ea35a9de7e49799893a92c6132c5f7_2 failed (InternalServerError). Response: {"time":"2025-06-06T07:23:42","level":"Error","message":"Unhandled exception: Object reference not set to an instance of an object.","format":"Unhandled exception: {Message}","properties":{"Message":"Object reference not set to an instance of an object."},"exception":{"message":"Object reference not set to an instance of an object.","trace":" at HordeServer.Storage.StorageService.AddBlobAsync(NamespaceId namespaceId, BlobLocator locator, IReadOnlyCollection1 imports, List1 exports, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Storage/HordeServer.Storage/Storage/StorageService.cs:line 822\n at HordeServer.Storage.StorageService.StorageBackendImpl.WriteBlobAsync(BlobLocator locator, Stream stream, IReadOnlyCollection1 imports, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Storage/HordeServer.Storage/Storage/StorageService.cs:line 139\n at HordeServer.Storage.StorageController.WriteBlobAsync(IStorageBackend storageBackend, Nullable1 locator, WriteBlobRequest request, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Storage/HordeServer.Storage/Storage/StorageController.cs:line 211\n at HordeServer.Storage.StorageController.WriteBlobAsync(NamespaceId namespaceId, BlobLocator locator, WriteBlobRequest request, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Storage/HordeServer.Storage/Storage/StorageController.cs:line 148\n at lambda_method1023(Closure, Object)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask1 actionResultValueTask)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)\n at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)\n at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)\n at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.g__Awaited|10_0(ExceptionHandlerMiddlewareImpl middleware, HttpContext context, Task task)",“innerExceptions”:}}`Line 822 :

_logger.LogDebug(“Created blob {BlobId} at {Path} ({NumImports} imports)”, blobInfo.Id, blobInfo.Path, imports.Count);

Hi Julian,

Thank you so much for prompt responses and helping me troubleshoot root cause to address this issue.

What a great way to end the week. This latest change fixes the issue! and have first green build since 5.6 upgrade.

Have a great weekend. Would be good to get this change integrated into hotfix/mainline so rest of the licensees won’t encounter similar problems.

Kind Regards,

Abhishek