ZenStore array index out of range: fixes beyond CL 37590157?

See the main entry for details. Because this issue is much less frequent with Epic CL 37590157, it is likely related to unsynchronized multithreaded access.

The screenshot shows the point of failure in code with that Epic CL integrated.

There are no repro steps other than running a cook. The command line from the log file is:

Command Line: C:\p4\main\game\AE\AE.uproject -run=Cook -TargetPlatform=Windows -zenstore -unversioned -iterate -skipeditorcontent -ZenStore -fileopenlog -abslog=C:\p4\main\game\Engine\Programs\AutomationTool\Saved\Cook-2025.06.09-16.56.55.txt -stdout -CrashForUAT -unattended -NoLogTimes -UTF8Output

Steps to Reproduce
We are using UE 5.5.4 from Epic's Perforce, with a number of our own changes and some changes cherry-picked from Epic. When cooking (multiprocess, ZenStore), we occasionally see an index-out-of-range check() failure in ZenStoreWriter.cpp.

These failures seem to be transient: retrying after a failed cook has always succeeded.

The message is always of this form:

Assertion failed: (Index >= 0) & (Index < ArrayNum) [File:D:\build\streams\main\game\Engine\Source\Runtime\Core\Public\Containers\Array.h] [Line: 783]

Array index out of bounds: 85269 into an array of size 85239

Note: we added the logging of the requested array index and the actual array size. The index is always slightly past the end of the array.

Looking through Epic’s depot, change 37590157 looked relevant, and we cherry-picked it. Its description is:

IncrementalCook: Fix crash due to unsynchronized multithreaded access to CookedPackagesInfo. It is written to from async thread in CommitPackageInternal and is read from in main thread by CookRequestCluster when cooking incrementally.

With that change in our code, the crashes were greatly reduced but not eliminated. Are there other CLs that you or other licensees have identified as relevant? Thanks.
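For readers less familiar with this class of bug, here is a minimal standalone sketch in standard C++ (not Unreal code) of what the 37590157 description implies: one thread appends to a shared array while another thread reads it, and both sides take the same lock. `CookInfoSketch`, `CookedPackagesSketch` and `RunConcurrentSketch` are hypothetical names; the real change guards the `CookedPackagesInfo` TArray inside FZenStoreWriter, and its exact locking may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for one CookedPackagesInfo element.
struct CookInfoSketch {
    std::string PackageName;
    bool bUpToDate;
};

// Hypothetical container: an async "commit" thread appends while the main thread reads.
// Every access takes the same mutex, which is the kind of synchronization that
// 37590157 describes adding around CookedPackagesInfo.
class CookedPackagesSketch {
public:
    // Commit thread: exclusive access, because push_back may reallocate the buffer.
    void Append(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Infos.push_back(CookInfoSketch{Name, true});
    }

    // Main thread: bounds-check and read while holding the same lock.
    bool IsUpToDate(std::size_t Index) const {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Index < Infos.size() && Infos[Index].bUpToDate;
    }

    std::size_t Num() const {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Infos.size();
    }

private:
    mutable std::mutex Mutex;
    std::vector<CookInfoSketch> Infos;
};

// Runs one writer thread against a polling reader and returns the final element count.
std::size_t RunConcurrentSketch(std::size_t Count) {
    CookedPackagesSketch Packages;
    std::thread Writer([&Packages, Count] {
        for (std::size_t I = 0; I < Count; ++I) {
            Packages.Append("/Game/Pkg" + std::to_string(I));
        }
    });
    // The array only grows in this sketch, so the last observed element is always valid.
    for (std::size_t N = Packages.Num(); N < Count; N = Packages.Num()) {
        if (N > 0) {
            assert(Packages.IsUpToDate(N - 1));
        }
    }
    Writer.join();
    return Packages.Num();
}
```

Without the lock, the reader could index into a buffer that push_back is in the middle of reallocating, which is one way an unsynchronized read can fail intermittently.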

There were no further changes after 37590157 that affected the multithreaded synchronization of CookedPackagesInfo.

Can you post your version of ZenStoreWriter.cpp?

We have not significantly tested ZenStore with -iterate. -iterate uses the legacy iterative mode, and we have been testing with the new incremental cook mode, which is experimental in 5.6. I didn’t expect 37590157 to have an impact when not using the new incremental cook, but possibly some access from the legacy iterative code is triggering the unsynchronized access.

In the 5.5.4 version of ZenStoreWriter.cpp (without 37590157), line 808 grows CookedPackagesInfo so that it is large enough to hold the index in CommitEventArgs.EntryIndex. That line assumes CookedPackagesInfo has the same length as PackageStoreEntries, but this is not explicitly checked. Can you add a check() after the two Emplace calls, verifying that the two TArrays are the same length and that EntryIndex is a valid index into them?

`if (CommitEventArgs.EntryIndex == PackageStoreEntries.Num())
{
    PackageStoreEntries.Emplace();
    CookedPackagesInfo.Emplace();

    // BEGINNEWCODE
    check(PackageStoreEntries.Num() == CookedPackagesInfo.Num() && CookedPackagesInfo.Num() > CommitEventArgs.EntryIndex);
    // ENDNEWCODE
}`
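The invariant that this check() asserts can be modeled outside the engine. This is a sketch that assumes the two TArrays behave like `std::vector` for this purpose; `StoreEntrySketch`, `CookInfoSketch`, `ParallelArraysValid` and `GrowAndValidate` are hypothetical names, not engine API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified element types (the real ones live in ZenStoreWriter.cpp).
struct StoreEntrySketch {};
struct CookInfoSketch { bool bUpToDate; };

// True when the parallel arrays agree in length and EntryIndex is valid for both:
// the same condition as the suggested check(), plus a non-negative test.
bool ParallelArraysValid(const std::vector<StoreEntrySketch>& Entries,
                         const std::vector<CookInfoSketch>& Infos,
                         int EntryIndex) {
    return Entries.size() == Infos.size()
        && EntryIndex >= 0
        && static_cast<std::size_t>(EntryIndex) < Infos.size();
}

// Mirrors the "grow both arrays when EntryIndex is one past the end" branch,
// then evaluates the invariant.
bool GrowAndValidate(std::vector<StoreEntrySketch>& Entries,
                     std::vector<CookInfoSketch>& Infos,
                     int EntryIndex) {
    if (EntryIndex >= 0 && static_cast<std::size_t>(EntryIndex) == Entries.size()) {
        Entries.emplace_back();
        Infos.emplace_back();
    }
    return ParallelArraysValid(Entries, Infos, EntryIndex);
}
```

If the arrays ever drift apart (for example, one of them is modified by another thread), the invariant fails at the point of growth rather than later at an unrelated read.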

Thanks for responding so quickly. Attached is a zip of the ZenStoreWriter.cpp we’re using. We’ve been using ZenStore since 5.3, and it’s a huge win in most cases. If -iterate with ZenStore is largely untested and incremental cook is experimental in 5.6, what approach do you suggest in the meantime? We’re not able to move to 5.6 yet and are cautious about rolling out experimental features to the farm.

I’ll try adding the proposed check()s and see what happens on my box; I would probably downgrade them to ensure()s when running on our build farm and on content creators’ boxes.

-iterate without -zenstore is tested by many licensees (but not internally at Epic).

-iterate probably works fine with ZenStore as well; your report is the first I’ve heard of any possible problem with ZenStore and -iterate.

Ah ha.

RemoveCookedPackages can be called from the game thread, via:

FZenStoreWriter::RemoveCookedPackages(...)
UCookOnTheFlyServer::DeleteOutputForPackage(...)
FCookGenerationInfo::LegacyIterativeCookValidateOrClear(...)

If that happens in between the assignment of CommitEventArgs.EntryIndex on the commit thread (line 818 of your ZenStoreWriter.cpp) and the use of that stack-local variable on line 1003 of your file, CommitEventArgs.EntryIndex is now an out-of-date index into an array that has had elements removed.
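This interleaving can be replayed deterministically on a single thread, which makes the failure mode easy to see. The sketch assumes removal compacts the array the way erasing from a `std::vector` does; `StaleIndexReplay` and its methods are hypothetical, and the comments only map each step to the line numbers mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Deterministic replay of the commit-thread / game-thread interleaving.
// Hypothetical names; the real code uses TArray members of FZenStoreWriter.
struct StaleIndexReplay {
    std::vector<std::string> CookedPackagesInfo; // stands in for the shared TArray

    // Commit thread, first step (around line 818): append and keep the index in a local.
    std::size_t AllocateIndex(const std::string& Name) {
        CookedPackagesInfo.push_back(Name);
        return CookedPackagesInfo.size() - 1;
    }

    // Game thread, in between: RemoveCookedPackages compacts the array.
    void RemoveAt(std::size_t Index) {
        CookedPackagesInfo.erase(CookedPackagesInfo.begin() + static_cast<std::ptrdiff_t>(Index));
    }

    // Commit thread, second step (around line 1003): reuse the saved index.
    // Returns false where the real code would fail the TArray bounds check.
    bool IsIndexStillValid(std::size_t SavedIndex) const {
        return SavedIndex < CookedPackagesInfo.size();
    }
};
```

With three packages committed and one removed in between, the saved index 2 now points past the end of a two-element array, the same shape as the "index into an array of size N" assertion in the original report.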

This problem does not occur in 5.6 because of CL 38318389, which moved the allocation and the use of the index inside a single critical section, with this line in the change description:

> Move use of critical section in FZenStoreWriter::CommitInternal so that all writes occur in a single critical section block, to reduce contention.

So there WAS another change that unintentionally fixed it.
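A minimal sketch of the idea behind 38318389, assuming a single mutex protects both the array and the name-to-index map: the index is found or allocated and then used inside one critical section, so a removal on another thread can only happen before or after, never in between. `SingleLockWriterSketch` and `CookInfoSketch` are hypothetical names; this is not the actual 5.6 code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct CookInfoSketch {
    std::string PackageName;
    bool bUpToDate;
};

// Hypothetical model: lookup/allocation and the write that uses the index share one lock.
class SingleLockWriterSketch {
public:
    void Commit(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        std::size_t Index;
        auto It = NameToIndex.find(Name);
        if (It == NameToIndex.end()) {
            Index = Infos.size();
            Infos.push_back(CookInfoSketch{Name, false});
            NameToIndex.emplace(Name, Index);
        } else {
            Index = It->second;
        }
        Infos[Index].bUpToDate = true; // still inside the same critical section
    }

    // Game-thread removal takes the same lock, erases, then rebuilds the map.
    void Remove(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = NameToIndex.find(Name);
        if (It == NameToIndex.end()) {
            return;
        }
        Infos.erase(Infos.begin() + static_cast<std::ptrdiff_t>(It->second));
        NameToIndex.clear();
        for (std::size_t I = 0; I < Infos.size(); ++I) {
            NameToIndex[Infos[I].PackageName] = I;
        }
    }

    bool IsUpToDate(const std::string& Name) const {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = NameToIndex.find(Name);
        return It != NameToIndex.end() && Infos[It->second].bUpToDate;
    }

private:
    mutable std::mutex Mutex;
    std::vector<CookInfoSketch> Infos;
    std::unordered_map<std::string, std::size_t> NameToIndex;
};
```

The trade-off is holding the lock slightly longer per commit in exchange for never carrying an index across a lock release.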

I see three options to fix the crash in 5.5:

  1. Cherry-pick all of 38318389.
    1. Not recommended, because it’s large and mostly unrelated, and likely depends on other changes that you would have to merge around.
  2. Using 38318389’s changes to FZenStoreWriter::CommitPackageInternal as inspiration, ignore the portions that allow writing in the CommitInfo.Status != ECommitStatus::Success case, and take only the portions that “Move use of critical section in FZenStoreWriter::CommitInternal so that all writes occur in a single critical section block, to reduce contention.”
    1. This will probably be straightforward, but as with any change, the implementation could have bugs.
  3. Modify line 1003 of your file to recompute CommitEventArgs.EntryIndex after entering the critical section:

`{
    FWriteScopeLock _(EntriesLock);
    // New Code
    int32* EntryIndexPtr = PackageNameToIndex.Find(CommitInfo.PackageName);
    if (!EntryIndexPtr)
    {
        // Should not occur, panic
    }
    CommitEventArgs.EntryIndex = *EntryIndexPtr;
    // End New Code
    FOplogCookInfo& CookInfo = CookedPackagesInfo[CommitEventArgs.EntryIndex];
    CookInfo.bUpToDate = true;
    CookInfo.Attachments = MoveTemp(CookInfoAttachments);
}`

Option 3 should work fine with negligible performance cost. I recommend going with that one.
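Option 3 can be sketched the same way. Assuming one mutex guards the array and the name-to-index map, the index returned by the first critical section is treated as possibly stale, and the second critical section looks the package up again by name. `RelookupWriterSketch` and its methods are hypothetical stand-ins for the real FZenStoreWriter members; note that this sketch stores each package name in its element when it is allocated, and the rebuild after a removal depends on that.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct CookInfoSketch {
    std::string PackageName;
    bool bUpToDate;
};

class RelookupWriterSketch {
public:
    // First critical section (the line-818 step): append and hand back an index.
    std::size_t Allocate(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Infos.push_back(CookInfoSketch{Name, false});
        NameToIndex[Name] = Infos.size() - 1;
        return Infos.size() - 1;
    }

    // Game thread, between the two steps: erase and rebuild the map from stored names.
    void Remove(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = NameToIndex.find(Name);
        if (It == NameToIndex.end()) {
            return;
        }
        Infos.erase(Infos.begin() + static_cast<std::ptrdiff_t>(It->second));
        NameToIndex.clear();
        for (std::size_t I = 0; I < Infos.size(); ++I) {
            NameToIndex[Infos[I].PackageName] = I;
        }
    }

    // Second critical section (the line-1003 step): ignore the saved index and find the
    // element again by name. Returns false where the real code hits "should not occur".
    bool MarkUpToDate(const std::string& Name) {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = NameToIndex.find(Name);
        if (It == NameToIndex.end()) {
            return false;
        }
        Infos[It->second].bUpToDate = true;
        return true;
    }

    bool IsUpToDateAt(std::size_t Index) const {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Index < Infos.size() && Infos[Index].bUpToDate;
    }

private:
    mutable std::mutex Mutex;
    std::vector<CookInfoSketch> Infos;
    std::unordered_map<std::string, std::size_t> NameToIndex;
};
```

Here the saved index for "/Game/B" is 1, but after "/Game/A" is removed the re-lookup correctly writes to index 0.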

Thanks for the feedback. We went with option 3 and inserted a check() on the line saying “should not occur, panic” (see attached zip).

That change greatly reduced the crashes: we’re now getting 1-2 crashes per week across all machines on the farm, cooks at developer desks, etc. It had been 20-30 per day before integrating 37590157, and 10-15 per day with 37590157.

Unfortunately, that check() still fires occasionally; those are the 1-2 crashes per week. I haven’t been able to catch this under the debugger. All I can tell offhand is that EntryIndexPtr is nullptr. The callstack is the same as before; it may have been in garbage collection at the time.

13:08:43,400 INFO - LogWindows: Error: [CookWorker 1]: appError called: Assertion failed: EntryIndexPtr [File:D:\build\streams\main\game\Engine\Source\Developer\IoStoreUtilities\Private\ZenStoreWriter.cpp] [Line: 1063]
13:08:43,400 INFO - LogThreadingWindows: Error: [CookWorker 1]: Runnable thread TAsyncThread 0 crashed.
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: begin: stack for UAT
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: === Critical error: ===
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]:
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: Assertion failed: EntryIndexPtr [File:D:\build\streams\main\game\Engine\Source\Developer\IoStoreUtilities\Private\ZenStoreWriter.cpp] [Line: 1063]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]:
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]:
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]:
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffa6212dddd UnrealEditor-IoStoreUtilities.dll!FZenStoreWriter::CommitPackageInternal() [D:\build\streams\main\game\Engine\Source\Developer\IoStoreUtilities\Private\ZenStoreWriter.cpp:1063]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffa62108c61 UnrealEditor-IoStoreUtilities.dll!FZenStoreWriter::BeginCook’::33'::<lambda_1>::operator()() [D:\build\streams\main\game\Engine\Source\Developer\IoStoreUtilities\Private\ZenStoreWriter.cpp:656]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffa621a7bfc UnrealEditor-IoStoreUtilities.dll!TAsyncRunnable<void>::Run() [D:\build\streams\main\game\Engine\Source\Runtime\Core\Public\Async\Async.h:457]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffa741953bd UnrealEditor-Core.dll!FRunnableThreadWin::Run() [D:\build\streams\main\game\Engine\Source\Runtime\Core\Private\Windows\WindowsRunnableThread.cpp:159]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffa74186e23 UnrealEditor-Core.dll!FRunnableThreadWin::GuardedRun() [D:\build\streams\main\game\Engine\Source\Runtime\Core\Private\Windows\WindowsRunnableThread.cpp:79]
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: [Callstack] 0x00007ffaa888e8d7 KERNEL32.DLL!UnknownFunction []
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]:
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: Crash in runnable thread TAsyncThread 0
13:08:43,403 INFO - LogWindows: Error: [CookWorker 1]: end: stack for UAT
13:08:47,761 INFO - LogCook: Warning: CookWorkerCrash: CookWorker 1 failed to read from socket with description: Connection terminated. we will shutdown the remote process. Assigned packages will be returned to the director.

Okay, I looked again, and that failure makes sense for the same flow. It is also caused by a bug that still exists in head. The bug does not cause a problem in head because, with 38318389, we don’t do the second lookup; the data is still incorrect, it is just never read.

RemoveCookedPackages recreates PackageNameToIndex, taking the PackageName to use as the key in PackageNameToIndex from CookedPackagesInfo[N]. When cooking iteratively, for packages that were previously cooked, CookedPackagesInfo[N].PackageName is set on line 538 when constructing CookedPackagesInfo from the downloaded Oplog, so those packages are re-added to PackageNameToIndex correctly. But for new packages discovered in the current cook, when we add a new element to CookedPackagesInfo on line 829, we do not set its PackageName. Nor do we set it further down, on line 1068.

Change line 829 to store the PackageName in the new CookedPackagesInfo element:

`if (CommitEventArgs.EntryIndex == PackageStoreEntries.Num())
{
    PackageStoreEntries.Emplace();
    // New Code
    // CookedPackagesInfo.Emplace();
    CookedPackagesInfo.Emplace(CommitInfo.PackageName);
    // End New Code
    // BEGIN ARCHETYPE CHANGE (DVT-1134)
    // DVT: https://archetype-games.atlassian.net/browse/DVT-1134
    // Extra new code in this block beyond the preport of 37590157. Suggested by Epic at
    // [Content removed]
    check(PackageStoreEntries.Num() == CookedPackagesInfo.Num() && CookedPackagesInfo.Num() > CommitEventArgs.EntryIndex);
    // END ARCHETYPE CHANGE (DVT-1134)
}`
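The nullptr lookup, and why storing the name fixes it, can be reproduced in a few lines. This is a sketch that assumes the rebuild keys every element by its stored `PackageName`, as described above; `CookInfoSketch`, `RebuildNameToIndex` and `NewPackageFoundAfterRemoval` are hypothetical names, and the real FOplogCookInfo has more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for FOplogCookInfo: only the field that matters here.
struct CookInfoSketch {
    std::string PackageName;
};

// Rebuild as described for RemoveCookedPackages: every key comes from the element's own
// stored PackageName. In this sketch, elements created without a name are keyed by "",
// so their real package names are no longer present in the map.
std::unordered_map<std::string, std::size_t> RebuildNameToIndex(const std::vector<CookInfoSketch>& Infos) {
    std::unordered_map<std::string, std::size_t> Map;
    for (std::size_t I = 0; I < Infos.size(); ++I) {
        Map[Infos[I].PackageName] = I;
    }
    return Map;
}

// Replays one package loaded from the previous cook's oplog (name set), one package that
// gets removed, and one package first discovered in this cook. bStoreName selects the old
// line-829 behavior (false) or the suggested fix (true).
bool NewPackageFoundAfterRemoval(bool bStoreName) {
    std::vector<CookInfoSketch> Infos;
    Infos.push_back(CookInfoSketch{"/Game/FromOplog"});
    Infos.push_back(CookInfoSketch{"/Game/Removed"});
    Infos.push_back(CookInfoSketch{bStoreName ? std::string("/Game/NewThisCook") : std::string()});

    Infos.erase(Infos.begin() + 1); // the game thread removes /Game/Removed
    std::unordered_map<std::string, std::size_t> Map = RebuildNameToIndex(Infos);

    // The later re-lookup by name; a miss corresponds to EntryIndexPtr == nullptr.
    return Map.find("/Game/NewThisCook") != Map.end();
}
```

Without the stored name, the rebuild silently loses the new package, which matches the remaining EntryIndexPtr check() failures; with it, the re-lookup succeeds.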

FYI: the latest revision seems to be doing well, but it has only just made it into full production. We’re slow to roll out changes beyond initial canary builds.