MultiProcessCook write file conflicts

Hi, we have encountered a cooking error when enabling multiprocess cooking:

[Image Removed]

This is caused by a Wwise AudioBank data file (shared.bnk) that is referenced by multiple uassets such as audio events, and those audio events are divided across different cook workers. While cooking those audio events, each worker adds the referenced AudioBank to its additional-files-to-cook list, which causes the same AudioBank to be written to the cooked folder multiple times by different cook processes. With a certain probability this leads to write conflicts in the function WriteToFile:

[Image Removed]

I have two questions here:

1. Will this scenario also arise for non-Wwise uassets when using LoadBalanceStriped with MultiProcessCook? It seems that in LoadBalanceStriped mode, packages are divided among the cook processes without considering dependencies; would that cause the same uasset to be cooked and saved more than once, leading to the same write-conflict issue? For instance, if both AssetA and AssetB have a hard dependency on AssetC, and AssetA is on CookWorker 0 while AssetB is on CookWorker 1, AssetC would be loaded and saved twice.

2. Is there any way to avoid this? In the Wwise case, should a multiply-referenced audio asset like shared.bnk be bound to a specific cook process, such as the director? Does Unreal have such a mechanism?

1.

This will not occur with packages. Each package is saved by only a single CookWorker. When a CookWorker receives an assignment to cook a package, it does not cook all of the packages referenced from that package, it cooks only the single package assigned. When it discovers dependencies, it reports those dependencies to the CookDirector. The CookDirector ignores the report if it already knows about the dependencies, and otherwise assigns each dependency to a single CookWorker to cook.
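
To make the flow concrete, here is a minimal sketch of the deduplication the CookDirector performs when dependencies are reported; all of the type and function names below are hypothetical illustrations, not the actual cooker API.

`// Illustrative sketch only: hypothetical types and names, not the real CookDirector code.
#include "CoreMinimal.h"

struct FHypotheticalCookDirector
{
    // Packages already assigned, mapped to the CookWorker index that owns them.
    TMap<FName, int32> AssignedPackages;
    int32 NumWorkers = 3;
    int32 NextWorker = 0;

    // Called when a CookWorker reports dependencies discovered while cooking its assigned package.
    void ReportDiscoveredDependencies(const TArray<FName>& Dependencies)
    {
        for (FName Dependency : Dependencies)
        {
            // Already known: ignore the report; that package is (or will be) cooked exactly once elsewhere.
            if (AssignedPackages.Contains(Dependency))
            {
                continue;
            }
            // New package: assign it to exactly one worker; only that worker will load and save it.
            AssignedPackages.Add(Dependency, NextWorker);
            NextWorker = (NextWorker + 1) % NumWorkers;
        }
    }
};`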

2.

I haven’t heard of this problem before, but I have heard of Wwise’s use of CookAdditionalFilesOverride and the possibility of a single AudioBank being referenced by multiple packages, so I understand why the problem is occurring. Since I haven’t heard of it before, and MPCook has been recommended since 5.4, it seems likely that the Wwise team has already worked around this; you should reach out to them to find out whether they have encountered this problem and worked around it.

If there is no workaround from Wwise, I think the easiest solution is to modify cooker code to add a NamedMutex around the write of AdditionalFiles. But a better solution for the long term is to replicate the AdditionalFiles back to the director and have it write them all out; that will make the system still able to work if we eventually move it to support cooking on multiple machines. In either case, it will require some changes to the PackageWriter API that the cooker uses to receive results from SavePackage. I will try to create that change in the next few days for you to try out.
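
If you want to try the NamedMutex approach yourself before my change is ready, a minimal sketch of what I mean is below. It assumes FSystemWideCriticalSection (Core's cross-process named-lock wrapper) is available in your engine version, and the function name and file handling are placeholders rather than the actual PackageWriter code.

`// Minimal sketch of serializing additional-file writes across cook processes with a system-wide named lock.
// Assumes FSystemWideCriticalSection is available; the exact header and constructor may differ by engine version.
#include "CoreMinimal.h"
#include "HAL/PlatformProcess.h"
#include "Misc/FileHelper.h"

static bool WriteAdditionalFileGuarded(const FString& TargetFilename, const TArray<uint8>& Data)
{
    // Every cook process must use the same lock name so they exclude each other.
    FSystemWideCriticalSection CookAdditionalFileLock(TEXT("MPCook_AdditionalFileWrite"));
    if (!CookAdditionalFileLock.IsValid())
    {
        // Could not acquire the cross-process lock; report an error or fall back to the unguarded write.
        return false;
    }
    // Only one process at a time reaches this point, so concurrent writes of e.g. shared.bnk cannot collide.
    return FFileHelper::SaveArrayToFile(Data, *TargetFilename);
}`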

Status update:

Roger on your reports that the problem still occurs at head Wwise code and head UE5 release. Adding some synchronization between CookDirector and CookWorkers is then my definite recommendation. I thought I would start working on that earlier and hand off a prototype by now, but I was unable to pull free of other priorities until now. I am starting on the prototype now and hope to have a shelf you can try before EoD next Monday.

I needed 3 submits to implement the fix in head code. They are:

42969478 aka github commit https://github.com/EpicGames/UnrealEngine/commit/8fe5f2cccce429d9c765d60be5ed1e769089ea0e

Cooker: Provide a single CookerInterface back pointer to the CookedPackageWriters…

42971592 aka github commit https://github.com/EpicGames/UnrealEngine/commit/32ffb1be89f5b801b06de81fbf9f6e150d4f7f3b

Cooker: PackageWriter: Add FBasePackageWriter and FBaseCookePackageWriter classes in CoreUObject to provide shared implementation for ZenStoreWriter and FLooseCookedPackageWriter…

43029482 aka github commit https://github.com/EpicGames/UnrealEngine/commit/19ec3c54e7e5da8b972382a932…9c3b9ce6315

Cooker: PackageWriter: Add UE::PackageWriter::ICookerInterface::WriteFileOnCookDirector…

The first two changes contain many lines that you will not need; they are deprecation handling and improvements to interfaces. They also will not merge well into 5.5 and earlier. All 3 should merge without difficulty into 5.6.

For 5.5 and earlier, I have created a shelf that contains the minimally required portion of the change integrated into 5.5. You should use this shelf if on 5.5. If on an earlier release, you will need to merge it in, but I expect that to be relatively easy merging. Let me know if it is not.

The shelf is attached as a GNU patch file. It was created against CL 41545209 in Release-5.5, aka github commit https://github.com/EpicGames/UnrealEngine/commit/ef1397773d160d39423feb90cb2196ddfaa1e2ae

Hi, I finished merging this patch. There is a check that fails, but it is very easy to work around:

[Image Removed]

After a modification like this:

[Image Removed]

it seems to work in multiprocess cook.

By the way, according to your code, it seems each worker notifies the server to move the same temp file (such as shared.bnk under MPCookTransfer/) to the target path? So why not skip those files that have already been moved on the server?

Thanks Matt~

I thought the patch had the change to call HasSessionPlatform from GetCookOutputFolder; was that the change you needed to add? Maybe it got missed by the merge somehow.

The reason why I moved the temp file even if the target file already exists is that I wanted to preserve the behavior from single process cook. That’s not very important to preserve, because the behavior is just a performance change if the files are identical and a non-deterministic behavior change if the files are different, and we could arbitrarily choose to have different non-determinism rules in MPCook. But I wanted to keep it the same for simplicity.
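
For reference, the alternative you are suggesting would look roughly like the sketch below: skip the move when the target already exists instead of always replacing it. This is just an illustration of the trade-off using the generic IFileManager calls; it is not the code in my change.

`// Sketch of the two policies for publishing a temp file (e.g. MPCookTransfer/shared.bnk) to its target path.
#include "CoreMinimal.h"
#include "HAL/FileManager.h"

static void PublishAdditionalFile(const FString& TempFilename, const FString& TargetFilename, bool bSkipIfAlreadyPublished)
{
    IFileManager& FileManager = IFileManager::Get();
    if (bSkipIfAlreadyPublished && FileManager.FileExists(*TargetFilename))
    {
        // Another worker already published this file; drop our temp copy.
        // Cheaper, but if the copies ever differ, which copy wins is decided by arrival order instead of last write.
        FileManager.Delete(*TempFilename);
        return;
    }
    // Single-process-cook behavior: the move replaces the target, so the last writer wins.
    FileManager.Move(*TargetFilename, *TempFilename, /*Replace*/ true);
}`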

I have a plan for the future to add keys that identify the contents of each file and to share Override files between packages, so that they only get written once and we can issue an error if they have different contents. I have discussed those plans with some programmers from AudioKinetic, but I haven’t been able to implement them yet.
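
As a rough illustration of that future direction (hypothetical code, not an implemented API), the director could key each additional file by a content hash, write only the first copy, and raise an error when two packages contribute the same filename with different contents:

`// Hypothetical sketch: deduplicate additional files by content hash on the CookDirector.
#include "CoreMinimal.h"
#include "Misc/SecureHash.h"

struct FHypotheticalAdditionalFileRegistry
{
    // Filename -> hash of the contents that were already accepted for it.
    TMap<FString, FMD5Hash> WrittenFiles;

    // Returns true if the file should be written now; false if it was already handled (or conflicts).
    bool ShouldWrite(const FString& TargetFilename, const TArray<uint8>& Contents, FString& OutError)
    {
        FMD5 Md5;
        Md5.Update(Contents.GetData(), Contents.Num());
        FMD5Hash NewHash;
        NewHash.Set(Md5);

        if (const FMD5Hash* ExistingHash = WrittenFiles.Find(TargetFilename))
        {
            if (*ExistingHash == NewHash)
            {
                return false; // Identical contents were already written once; skip silently.
            }
            OutError = FString::Printf(TEXT("Additional file %s was produced with different contents by two packages."), *TargetFilename);
            return false;
        }
        WrittenFiles.Add(TargetFilename, NewHash);
        return true;
    }
};`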

When I do make that change, it will change the functionality of this current change, which is a more limited workaround.

An MPCook Worker will fail to respond to a retraction message if it is in the middle of a long package load. I’m surprised that it would run for thousands of seconds, though. Possibly there is another case that causes it to run long; any blocking activity on the game thread will prevent it from responding to the CookDirector’s messages.
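
To illustrate why (hypothetical names, not the real cooker scheduler): the worker only answers director messages from its game-thread loop, so any long synchronous call on that thread delays the reply.

`// Hypothetical illustration of why a long blocking call delays the reply to a RetractionRequest.
#include <cstdio>

static void PumpDirectorMessages() { std::printf("answer pending CookDirector messages (e.g. RetractionRequest)\n"); }
static void LoadNextAssignedPackage() { /* imagine a multi-minute synchronous package load, GC, or other stall here */ }

static void HypotheticalCookWorkerTick()
{
    PumpDirectorMessages();    // The retraction reply can only be sent here...
    LoadNextAssignedPackage(); // ...so while this call blocks, the director logs "has not responded" warnings.
}

int main()
{
    HypotheticalCookWorkerTick();
    return 0;
}`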

A single MPCook Worker runs more slowly than an SPCook process does, because we restrict the number of threads it uses. We do that to prevent oversubscription and thrashing with the other CookWorkers. The main use of threads during cooking is to handle the build of Derived Data, and most Derived Data Builds are cached in the Derived Data Cache. Do you have a persistent DDC that you are using on the machine doing the cook? A persistent local DDC will be used by default, but maybe you are running the cook on a farm machine and you are deleting the local DDC directory in between runs. The local DDC directory by default is stored in <root>\Engine\DerivedDataCache.

The logs from the CookWorker contain some coarse profile data, in various sections:

LogCookStats: Display: Hierarchy Timer Information: ...
LogCookStats: Display: Cook Profile ...
LogCookStats: Display: DDC Summary Stats ...
LogCookStats: Display: DDC Resource Stats

Can you post those sections from the CookWorker’s log?

The log files by default are written to <ProjectRoot>\Saved\Logs, but if you are running the build through the editor or AutomationTool, they will be written instead to <UERoot>\Engine\Programs\AutomationTool\Saved.

If the logs are insufficient to find out what is taking the CookWorker so long, then you can gather more data from an Unreal Insights trace (run the cooker with -trace=default -tracefile=<OutputFIle>), by attaching with a third-party profiler such as Event Tracing for Windows, or by attaching with a debugger and hitting debug-break.

The DDC stats report that there were mostly hits:

LogCookStats: Display: TotalGetHitPct = 0.985889

So my theory that this is caused by uncached derived data builds is wrong.

There’s not enough information in the coarse profiling data provided by the log to show where the time is going. Most of the time is spent in LoadPackageForCooking, but I don’t have any theories for why that would take longer on a CookWorker with reduced threads than it takes on a SPCook process with full threads.

Can you capture a trace or profile of the CookWorker that is running slow?

I can add 2 more things:

  1. This does not happen every cook
  2. It does not seem to be random. This happens more frequently the closer we are to the system’s maximum RAM usage. We are watching the usage, adjusted the core count, and set up the config entries, but it seems like we need around 12 - 16 GB of free RAM for the issue not to show up. We’re cooking on a 128 GB RAM machine, and we’ll be upgrading it because the gains are insane once correctly set up: we shaved 1h15 - 1h20min off of a 1h40min cook. So the issue goes away when we leave some room for breathing.

My best guess so far is that the .NET Host needs some RAM, and if we use most of it for cooking (so we set the values to stay at 90-95% RAM usage) then it just stops working correctly or in a reasonable time. But I did not have time to debug it further. I’ll try to gather the Insights file for you, just so the feature has some data to develop on, in case this can be worked on from your side.

Below is a screenshot from a 4-core cook on my dev machine (not the build farm). There’s no Google Chrome / Slack on the build machine, so the memory targets are closer to the max there. As you can see, there are 2 issues still to work out:

  1. The .NET Host scales with the amount of work the cooker does; it consistently eats up more RAM the more work happens. As the build farm is set up to use 9 cores, the usage goes higher there, which is what I base my guess on: this is the process that needs memory for cooking not to stall. Something in the cooker uses that .NET Host, because it goes away completely when the cook ends.
  2. Microsoft Defender attempts to keep up with the files loaded from disk by all the cook processes. The more processes I use, the more CPU is taken by Defender, which slows cooking. I had major gains when I killed Defender, but that’s far from optimal.

[Image Removed]

Ah ha, with that explanation I now understand why this is happening. MPCook uses more memory than SPCook. When the cooker detects a high memory state (via QueryMemoryResourceNotification on Windows), it triggers garbage collection, with a minimum period in between calls to garbage collection. Garbage collection has a small cost on its own, but it also means we throw out the packages we have loaded and we then have to load them again, so the load time spent by each CookWorker becomes higher. This matches up with the long time reported for Package Loads in the profile data from your log.
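
For reference, the high-memory detection on Windows is based on the OS memory-resource notification mechanism. Below is a minimal standalone illustration of that Win32 mechanism only; the cooker wraps it in its own thresholds and GC scheduling, so this is not the actual cooker code path.

`// Minimal illustration of Windows memory-resource notifications (see QueryMemoryResourceNotification).
#include <windows.h>
#include <cstdio>

int main()
{
    // Ask the OS for a handle that reports whether the system is in a low-available-memory state.
    HANDLE LowMemoryNotification = CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (!LowMemoryNotification)
    {
        return 1;
    }

    BOOL bLowMemory = FALSE;
    if (QueryMemoryResourceNotification(LowMemoryNotification, &bLowMemory) && bLowMemory)
    {
        // This is the condition under which the cooker would schedule a garbage collection,
        // subject to a minimum period between collections.
        std::printf("System reports low available memory.\n");
    }

    CloseHandle(LowMemoryNotification);
    return 0;
}`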

We have made improvements to MPCook’s use of garbage collection in 5.5 and 5.6; updating to those versions could improve your practical performance.

Otherwise, the primary recommendation I have is the one you have already mentioned: increase memory available on the machine.

Since I’m literally on the same problem right now, with 5.5 and the newest Wwise plugin, I’m afraid there is no workaround. Their recommendation is to set bPackageAsBulkData to true and then it works. Well, it doesn’t. From what I know, some projects work while others don’t. Just for reference, there are 2 paths where this happens:

One is a write while other processes hold the file open for reading:

`LogWwiseFileHandler: Display: Wwise: Adding file 164054554A4456D7B62DD3B885FC3F0A_en.wem [36332 bytes] from 164054554A4456D7B62DD3B885FC3F0A_en.wem

LogWwiseFileHandler: Display: [CookWorker 0]: Wwise: Adding file 164054554A4456D7B62DD3B885FC3F0A_en.wem [36332 bytes] from 164054554A4456D7B62DD3B885FC3F0A_en.wem

LogWwiseFileHandler: Display: [CookWorker 1]: Wwise: Adding file 164054554A4456D7B62DD3B885FC3F0A_en.wem [36332 bytes] from 164054554A4456D7B62DD3B885FC3F0A_en.wem

LogWwiseFileHandler: Display: [CookWorker 2]: Wwise: Adding file 164054554A4456D7B62DD3B885FC3F0A_en.wem [36332 bytes] from 164054554A4456D7B62DD3B885FC3F0A_en.wem

LogWindows: Error: [CookWorker 2]: SavePackage Async write Saved/Cooked/Windows/Content/164054554A4456D7B62DD3B885FC3F0A_en.wem failed: The process cannot access the file because it is being used by another process.`

And two is an attempt to delete a file held by another process (error code 32 means another process is using the file):

`LogWwiseFileHandler: Display: Wwise: Adding file 002897CF431531EA6393A49F74D03C5F_en.wem [40753 bytes] from 002897CF431531EA6393A49F74D03C5F_en.wem

LogWwiseFileHandler: Display: [CookWorker 0]: Wwise: Adding file 002897CF431531EA6393A49F74D03C5F_en.wem [40753 bytes] from 002897CF431531EA6393A49F74D03C5F_en.wem

LogWwiseFileHandler: Display: [CookWorker 1]: Wwise: Adding file 002897CF431531EA6393A49F74D03C5F_en.wem [40753 bytes] from 002897CF431531EA6393A49F74D03C5F_en.wem

LogWwiseFileHandler: Display: [CookWorker 2]: Wwise: Adding file 002897CF431531EA6393A49F74D03C5F_en.wem [40753 bytes] from 002897CF431531EA6393A49F74D03C5F_en.wem

LogFileManager: Warning: [CookWorker 2]: Error deleting file: Saved/Cooked/Windows/Content/002897CF431531EA6393A49F74D03C5F_en.wem (Error Code 32)`

This is the callstack:

`[CookWorker 1]: Runnable thread Background Worker #1 crashed.
[CookWorker 1]: begin: stack for UAT
[CookWorker 1]: === Critical error: ===
[CookWorker 1]: Fatal error: [File:Engine\Source\Editor\UnrealEd\Private\Cooker\LooseCookedPackageWriter.cpp] [Line: 451]
[CookWorker 1]: SavePackage Async write Saved/Cooked/Windows/Content/579868A34425CECF8F0574B90A134D37_en.wem failed: Proces nie może uzyskać dostępu do pliku, ponieważ jest on używany przez inny proces. [English: The process cannot access the file because it is being used by another process.]

[Callstack] 0x00007ffdc68d34bb UnrealEditor-UnrealEd.dll!WriteToFile()
[Callstack] 0x00007ffdc688103d UnrealEditor-UnrealEd.dll!FLooseCookedPackageWriter::FWriteFileData::HashAndWrite()
[Callstack] 0x00007ffdc6823170 UnrealEditor-UnrealEd.dll!FLooseCookedPackageWriter::AsyncSaveOutputFiles'::2'::<lambda_1>::operator()()
[Callstack] 0x00007ffdc6866255 UnrealEditor-UnrealEd.dll!UE::Tasks::Private::TExecutableTaskBase<FLooseCookedPackageWriter::AsyncSaveOutputFiles'::2'::<lambda_1>,void,void>::ExecuteTask()
[Callstack] 0x00007ffdc6425b40 UnrealEditor-UnrealEd.dll!UE::Tasks::Private::FTaskBase::TryExecuteTask()
[Callstack] 0x00007ffdc63dc2e4 UnrealEditor-Core.dll!LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::TTaskDelegateImpl<LowLevelTasks::FTask::Init<UE::Tasks::Private::FTaskBase::Init'::2'::<lambda_1> >'::13'::<lambda_1>,0>::CallAndMove()
[Callstack] 0x00007ffdcf380115 UnrealEditor-Core.dll!LowLevelTasks::FTask::ExecuteTask()
[Callstack] 0x00007ffdcf37ffde UnrealEditor-Core.dll!LowLevelTasks::FScheduler::ExecuteTask()
[Callstack] 0x00007ffdcf3a6aad UnrealEditor-Core.dll!LowLevelTasks::FScheduler::WorkerLoop()
[Callstack] 0x00007ffdcf36d0c6 UnrealEditor-Core.dll!LowLevelTasks::FScheduler::CreateWorker'::2'::<lambda_1>::operator()()
[Callstack] 0x00007ffdcf577f03 UnrealEditor-Core.dll!FThreadImpl::Run()
[Callstack] 0x00007ffdcf9ca78d UnrealEditor-Core.dll!FRunnableThreadWin::Run()
[Callstack] 0x00007ffdcf9c1baf UnrealEditor-Core.dll!FRunnableThreadWin::GuardedRun()
[Callstack] 0x00007ffec738e8d7 KERNEL32.DLL!UnknownFunction

[CookWorker 1]: Crash in runnable thread Background Worker #1
[CookWorker 1]: end: stack for UAT`

If you do make those changes, I will gladly test them out as well, giving you a somewhat bigger sample pool for verification. Alternatively, if that’s not likely to happen due to your other priorities, or will take more time, just hint at where and how it should be done and I guess I could work from there.

We have actually already upgraded Wwise to the latest version, and also enabled bPackageAsBulkData as officially recommended.

Same with us. We have upgraded Wwise to the latest version, and also enabled bPackageAsBulkData as officially recommended.

Great, waiting for your work. We currently work around this by using a random sleep when we detect a file-write conflict.

Thanks very much, I will try it.

The patch does contain the change from:

if (!PlatformManager->GetSessionPlatforms().Contains(TargetPlatform))

to:

if (!PlatformManager->HasSessionPlatform(TargetPlatform))

I did not have that problem; I’ve even verified this in the patch file in a text editor. I guess that Juhne Tong had some issues with the patch application phase. Also, I do not have a function named:

GetPlatformSandboxDirectory

anywhere in the solution, either pre- or post-change. I’m guessing that’s from an older UE version. For me the line is as follows:

FString Result = SandboxFile->GetSandboxDirectory(TargetPlatform->PlatformName());

The code compiles and I’m testing it. Give me a few days and several cooks, and I’ll let you know if the issue is gone for good.

Hey,

The code seems to generally work; the cook succeeds and I’ve tried numerous times. I am aware of the memory and core requirements; I’ve read the page: https://dev.epicgames.com/documentation/en-us/unreal-engine/using-multi-process-cooking-for-unreal-engine?application_version=5.5

My case is a World Partition (WP) level, so I also know that one of the processes will most probably do most of the work, because WP work cannot be shared. However, that one process often stalls with this message:

LogCook: Display: CookWorker 0 has not responded to a RetractionRequest message for 13375.6 seconds. Continuing to wait...

That time goes up to 12 - 13k seconds and then CookWorker 0 actually responds and the cooking continues. If left long enough the process will finish, but it takes much longer than a single-process cook. This cycle repeats several times during the cook. I’m pasting relevant parts of the log below. Do you think that one asset failing is causing this issue?

`[21:07:09] : [Step 8/13] LogWwiseFileHandler: Display: [CookWorker 0]: Wwise: Adding file FF8E140F4A5DF57D6AB58885F72FD2E3_pl.wem [34702 bytes] from FF8E140F4A5DF57D6AB58885F72FD2E3_pl.wem
[21:07:09] : [Step 8/13] LogWwiseFileHandler: Display: [CookWorker 0]: Wwise: Adding file FFC7862C45A0C6C120FF778AC462197D_pl.wem [13403 bytes] from FFC7862C45A0C6C120FF778AC462197D_pl.wem
[21:07:09] : [Step 8/13] LogCook: Display: [CookWorker 0]: Cooked packages 43 Packages Remain 26845 Total 26888
[21:07:09] : [Step 8/13] LogCook: Display: [CookWorker 0]: Cook Diagnostics: OpenFileHandles=38582, VirtualMemory=10552MiB, VirtualMemoryAvailable=72798MiB


then nothing about cook worker 0 for a long time

[21:38:36] : [Step 8/13] LogCook: Display: Cooked packages 69340 Packages Remain 57526 Total 126866
[21:38:36] : [Step 8/13] LogCook: Display: Idle CookWorkers: { CookWorker 1 }. Retracting 14013 packages from CookWorker 0 to distribute to the idle CookWorkers.
[21:38:36] : [Step 8/13] LogCook: Display: CookWorker Local: 3194 packages remain.
[21:38:36] : [Step 8/13] LogCook: Display: CookWorker 0: 28026 packages remain.
[21:38:36] : [Step 8/13] LogCook: Display: CookWorker 1: 0 packages remain.
[21:38:36] : [Step 8/13] LogCook: Display: CookWorker 2: 26306 packages remain.


[21:39:37] : [Step 8/13] LogCook: Display: CookWorker 0 has not responded to a RetractionRequest message for 61.0 seconds. Continuing to wait…

a lot of time passed

[01:21:36] : [Step 8/13] LogCook: Display: CookWorker 0 has not responded to a RetractionRequest message for 13375.6 seconds. Continuing to wait…
[01:21:39] : [Step 8/13] LogCook: Display: Cooked packages 98833 Packages Remain 28033 Total 126866
[01:21:40] : [Step 8/13] LogCook: Display: [CookWorker 2]: Cook Diagnostics: OpenFileHandles=5126, VirtualMemory=22439MiB, VirtualMemoryAvailable=20848MiB
[01:21:41] : [Step 8/13] LogCook: Display: [CookWorker 1]: Cook Diagnostics: OpenFileHandles=23360, VirtualMemory=25802MiB, VirtualMemoryAvailable=20848MiB
[01:21:41] : [Step 8/13] LogCook: Display: Cook Diagnostics: OpenFileHandles=15751, VirtualMemory=47939MiB, VirtualMemoryAvailable=20848MiB
[01:21:41] : [Step 8/13] LogCook: Display: CookWorker Local: 0 packages remain.
[01:21:41] : [Step 8/13] LogCook: Display: CookWorker 0: 28033 packages remain.
[01:21:41] : [Step 8/13] LogCook: Display: CookWorker 1: 0 packages remain.
[01:21:41] : [Step 8/13] LogCook: Display: CookWorker 2: 0 packages remain.
[01:21:46] : [Step 8/13] LogCook: Display: Cooked packages 98833 Packages Remain 28033 Total 126866
[01:22:02] : [Step 8/13] LogCook: Display: Cooked packages 98833 Packages Remain 28033 Total 126866
[01:22:02] : [Step 8/13] LogCook: Display: [CookWorker 0]: Cooking HERE_IS_AN_AK_AUDIO_ASSET
[01:22:02] : [Step 8/13] LogAudioDerivedData: Display: [CookWorker 0]: Building compressed audio format BINKA hash BINKA_SCVER_5028_R4DV_false_SR0_48000.000000_SR1_32000.000000_SR2_24000.000000_SR3_12000.000000_SR4_8000.000000_QMOD_1.000000_CQLT
-1_ASTH_0.000000_INLC_false_LCK1_0.000000_CSZE_65536_LCF_false_ZCS_256_MCSO_0_END wave EmptyWave…
[01:22:02] : [Step 8/13] LogAudioFormatBink: Warning: [CookWorker 0]: Failed to encode bink audio: No sample data provided
[01:22:02] : [Step 8/13] LogAudioDerivedData: Warning: [CookWorker 0]: Cooking sound failed: SoundWave /Game/Demo/EmptyWave.EmptyWave
[01:22:02] : [Step 8/13] LogAudioDerivedData: Display: [CookWorker 0]: EmptyWave compressed to 0.00% (from 0KB to 0KB) with BINKA at Quality 80 with Quality Modifier 1.00
[01:22:02] : [Step 8/13] LogAudio: Display: [CookWorker 0]: Failed to retrieve compressed data for format BINKA and soundwave /Game/Demo/EmptyWave.EmptyWave.
[01:22:02] : [Step 8/13] LogAudio: Display: [CookWorker 0]: Failed to build BINKA derived data for /Game/Demo/EmptyWave.EmptyWave`

Yeah, we are on 5.3, so we made a few small changes.

Sounds good.