Packaged build is corrupted for an unknown reason

Hi,

we’re currently facing an issue with our packaged build whose root cause we’re having trouble pinpointing.

So, first some context: we’re building projects with around 225GB of (uncompressed) content. This is done with an automated pipeline which runs the build process on a suitable instance in the cloud. We’re using a shared DDC to decrease cook time; incremental cooking is disabled. Almost all of the content is packaged into a single ucas/utoc file. Repeated builds also failed in the same way.

We first noticed the issue when the packaged application failed to execute tests. It crashes with the following error (a log is attached; -LogCmds="LogPackageName Verbose" wasn’t set for that run, but it didn’t log anything additional in another run):

Error: Assertion failed: GDefaultMaterials[Domain] != nullptr [File:D:\build\++UE5\Sync\Engine\Source\Runtime\Engine\Private\Materials\Material.cpp] [Line: 603] 
Error: Cannot load default material 'engine-ini:/Script/Engine.Engine.DefaultMaterialName' [Domain=MD_Surface] from path '/Engine/EngineMaterials/WorldGridMaterial.WorldGridMaterial'

After some analysis we found that the packaged .ucas file seems to be corrupted. We verified this using UnrealPak.exe -extract and saw a large number of errors indicating that assets are corrupted (OutputExtract.zip attached):

<...>
LogCompression: Error: FCompression::UncompressMemory - Failed to uncompress memory (24933/65536) from address 00000537C54219A0 using format Oodle, this may indicate the asset is corrupt!
OodleDataCompression: Error: OodleDecode: compressed buffer starts with zero byte; invalid or corrupt compressed stream.
LogCompression: Error: FCompression::UncompressMemory - Failed to uncompress memory (38665/65536) from address 00000537C5427B10 using format Oodle, this may indicate the asset is corrupt!
OodleDataCompression: Error: OodleDecode: compressed buffer starts with zero byte; invalid or corrupt compressed stream.
LogCompression: Error: FCompression::UncompressMemory - Failed to uncompress memory (4538/19396) from address 00000537C5431220 using format Oodle, this may indicate the asset is corrupt!
LogIoStore: Error: Failed reading chunk for file "../../../AVE_Application/Content/<path to asset>/<asset name>.ubulk" (Failed uncompressing chunk (Read Error)).
<...>

The files extract to disk despite the errors but I wasn’t able to verify if the game crashes with the extracted assets as well.
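
For anyone triaging a similar extraction log, a small script can condense the error flood into a count per compression format and a list of affected files. This is only a minimal sketch: the two patterns are taken from the log lines quoted above, the function name is hypothetical, and other engine versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the quoted UnrealPak output; treat them as assumptions.
UNCOMPRESS_RE = re.compile(
    r"Failed to uncompress memory \((\d+)/(\d+)\).*using format (\w+)"
)
CHUNK_RE = re.compile(
    r'LogIoStore: Error: Failed reading chunk for file "([^"]+)"'
)


def summarize_extract_log(lines):
    """Return (failures per compression format, list of files with unreadable chunks)."""
    failures = Counter()
    files = []
    for line in lines:
        match = UNCOMPRESS_RE.search(line)
        if match:
            failures[match.group(3)] += 1
            continue
        match = CHUNK_RE.search(line)
        if match:
            files.append(match.group(1))
    return failures, files
```

Passing an open log file (or any iterable of lines) to summarize_extract_log gives a quick overview of whether the failures cluster in particular assets or are spread across the whole container.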

The log of cooking and packaging showed no error indicating something went wrong (OutputCookPackage.txt attached).

What tools or strategies can we use to pinpoint the issue? We’re feeling like we’re in a dead end right now in our analysis.

Cheers,

Simon

[Attachment Removed]

Hi!

I’ve reached out to the IOStore devs to check what additional options for verification we have.

Before we dive too deep, did you already check if a build without IOStore (either pak files, or loose files) works as expected?

There’s a slim chance that the problem could be coming from upstream, like a corrupted DDC. Usually that shouldn’t affect the compression, though.

One additional check that might narrow this down is to enable signature verification for the pak/IOStore containers under Project > Encryption > Signing. That would make sure that the build verifies that the files did not get modified after being created. This will likely not show any errors, if the issue is indeed during creation of the IOStore containers, but it’s a good sanity check.
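
As an engine-independent complement to the signing check, a pipeline could also record checksums of the container files right after packaging and compare them again before the tests run. The sketch below is not the engine's signing mechanism, just a plain SHA-256 manifest; the function names and the manifest layout are hypothetical. Like signing, it only detects changes made after the containers were written, not corruption introduced while creating them.

```python
import hashlib
import json
from pathlib import Path

# Container extensions mentioned in this thread (pak files and IoStore ucas/utoc).
CONTAINER_SUFFIXES = {".pak", ".ucas", ".utoc"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so very large containers do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(stage_dir, manifest_path):
    """Record a checksum for every container file below stage_dir."""
    stage_dir = Path(stage_dir)
    manifest = {
        path.relative_to(stage_dir).as_posix(): sha256_of(path)
        for path in sorted(stage_dir.rglob("*"))
        if path.is_file() and path.suffix.lower() in CONTAINER_SUFFIXES
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(stage_dir, manifest_path):
    """Return the names of containers that are missing or whose checksum changed."""
    stage_dir = Path(stage_dir)
    expected = json.loads(Path(manifest_path).read_text())
    return [
        name
        for name, digest in expected.items()
        if not (stage_dir / name).is_file() or sha256_of(stage_dir / name) != digest
    ]
```

Calling write_manifest at the end of the packaging step and verify_manifest on the test machine would at least rule out damage during upload, download, or staging.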

I’ll let you know once I have some advice from the team.

Best,

Sebastian


Hi Sebastian,

thanks for the quick answer.

We disabled the use of the DDC and repeated the build successfully. Therefore the corruption must stem from some contents there.

So the content of the question changes slightly:

What can we do on our pipeline to prevent that? Are there ways to automatically verify DDC content?

I guess it’s quite hard, if not impossible, to find out what exactly got corrupted and why.

Cheers,

Simon


> What can we do on our pipeline to prevent that? Are there ways to automatically verify DDC content?

> I guess it’s quite hard, if not impossible, to find out what exactly got corrupted and why.

The corruptions we see on our side are usually rare and intermittent. Usually, they are solved by deleting the affected cache entries and often the root cause is not easy to identify.

Sometimes issues introduced by code changes can lead to corrupted DDC entries and even if these are subsequently fixed, the cached entries remain.

You’re right that it’s usually very hard to track these down, so for now you can try identifying the affected entries and deleting them.

There are several command-line parameters that can be used to verify the DDC contents:

The -DDC-Verify command-line parameter enables a verification mode in which the cached data is regenerated and compared with what is already in the cache; any differences are saved for comparison and written to the log. You can limit the scope of verification to specific cache buckets, like -DDC-Verify=Texture[Content removed], which can be useful if you want to reproduce a failed verification in another run.

You can also use the DerivedDataCacheCommandlet with the FILL parameter to load/save all your assets instead of using the editor or a cook run.

A run with a 100% verify rate, using your usual DDC, should find any differences between cached and generated entries. It will take longer, since it is essentially a cold run. Any discovered issues will show up in the log, where you can investigate them or use them to delete the affected entries if you don’t want to clear your entire DDC.
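
The two suggestions above could be combined into one scheduled pipeline step, roughly as sketched below. The helper names, the editor executable path, and the project path are hypothetical; the sketch also assumes the commandlet is started with -run=DerivedDataCache -fill, and it only uses the single-bucket form of -DDC-Verify shown above, since the rest of that syntax is not reproduced in this thread.

```python
import subprocess


def ddc_verify_command(editor_cmd, project, bucket=None):
    """Build the argument list for a DDC fill run with verification enabled.

    editor_cmd: path to the command-line editor executable (assumption,
                for example UnrealEditor-Cmd.exe in a UE5 installation).
    project:    path to the .uproject file.
    bucket:     optional single cache bucket, e.g. "Texture".
    """
    verify_flag = "-DDC-Verify"
    if bucket:
        verify_flag += "=" + bucket
    return [editor_cmd, project, "-run=DerivedDataCache", "-fill", verify_flag]


def run_ddc_verify(editor_cmd, project, bucket=None):
    """Run the verification and return the process exit code; findings are in the log."""
    return subprocess.run(ddc_verify_command(editor_cmd, project, bucket)).returncode
```

Because the run is effectively a cold one, it probably fits better as a nightly or weekly job than as part of every build.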

Kind Regards,

Sebastian


Hi Sebastian,

thanks a lot for the insights. It’s really helpful and should make the next DDC corruption less painful.

Cheers,

Simon


Hi Sebastian,

I’ve got an update. We’re encountering the issue again. The first build after deleting the entire DDC works fine, but subsequent builds fail with the same error mentioned above.

We verified the DDC twice, but the log didn’t show any obvious problems, certainly no corrupted buckets. See attached logs.

Currently we’re investigating if there are any references to the WorldGrid Material introduced recently, or if there are staging issues in our pipeline.

If you’ve got more tips for us on how to effectively chase down the issue, they’d be highly appreciated.

Cheers,

Simon

[Attachment Removed]

Missed an attachment…

[Attachment Removed]

Hi all,

the issue was caused by a corrupted Perforce workspace on the build machine. We don’t know the cause of the corruption, but setting the stream up cleanly resolved the issue.

The ticket can be closed.

Cheers,

Simon


Noted, thank you!
