Pain of Unreal Engine: binary assets

There are several problems with assets in Unreal Engine:

  • Binary format;
  • There are user-specific data inside asset files;
  • There are generated non-source data inside asset files;

These problems compound each other, but they can be solved separately.
Here is each item in more detail.

Binary format
Asset files are binary, so you can only work with them through the editor.
The binary format alone makes it impossible to use version control fully:

  • You cannot see what has changed in a file;
  • You cannot merge files;
  • You cannot “cherry-pick” a fix from the “master” branch to the “stable” branch;
  • Storage in the version control system is inefficient;
  • You cannot use hooks (for example, you can’t write a hook that checks that the asset of an imported model and its fbx are committed in the same commit).
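To illustrate the hook case: a minimal sketch of such a check, assuming assets were stored as readable text. The `.asset.json` extension and the `source_file` field are hypothetical, invented only for this example; they are not an Unreal Engine format.

```python
import json


def missing_sources(changed_files, read_text):
    """Return (asset, source) pairs where an asset changed but its source did not.

    changed_files: paths touched by the commit.
    read_text: callable returning the text of a path (hypothetical helper).
    """
    changed = set(changed_files)
    problems = []
    for path in sorted(changed):
        # Hypothetical text-asset extension, assumed for illustration only.
        if not path.endswith(".asset.json"):
            continue
        meta = json.loads(read_text(path))
        # Hypothetical field naming the file the asset was imported from.
        source = meta.get("source_file")
        if source and source not in changed:
            problems.append((path, source))
    return problems
```

A real hook would feed it the list of staged paths and reject the commit when the returned list is not empty. With a binary asset, the import source cannot be read this easily, which is the point of the complaint.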

Because you cannot view the actual data in a file, it tends to accumulate debris: often a file is changed, but the editor shows no visible change.

It should be noted that the argument “use locks” does not work for the following reasons:

  • I do not know at the start which files will need to change for a task;
  • Locks do not solve the problem of managing branches: the only way to merge changes from branch to branch is to redo them manually.

There are user-specific data inside asset files
Part of the data stored in assets is user-specific, for example:

  • The zoom level of the Blueprint graph view;
  • Breakpoints.

IMHO, this data should live somewhere in the user’s personal space. I do not want to modify ■■■■■ by adding or removing breakpoints, but keeping breakpoints across editor restarts is useful.

There are generated non-source data inside asset files
Assets store information that is a build artifact, for example:

  • Compiled Blueprint bytecode for the Kismet VM;
  • Precomputed lightmap data;
  • Asset preview images.

IMHO, this data should either not be stored on disk at all (for data that is cheap to compute, such as Kismet VM bytecode) or be stored in a cache (for data that is expensive to compute, such as lightmaps).

Because of this, even a small project ends up with ever-growing data volumes and incomprehensible release management.
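
One way to picture the "store derived data in a cache" idea is a cache keyed by a hash of the source data, so the derived result never has to live in the asset file. This is only a sketch under that assumption: the `DerivedCache` class and the `build` callback are hypothetical names, and this is not how the engine implements it.

```python
import hashlib


class DerivedCache:
    """In-memory cache of derived data, keyed by kind and source hash."""

    def __init__(self):
        self._store = {}

    def get(self, source: bytes, kind: str, build):
        # The key depends only on the source bytes and the kind of derived
        # data, so identical sources share one cached result.
        key = (kind, hashlib.sha256(source).hexdigest())
        if key not in self._store:
            # Cache miss: run the (possibly expensive) build step once.
            self._store[key] = build(source)
        return self._store[key]
```

The asset file then holds only source data, and anything that can be rebuilt from it is looked up by hash or recomputed on a miss.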

Hi , thanks for posting this.

We’ve been considering text based asset files for some time but we haven’t really had time to get to it.

We do already store some of the expensive to compute data in a separate cache, it’s called Derived Data Cache. I wouldn’t drop the seemingly ‘cheap’ data though. It may seem cheap in small projects but it can quickly grow to become painful. I guess the alternative would be to store small data as byte values in text assets or move them to DDC as well.

Makes sense, but…

Question: when cooking (esp. for mobile), all that user data is stripped away, right?
If yes, why does it matter if UE stores it or not?
If no, it’s a major flaw and potential security risk; I trust it’s stripped though.

Text data would be 10x the size; for anything but a toy-sized project it’d be problematic.
For diffs to be usable, the file structure needs to be deterministic. I’m not sure how much data shuffles around in UE.
Anyhow, a utility that converts binary<->text may be useful to inspect and merge changes.
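
To make the determinism point concrete, here is a small sketch that assumes an asset has already been converted to a plain dictionary (the conversion from binary is not shown and is the hard part). Sorting keys on output means the same data always produces the same text, so a diff shows only real changes.

```python
import difflib
import json


def dump(asset: dict) -> str:
    """Serialize deterministically: sorted keys, fixed indentation."""
    return json.dumps(asset, sort_keys=True, indent=2) + "\n"


def text_diff(old: dict, new: dict) -> list:
    """Return unified-diff lines between two assets' text forms."""
    return list(
        difflib.unified_diff(
            dump(old).splitlines(),
            dump(new).splitlines(),
            "old",
            "new",
            lineterm="",
        )
    )
```

Two dictionaries with the same content but different key order give an empty diff here; without `sort_keys=True` they could differ line by line even though nothing changed.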

Hi ,

I realize that this problem cannot be solved quickly, but I want to draw attention to it. This is really important.
I’m aware of DDC, but a lot of data (VM bytecode, preview images, etc.) lies right inside the asset files. This is not source data and it should be separated from the asset files. An unreadable asset format makes it possible to hide the problem.

Hi ,

I’m talking about the source data for the editor. Cooking has different requirements, and a binary format is preferable there.
In this case, I am sure that the size of the text representation would be comparable to or even smaller than the size of the binary data: the format is more redundant, but there would be less redundant data.
For example, we have a game level asset of about 500 MB. More than half of this file is precomputed lighting, which would be better stored separately.

Another problem is that because of this mix of data, we cannot precompute the lighting (which takes a really long time) without stopping work on the level.

One other point that we have discussed a lot here is that just because an asset is represented as text does not necessarily make it nice to diff/merge. Something like a Blueprint graph is very hard to read as text, and even harder to merge without breaking it. That is why we have been working hard on a visual diff/merge tool for Blueprints.

A complex graph change really is difficult to diff/merge as text.

But with binary assets, we lose much simpler user scenarios:

  • merging independent graphs (e.g. different functions);
  • merging a changed default value.

A visual tool rules out automatic merging in cases where it would be possible.
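
The "changed default value" case is the kind of merge that could be automatic. A minimal three-way merge sketch, assuming default values are available as a flat property-to-value dictionary (a simplification for illustration, not the real asset layout):

```python
_MISSING = object()  # marks a property that is absent on one side


def merge3(base: dict, ours: dict, theirs: dict):
    """Three-way merge of flat property dictionaries.

    Returns (merged, conflicts). On a conflict, our value is kept
    and the property name is reported.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o  # both sides agree
        elif o == b:
            value = t  # only their side changed it
        elif t == b:
            value = o  # only our side changed it
        else:
            conflicts.append(key)  # both changed it differently
            value = o
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts
```

If one branch changes one default and another branch changes a different one, this merges without any conflict, which a binary file never can.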

In addition, the binary format hides garbage data in the serialization.

I very often get an empty visual diff for modified data. At the same time, these changes affect the behavior of the editor.

Good points here.

The only reason for the binary asset format is fast loading in the editor. When the engine cooks everything, it probably does more stripping and optimization. Personally, I don’t care about asset loading speed in the editor. And say the editor used only text assets (like XML): the speed difference would not be that noticeable. I prefer any text format over anything binary, for lots of reasons.

For example, consider textures, which could be done in 2 ways:

  • full xml file
  • original texture + xml descriptor for editor and settings and link to original texture

I vote for this feature too: having all assets in a text format instead of binary.

I think the choice of syntax for the text format is a topic for another discussion.
But when I solved the asset storage problem on another project, I arrived at the following comparison:

  • YAML (block format) - Concise, easy to merge. Supports user types much more comfortably than XML namespaces.
  • JSON - Simple and familiar syntax, but lacks type declarations within the format.
  • XML - Too verbose. Very difficult to process correctly in software.

I did not say XML is best. I said ANY text format is better than binary for the editor.

I am saddened. Blueprint loading is still broken and there is no hope for a solution:

Hi ,

In both answerhub posts it was determined that the cyclic dependency issues you were seeing have been fixed and should be ready in 4.8. Was there more information you wanted to pass on in regards to this?

I missed the last comment. It was intended for another post.

However, it should be noted that reducing the project to a minimal repro for these problems was very difficult because the assets are binary. I spent about five days on this task. If I could see what was really written to the file, it would be much easier.

You can dump more information about a package with the PkgInfo commandlet, and you can diff two assets with the DiffPackages commandlet, both of which produce a textual representation of the asset (when the editor is connected to source control and you are not debugging something that messes up on load like the circular dependency issue, you can do diffing directly from the content browser context menu).

Cheers,
Noland

Hi!
I totally support the ideas here, in the following order of priority in my mind:

  • uasset files should certainly never get ‘polluted’ by precompiled data or user preferences: only source data.
  • uasset files should then be serialized (at least as an option) in YAML, a powerful yet efficient text format.

Hi ,

Can you give more information about how to diff assets with the DiffPackages commandlet? I tried googling it and didn’t find anything useful.
Thanks

Bump!

I believe this is very important and can solve a lot of issues, especially fixing weird BP bugs. I think that sometimes there is something bad happening under the hood and a blueprint can contain old garbage data. If we were able to see a text form of assets, even a hardly readable one, fixing corrupted assets would be much easier.

Yes, binary assets are IMHO generally one of the biggest problems of UE4, especially for level maps and blueprint classes.

Example of the problem “two similar assets have different behaviour”: https://github.com//UE4-broken-on-cook

Also, the blueprint merge and diff tools are broken by design: to compute a difference, they have to be able to read and parse the asset data.

Yeah, I’d like to see Unreal Engine 4 assets like Materials, Blueprints and maps use text-based formats. That would be much better for checking for bugs and changes, and for editing outside the editor.

Bump.

Are text based assets even on the roadmap? If not, they really really really must be. We’re just prototyping with UE4 and most things seem great but binary assets are a major PITA that make our daily work extremely difficult and error prone, no-one has a f*****g clue what’s going on in the assets. Having to rely on an external tool to diff is NOT a solution, it’s a crappy bandaid. JSON or YAML or something human readable and understandable is absolutely needed.