Evaluating the Virtual Assets feature (is it suitable for integrate/merge workflows in source control? and some other questions)

We are evaluating the Virtual Assets feature to decide whether to use it in production.

I set it up in a test branch and took a quick look at the source code/documents. I still have some questions:

  1. After we convert assets to Virtual Assets, is it possible to convert them back using a UE-provided feature (e.g. a commandlet)? I took a look around the PackageTrailer class but didn’t find any functions we could use for this right now.
  2. If we have multiple branches, is it recommended to convert them all at the same time? If not, will it lead to issues when merging/integrating between VA and non-VA branches?
  3. Will using VA cause any performance regression? In PackageVirtualizationProcess.cpp there are some mentions of “future task/not optimized”. (Also, I got the impression that payloads are fetched from P4 on demand; is that correct?)
  4. In Epic’s practice, are payloads stored in the same stream as the production assets, or does every production branch have a corresponding VA folder (branch) outside itself?

Let me know if some of the answers are already somewhere in the document.

Thank you.

Shane

Edit:

Some updates/more information (still using the same question numbers):

  1. Is there something like “local virtual assets” which saves all the payloads to a local place (under the same p4 depot/branch)? It sounds like converting back to a non-Virtual-Asset setup, only with a more complicated file layout.
  2. Making a corresponding p4 folder/stream for every branch seems to complicate the branch merging/integrating process, while placing the virtual asset stream/branch somewhere under the normal stream/branch seems to complicate the p4 mapping of the normal workspace (we would need to exclude the virtual asset folders from the p4 workspace mapping for most users).


After we convert assets to Virtual Assets, is it possible to convert them back using a UE-provided feature?

We call this process ‘rehydration’, although strictly speaking it should be named ‘hydration’; the term has stuck. I will attach a PDF detailing the various options for this that are provided in the stock engine.

Is there something like “local virtual assets” which saves all the payloads to a local place (under the same p4 depot/branch)? It sounds like converting back to a non-Virtual-Asset setup, only with a more complicated file layout.

We do have an ‘offline’ workflow for users that have unstable or poor connections, for whom downloading payloads on demand is a pain. As far as I know, nobody internally has used this in production, so I cannot vouch for it beyond my initial testing, and due to the lack of interest we have not developed it further into a more usable system. Having said that, I have linked the doc for it to a few licensees and have had no complaints so far, so feel free to give it a try if you are interested.

If I am misunderstanding your question and you just want a way to save payloads locally, then look into the file system based backend found at Engine\Source\Developer\Virtualization\Private\VirtualizationFileBackend.h, which the offline mode makes use of.

If we have multiple branches, is it recommended to convert them all at the same time?

This depends on your internal branching/integration strategy, so it’s hard to give general-purpose advice, but given that our binary assets tend not to be mergeable anyway I don’t think that virtualizing the data adds any additional complexity. Internally we have enabled virtualization wherever the majority of the content work is being done and let it flow to other branches from there.

If you have FeatureBranch->MainBranch->MilestoneBranch, I’d expect the main conversion to be done in MainBranch. In the worst case, any assets being submitted to FeatureBranch just won’t be virtualized until they are next edited and submitted on MainBranch.

We also provide a commandlet (see Engine\Source\Editor\VirtualizationEditor\Private\VirtualizeProjectCommandlet.h) for virtualizing an entire project. It can be used as part of an automated build process to virtualize assets that people miss, so it could eventually clean up any non-virtualized submits from other branches over time. Remember that when we virtualize a package we don’t actually change anything in the UObjects themselves; we just remove the unstructured binary blob from the end of the file, so it’s much less complex than re-saving a package file.

If you do want to consider any sort of automation for this, there is a UAT script that wraps the above commandlet, found at Engine\Source\Programs\AutomationTool\Scripts\Virtualization.Automation.cs.
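If it helps, an invocation on a build machine could look roughly like the lines below. The project path and the -Project argument on the UAT command are assumptions on my part, so please check the commandlet and Virtualization.Automation.cs for the exact arguments your engine version supports:

    REM Run the commandlet directly against a project (hypothetical paths)
    UnrealEditor-Cmd.exe D:\Projects\MyProject\MyProject.uproject -run="VirtualizationEditor.VirtualizeProject"

    REM Or go through UAT, which wraps the commandlet
    RunUAT.bat VirtualizeProject -Project=D:\Projects\MyProject\MyProject.uproject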

Will using VA cause any performance regression?

Asset virtualization only affects editor data and will not affect runtime performance in any way.

In the editor, performance may vary depending on your studio’s setup. Internally we use our shared DDC system as a cache backend, which gives people at most of our sites fairly fast access to virtualized data if they need it, although it is still slower than reading from an SSD. Our Perforce server is under quite a strain, as you might imagine for a company our size, so if the user does have to fall back to accessing data from there, the round-trip times can easily be 500-1000 ms at peak times.

However, almost all access of virtualized data in the engine happens as an input to asset compilation, which in turn produces more compact output data; think of textures converting the raw pixel data found in the package file to DXT1/2/3 format. If your studio has a fairly hot DDC then most people will never actually try to access virtualized data in the first place, but will get the post-compiled data from the DDC. As an added bonus, the asset compilation system is quite well threaded, so there should be very little access of virtualized data on the main thread and the user should generally not be interrupted by it unless they are editing a virtualized asset, and even then they will access the data once and cache it to disk. We have thought about pre-caching systems for this, but so far in practice the one-off access has not proven worth spending any time on to improve.

One last anecdotal story: back when this system was first being developed, I tried a full cook of one of our largest projects, which I had cloned and 100% virtualized. I forced our system to only access the virtualized data from Perforce and to recompile all assets that used virtualized data as an input. In the end the wall time for cooking this project was pretty much the same as cooking the non-virtualized version, as all the stalls were hidden on background threads and there was enough other work in the meantime to render the waits irrelevant. Of course, in the past couple of years there have been major improvements to our cooking system, so we might see different results if we were to re-run that experiment today, but that was a test of the absolute worst case.

In Epic’s practice, are payloads stored in the same stream as the production assets, or does every production branch have a corresponding VA folder (branch) outside itself?

Internally we store all of our virtualized payloads in their own depot; since users do not really need to know about them or sync them, we have found this is the best way to keep them out of the way. Each project has its own root in this depot; although this means we do not get de-dupe savings between projects, it does mean that we can more easily manage the security permissions of the files if we ever need to. Lastly, all branches of each project use the same root.

So our structure is something like:

//Payloads/Project1/…

//Payloads/Project2/…

//Payloads/Project3/…

Some licensees have required that they store the payloads within stream depots, either a separate one for payloads or the same depot as their project, which can make the setup harder to do. The following UDNs, which should be visible to you, contain discussion of that. If you cannot view the links, let me know and I can copy/paste the relevant details here:

  • [Content removed]
  • [Content removed]
  • [Content removed]

Attaching doc on rehydration

Attaching doc on offline mode

Thank you so much for the detailed answer! It really helped.

I have one more question about branch integration (an edge case). Sorry if it was already mentioned; I might have missed something.

Assume we have a Project1 with multiple branches:

//P4Depot/Project1/main/…

//P4Depot/Project1/dev1/…

//P4Depot/Project1/dev2/…

So we would have multiple corresponding payload branches, one for each of the above branches:

//Payloads/Project1/main/…

//Payloads/Project1/dev1/…

//Payloads/Project1/dev2/…

Now assume we have an asset file A.uasset; it will then have a corresponding payload file in each branch:

//P4Depot/Project1/main/folder1/A.uasset#1

//P4Depot/Project1/dev1/folder1/A.uasset#1

//P4Depot/Project1/dev2/folder1/A.uasset#1

//Payloads/Project1/main/Human_unfriendly_hash/XXXX.payload

//Payloads/Project1/dev1/Human_unfriendly_hash/XXXX.payload

//Payloads/Project1/dev2/Human_unfriendly_hash/XXXX.payload

If the asset is modified in both dev1 (deleted) and dev2 (edited), then when they are both merged back to the main branch there should be a conflict (delete/edit). How could we resolve the corresponding payload’s conflict in this case? (The payload file name likely doesn’t provide enough information to find its original file.)

Should we set up some rules to prevent the conflict in the first place?

Currently we don’t delete any of the payload files that are submitted to revision control, as even if the uasset file is deleted someone might sync back to an earlier revision where it is not deleted and need access to that data. If the user edits the uasset in such a way that the virtualized data actually changes (such as importing a new texture), then the payload will be given a new hash and a new file will be submitted to revision control when the uasset is next submitted.

There has been limited interest in how to prune .upayload files that are no longer needed, either because .uasset files have been obliterated in Perforce or because they have a limited revision history and start to purge themselves. So far the question has only ever been asked during the evaluation stages, and I don’t know of any licensee that actually tried to implement a pruning system; at least if they did, they did not report back to us about it.

My suggestion so far has been that you would need to maintain some sort of database of uasset files, with an entry for each revision detailing which virtualized payloads it references, and you would need to do this for every active branch in your depot that you are interested in. Given the fairly static format of the package trailer (see the code documentation in Engine\Source\Runtime\CoreUObject\Public\UObject\PackageTrailer.h) and the fact that the structure can be read backwards, you could probably make a quick and easy Python script to p4 print the end of a uasset and parse the info you need. You’d then only need to update the database for new revisions of each file, or remove entries when revisions or files are obliterated.

Using this database you could generate a list of used vs unused payloads found in your depot and know which are safe to obliterate, although I’d suggest backing them up elsewhere for a short period as insurance.
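In case it is useful, here is a rough, untested sketch of that idea in Python. It assumes p4 command-line access and a payload depot laid out like the //Payloads/… examples above, and it leaves the trailer parsing itself as a placeholder since the exact binary layout is whatever your engine version documents in PackageTrailer.h:

    import subprocess

    def p4_print_tail(depot_path, tail_bytes=64 * 1024):
        # Fetch one file revision (e.g. '//P4Depot/Project1/main/folder1/A.uasset#3')
        # and keep only the tail of the file, which is where the package trailer lives.
        data = subprocess.run(["p4", "print", "-q", depot_path],
                              capture_output=True, check=True).stdout
        return data[-tail_bytes:]

    def parse_payload_hashes(trailer_bytes):
        # Placeholder: walk the package trailer backwards from the end of the file and
        # return the payload hashes it references. Implement this against the layout
        # documented in Engine\Source\Runtime\CoreUObject\Public\UObject\PackageTrailer.h.
        raise NotImplementedError

    def used_payloads(uasset_revisions):
        # Build the set of payload hashes referenced by every uasset revision you track.
        used = set()
        for revision in uasset_revisions:
            used.update(parse_payload_hashes(p4_print_tail(revision)))
        return used

    def prune_candidates(used, payload_depot="//Payloads/Project1/..."):
        # List payload files still present at head and keep the ones whose depot path
        # does not contain any referenced hash. Assumes the hash appears in the payload
        # path, as in the //Payloads/<project>/<hash>/... layout discussed earlier.
        out = subprocess.run(["p4", "files", "-e", payload_depot],
                             capture_output=True, check=True, text=True).stdout
        payload_files = {line.split("#")[0] for line in out.splitlines() if line}
        return {p for p in payload_files if not any(h in p for h in used)}

You would feed used_payloads() the full revision list for each tracked branch (built from p4 filelog output, for example) and only re-parse revisions you have not seen before.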

Thank you for the detailed explanation! :smiley:

Recently, our project has also been evaluating the value and risks of using the VA feature.

If we enable VA, a normal asset submitted by the editor becomes a virtual asset.

If others sync this file, modify it, and submit it directly via p4, will the asset in the depot become a non-virtualized normal asset, or an incomplete, erroneous asset?

If the asset is corrupted, how can we fix the error quickly?

The official documentation also mentions that if DDC performance is poor and fetching the payload fails, the editor may crash. Is it easy to locate and fix that kind of error?

On the other hand, does using the VA feature increase the time needed to submit an asset?

Because it needs to separate the asset into two parts.

If this process needs more time, is it possible to have it processed automatically by a CI/CD build machine?

Hey, sorry it took a while to get a reply; I’ve been OOO.

If we enable VA, a normal asset submitted by the editor becomes a virtual asset.

Correct.

If others sync this file, modify it, and submit it directly via p4, will the asset in the depot become a non-virtualized normal asset, or an incomplete, erroneous asset?

If the modification doesn’t change the virtualized data in any way, such as changing the brightness property on a texture, then the file will remain virtualized when they submit via Perforce. The user will have downloaded the virtualized data when making the modification so that the editor could display the result, but that data is stored in their local DDC, not in the original file.

If the modification changes the virtualized data, such as re-importing the texture or modifying its original pixel data, then that modified data will be stored in the package file when the user saves their changes. If they were then to submit via Perforce directly, you would end up with a full-sized file in the depot.

The official documentation also mentions that if DDC performance is poor and fetching the payload fails, the editor may crash. Is it easy to locate and fix that kind of error?

By default, if a virtualized payload request fails because the system either cannot find the virtualized payload or cannot connect to the backend (DDC/Perforce etc.), we will display an error dialog informing the user of the problem and allowing them to retry the operation or quit the editor.

Originally we tried to avoid this and allow the editor to continue, but a lot of the older engine code was not really written with handling disk read errors in mind, and we found that allowing the editor to continue at this point often led to data corruption problems. For example, if heightmap data could not be retrieved it would be treated as empty, and saving that package would then effectively wipe out the heightmap data entirely.

Ideally we’d be able to harden all of the code paths that deal with data access to avoid this, and/or just prevent the saving of any package we detect had a data access error, but there is a lot of code and no good, 100% foolproof way to make sure that all data access failures can be attributed to the correct package file, and that is before we consider licensee code bases.

Since complete infrastructure outages are relatively rare, this was a safer and easier option.

Note that poor DDC performance (as you mention) would not cause these errors; it would just mean that the user waits longer for the data. However, the vast majority of our virtualized data access is done asynchronously and doesn’t stall the editor.

The error dialog does look a little out of place, as it is a system dialog rather than Slate, since we might need to call it from background threads, which Slate does not support. Marshalling the request to the GameThread can cause deadlocks in some scenarios, so we’ve avoided that.

You can set ini:Engine:[Core.VirtualizationModule]:PullErrorAdditionalMsg="" to add extra info and instructions for your users in this error dialog, if you have an internal FAQ help page for example.

If you really want to, you can try ini:Engine:[Core.VirtualizationModule]:UseLegacyErrorHandling=true, which will attempt to continue after a failure, but I wouldn’t recommend it due to the aforementioned corruption risks.

On the other hand, does using the VA feature increase the time needed to submit an asset?

The initial submit might take slightly longer as there will be more Perforce commands. In theory, subsequent submits of the asset might be faster as the file will be smaller, depending on whether you are using delta transfers or not.

Is it possible to have this processing done automatically by a CI/CD build machine?

You can set ini:Engine:[Core.VirtualizationModule]:EnablePayloadVirtualization=false to disable virtualizing packages by default, and then run a job on a build machine to virtualize them, if you prefer.

There is both a UAT command (VirtualizeProject) and a commandlet (-run="VirtualizationEditor.VirtualizeProject") that already handle searching a project for non-virtualized content and virtualizing it.

You may also want to check the code documentation in Engine\Source\Developer\Virtualization\Private\VirtualizationManager.h which contains further descriptions of the ini values I have been mentioning.
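To tie those together, here is a minimal sketch of what that section could look like in a project’s DefaultEngine.ini; the file placement and the example help link are my assumptions, and VirtualizationManager.h remains the authoritative reference for the available options:

    [Core.VirtualizationModule]
    ; Disable virtualizing packages by default; a build machine job runs VirtualizeProject instead.
    EnablePayloadVirtualization=false
    ; Extra text appended to the payload pull error dialog (e.g. a link to an internal help page).
    PullErrorAdditionalMsg="If this keeps happening, see https://intranet.example.com/virtual-assets-faq"
    ; Not recommended: attempt to continue after a failed payload pull (risk of data corruption).
    ;UseLegacyErrorHandling=true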

Thanks for your reply. So the conclusion is:

Syncing and modifying a virtual asset from the depot and then submitting it directly via p4 is unsafe.

If someone else syncs and modifies it in a way that does not change the virtualized data, they can’t submit it directly via p4, because the original file does not change.

Only when the change affects the virtualized data (e.g. re-importing) will the original file contain the complete data, so submitting it directly via p4 is safe.

If this conclusion is right, then for the sake of asset safety and validity we need to intercept direct p4 submissions.

In addition, I’m interested in how virtual assets apply to coordinated multi-branch development, but the official documentation does not go into much detail about it.

From your earlier reply, is the recommendation to only create new virtual assets in the main stream and merge those assets into the other feature streams, in order to reduce binary asset conflicts?

On the other hand, could just one separate depot and stream (like //LyraVA/main) store the VA payloads for all streams in the project (like //Lyra/main, //Lyra/dev, …), since the payloads do not need to be resolved or merged?