Evaluating the Virtual Assets feature (Is it suitable for integrate/merge workflows in source control? And some other questions)

We are evaluating the Virtual Assets feature to decide whether to use it in production.

I set it up in a test branch and took a quick look at the source code/documents. I still have some questions:

  1. After we convert assets to Virtual Assets, is it possible for us to convert them back using a UE-provided feature (e.g. a commandlet)? I took a look around the PackageTrailer class but didn’t find any functions we could use for this.
  2. If we have multiple branches, is it recommended to convert them all at the same time? If not, will it lead to issues when merging/integrating between VA and non-VA branches?
  3. Will using VA cause any performance regression? In PackageVirtualizationProcess.cpp there are some mentions of “future task/not optimized”. (Also, I got the impression that payloads are fetched from P4 on demand; is that correct?)
  4. In Epic’s practice, are payloads put in the same stream that the production assets are in? Or does every production branch have a corresponding VA folder (branch) outside itself?

Let me know if some of these are already answered somewhere in the documentation.

Thank you.

Shane

Edit:

Some updates/more information (still using the same question IDs):

  1. Is there something like “local virtual assets” which saves all the payloads to a local place (under the same P4 depot/branch)? It sounds like converting back to a non-Virtual-Asset setup, but it complicates the file system.
  2. Making a corresponding P4 folder/stream for every branch seems to complicate the branch merging/integrating process, while putting the Virtual Asset stream/branch somewhere under the normal stream/branch seems to complicate the P4 mapping of the normal workspace. (We would need to exclude the Virtual Asset folders from the P4 workspace mapping for most users.)


After we convert assets to Virtual Assets, is it possible for us to convert them back using a UE-provided feature?

We call this process ‘rehydration’, although strictly speaking it should be named ‘hydration’; the term has stuck. I will attach a PDF detailing the various options for this that are provided in the stock engine.

Is there something like “local virtual assets” which saves all the payloads to a local place (under the same P4 depot/branch)? It sounds like converting back to a non-Virtual-Asset setup, but it complicates the file system.

We do have an ‘offline’ workflow for users with unstable or poor connections, for whom downloading payloads on demand is a pain. As far as I know, nobody internally has used this in production, so I cannot vouch for it beyond my initial testing, and due to the lack of interest we have not developed it further into a more usable system. Having said that, I have linked the doc for it to a few licensees and have had no complaints so far, so feel free to give it a try if you are interested.

If I am misunderstanding your question and you just want a way to save payloads locally, then look into the file-system-based backend found at Engine\Source\Developer\Virtualization\Private\VirtualizationFileBackend.h, which the offline mode makes use of.

If we have multiple branches, is it recommended to convert them all at the same time?

This depends on your internal branching/integration strategy, so it’s hard to give all-purpose advice, but given that our binary assets tend not to be mergeable anyway, I don’t think that virtualizing the data adds any additional complexity. Internally we have enabled virtualization wherever the majority of the content work is being done and let it flow to other branches from there.

If you have FeatureBranch -> MainBranch -> MilestoneBranch, I’d expect the main conversion to be done in MainBranch. Worst case, any assets submitted to FeatureBranch just won’t be virtualized until they are next edited and submitted in MainBranch.

We also provide a commandlet (see Engine\Source\Editor\VirtualizationEditor\Private\VirtualizeProjectCommandlet.h) for virtualizing an entire project. It can be used as part of an automated build process to virtualize assets that people miss, so it could eventually clean up any non-virtualized submits from other branches over time. Remember that when we virtualize a package we don’t actually change anything in the UObjects themselves; we just remove the unstructured binary blob from the end of the file, so it’s much less complex than re-saving a package file.

If you do want to consider any sort of automation for this, there is a UAT script that wraps the above commandlet, found at Engine\Source\Programs\AutomationTool\Scripts\Virtualization.Automation.cs.
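To make the automation idea concrete, a minimal build-step wrapper might look like the sketch below. This is only a sketch: the editor and project paths are placeholders, and it assumes the usual convention that -run= takes the commandlet class name without the Commandlet suffix; check VirtualizeProjectCommandlet.h (or the UAT script above) for the arguments that control how results are submitted.

```python
# Sketch: run the project-wide virtualization commandlet from an automated build step.
# EDITOR_CMD and UPROJECT are placeholder paths; '-run=VirtualizeProject' assumes the
# usual '-run=<class name without the Commandlet suffix>' convention.
import subprocess
import sys

EDITOR_CMD = r"C:\UE\Engine\Binaries\Win64\UnrealEditor-Cmd.exe"   # placeholder
UPROJECT = r"C:\Projects\Project1\Project1.uproject"               # placeholder

def virtualize_project() -> int:
    """Run the VirtualizeProject commandlet and return its exit code."""
    args = [
        EDITOR_CMD,
        UPROJECT,
        "-run=VirtualizeProject",
        "-unattended",
        "-nopause",
    ]
    return subprocess.run(args).returncode

if __name__ == "__main__":
    sys.exit(virtualize_project())
```

In practice you would probably drive this through the Virtualization.Automation.cs UAT script instead; the point is only that the commandlet is easy to fold into an existing nightly job.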

Will using VA cause any performance regression?

Asset virtualization only affects editor data and will not affect runtime performance in any way.

In the editor, performance may vary depending on your studio’s setup. Internally we use our shared DDC system as a caching backend, which gives people at most of our sites fairly fast access to virtualized data if they need it, although it is still slower than reading from an SSD. However, our Perforce server is under quite a strain, as you might imagine for a company our size, so if a user does have to fall back to accessing data from there, the round-trip times can easily be 500-1000 ms at peak times.

However, almost all access of virtualized data in the engine is done as an input to asset compilation, which in turn produces more compact output data; think of textures converting the raw pixel data found in the package file to DXT1/2/3 formats. If your studio has a fairly hot DDC, then most people will never actually try to access virtualized data in the first place but will get the post-compiled data from the DDC. As an added bonus, the asset compilation system is quite well threaded, so there should be very little access of virtualized data on the main thread, and the user should generally not be interrupted by it unless they are editing a virtualized asset; even then they will access the data once and cache it to disk. We have thought about pre-caching systems for this, but so far in practice the one-off access has not proven worth spending any time on to improve.

One last anecdote: back when this system was first being developed, I tried a full cook of one of our largest projects, which I had cloned and 100% virtualized. I forced our system to only access the virtualized data from Perforce and to recompile all assets that used virtualized data as an input. In the end, the wall time for cooking this project was pretty much the same as cooking the non-virtualized version, as all the stalls were hidden on background threads and there was enough other work in the meantime to render the waits irrelevant. Of course, in the past couple of years there have been major improvements to our cooking system, so we might see different results if we were to re-run that experiment today, but that was a test of the absolute worst case.

In Epic’s practice, are payloads put in the same stream that the production assets are in? Or does every production branch have a corresponding VA folder (branch) outside itself?

Internally we store all of our virtualized payloads in their own depot; since users do not really need to know about them or sync them, we have found this is the best way to keep them out of the way. Each project has its own root in this depot. Although this means we do not get de-dupe savings between projects, it does mean we can more easily manage the security permissions of the files if we ever need to. Lastly, all branches of each project use the same root.

So our structure is something like:

//Payloads/Project1/…

//Payloads/Project2/…

//Payloads/Project3/…

Some licensees have required that the payloads be stored within stream depots, either a separate one for payloads or the same depot as their project, which can make the setup harder to do. The following UDNs, which should be visible to you, contain discussion on that. If you cannot view the links, let me know and I can copy/paste the relevant details here:

  • [Content removed]
  • [Content removed]
  • [Content removed]

Attaching doc on rehydration

Attaching doc on offline mode

Thank you so much for the detailed answer! It really helped.

I have one more question about branch integration (an edge case). Sorry if it was already mentioned; I might have missed something.

Assume we have a Project1 with multiple branches:

//P4Depot/Project1/main/…

//P4Depot/Project1/dev1/…

//P4Depot/Project1/dev2/…

So we would have multiple corresponding payload branches, one for each of the branches above:

//Payloads/Project1/main/…

//Payloads/Project1/dev1/…

//Payloads/Project1/dev2/…

Now assume we have an asset file A.uasset; it will then have a corresponding payload file in each branch:

//P4Depot/Project1/main/folder1/A.uasset#1

//P4Depot/Project1/dev1/folder1/A.uasset#1

//P4Depot/Project1/dev2/folder1/A.uasset#1

//Payloads/Project1/main/Human_unfriendly_hash/XXXX.payload

//Payloads/Project1/dev1/Human_unfriendly_hash/XXXX.payload

//Payloads/Project1/dev2/Human_unfriendly_hash/XXXX.payload

If the asset is modified in both dev1 (deleted) and dev2 (edited), then when they are both merged back to the main branch there should be a delete/edit conflict. How could we resolve the corresponding payload’s conflict in this case? (The payload file name likely doesn’t provide enough information to find its original file.)

Should we have some rules to prevent the conflict in the first place?

Currently we don’t delete any of the payload files that are submitted to revision control, as even if the .uasset file is deleted someone might sync back to an earlier revision where it is not deleted and need access to that data. If the user edits the .uasset in such a way that the virtualized data actually changes (such as importing a new texture), then the payload will be given a new hash and a new file will be submitted to revision control when the .uasset is next submitted.
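In other words, payloads are content-addressed, so the delete/edit scenario above never produces a conflict on the payload side: the delete leaves the existing payload file alone, and the edit simply adds a new file under a new hash. The sketch below is purely conceptual; the hash algorithm and path layout (SHA-1 and a two-character shard directory) are stand-ins, not the engine’s actual scheme.

```python
# Conceptual sketch of content-addressed payload storage; the hash algorithm and
# path layout here are stand-ins, not the engine's actual scheme.
import hashlib

def payload_depot_path(project: str, payload_bytes: bytes) -> str:
    digest = hashlib.sha1(payload_bytes).hexdigest()   # stand-in for the real payload hash
    return f"//Payloads/{project}/{digest[:2]}/{digest}.upayload"

v1 = b"raw texture pixels, version 1"
v2 = b"raw texture pixels, version 2"

print(payload_depot_path("Project1", v1))  # referenced by the old revisions of A.uasset
print(payload_depot_path("Project1", v2))  # a brand new file added by the edit in dev2

# Deleting A.uasset in dev1 never touches the first path (older revisions still need it),
# and editing it in dev2 only adds the second path, so there is nothing to resolve on the
# payload side when both branches are merged back to main.
```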

There has been limited interest in how to prune .upayload files that are no longer needed, either because .uasset files have been obliterated in Perforce or because they have a limited revision history and start to purge themselves. So far the question has only ever been asked during the evaluation stages, and I don’t know of any licensee that actually tried to implement a pruning system; at least if they did, they did not report back to us about it.

My suggestion so far has been that you would need to maintain some sort of database of .uasset files, with an entry for each revision detailing which virtualized payloads it references, and you would need to do this for every active branch in your depot that you are interested in. Given the fairly static format of the package trailer (see the code documentation in Engine\Source\Runtime\CoreUObject\Public\UObject\PackageTrailer.h) and the fact that the structure can be read backwards, you could probably make a quick and easy Python script to p4 print a .uasset and parse the info you need from its tail. You’d then only need to update the database for new revisions of each file, or remove entries when revisions or files are obliterated.

Using this database you could generate a list of used vs. unused payloads found in your depot and know which are safe to obliterate, although I’d suggest backing them up elsewhere for a short period as insurance.
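As a very rough starting point, that bookkeeping script might be shaped like the sketch below. It is only a sketch: the depot paths are examples, parse_payload_ids_from_trailer is a hypothetical placeholder to be written against the footer layout documented in PackageTrailer.h, and it assumes payload files in the payload depot are named by their hash.

```python
# Sketch of the suggested payload bookkeeping. Depot paths are examples, and
# parse_payload_ids_from_trailer is a placeholder to be implemented against the
# trailer layout documented in PackageTrailer.h.
import subprocess
from typing import Dict, Set

def p4_print(depot_rev: str) -> bytes:
    """Fetch one file revision from Perforce ('-q' drops the one-line header)."""
    return subprocess.run(["p4", "print", "-q", depot_rev],
                          check=True, capture_output=True).stdout

def parse_payload_ids_from_trailer(package_bytes: bytes) -> Set[str]:
    """Placeholder: read the package trailer from the end of the bytes and return
    the payload identifiers it references (see PackageTrailer.h)."""
    raise NotImplementedError

def referenced_payloads(branch_root: str) -> Set[str]:
    """Collect every payload referenced by every revision of every .uasset in a branch."""
    referenced: Set[str] = set()
    # '-a' lists all revisions; '....uasset' matches .uasset files recursively.
    listing = subprocess.run(["p4", "files", "-a", f"{branch_root}....uasset"],
                             check=True, capture_output=True, text=True).stdout
    for line in listing.splitlines():
        if " - delete " in line:          # deleted revisions have no content to print
            continue
        depot_rev = line.split(" - ")[0]  # e.g. //P4Depot/Project1/main/folder1/A.uasset#3
        referenced |= parse_payload_ids_from_trailer(p4_print(depot_rev))
    return referenced

def stored_payloads(payload_root: str) -> Dict[str, str]:
    """Map payload id -> depot path, assuming payload files are named by their hash."""
    listing = subprocess.run(["p4", "files", f"{payload_root}..."],
                             check=True, capture_output=True, text=True).stdout
    stored: Dict[str, str] = {}
    for line in listing.splitlines():
        depot_path = line.split(" - ")[0].split("#")[0]
        payload_id = depot_path.rsplit("/", 1)[-1].split(".")[0]
        stored[payload_id] = depot_path
    return stored

if __name__ == "__main__":
    used: Set[str] = set()
    for branch in ("//P4Depot/Project1/main/", "//P4Depot/Project1/dev1/", "//P4Depot/Project1/dev2/"):
        used |= referenced_payloads(branch)       # every active branch you care about
    unused = [path for pid, path in stored_payloads("//Payloads/Project1/").items() if pid not in used]
    print("\n".join(sorted(unused)))              # candidates to back up and then obliterate
```

In practice you would want this to be incremental (only processing revisions newer than the last run) rather than re-printing every revision each time, as suggested above.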

Thank you for the detailed explanation! :smiley: