How best to run a CI job to virtualize assets post-submit

We would like to run a CI job that finds any assets that should be virtualized but are not, virtualizes them, and submits them. I’m looking at the arguments for the UnrealVirtualizationTool and realizing that I’m not entirely sure how to run it so that it effectively covers the entire project while still honoring the VirtualizationFilterSettings. We have the following settings in our DefaultEngine.ini (only including the settings relevant to the question here, but let me know if you need to know more):

[Core.VirtualizationModule]
FilterMode=OptIn
 
[/Script/Virtualization.VirtualizationFilterSettings]
+IncludePackagePaths="/OurProjectName/Some/Package/Path/"
+IncludePackagePaths="/OurProjectName/Some/Other/Package/Path/"

If I check out a bunch of assets and run “UnrealVirtualizationTool.exe -Mode=Virtualize -Changelist=12345”, it seems to work correctly and only virtualizes what’s specified, but is there a way to tell it to attempt to virtualize the entire project (while only actually virtualizing files that are in the included filter paths above and need to be virtualized)?

The goal is to run something like “UnrealVirtualizationTool.exe -Mode=Virtualize” (without a specific Changelist or Path) and end up virtualizing any files in the project that are both in the filtered paths and in need of virtualization. I know that’s not valid syntax, but is there a valid syntax that can achieve this? And more importantly, is this something you suggest doing, or are we heading down a path we should avoid?

As a follow-up question, is it important to have this CI job cook the assets before submitting them, in order to populate our DDC first? We have other CI jobs that cook and will populate the DDC backends, but I worry about the timing window between when a non-virtualized asset is submitted, when this virtualization job runs (which would be fairly quick if it isn’t cooking), and when the assets end up being cooked. If someone syncs in the middle (after virtualization but before the slower full-cook CI job runs), they risk ending up in a situation where they then have to inflate the files when loading them in the editor, correct?

Thanks for any advice you can offer on this.

Edit: Providing some additional information since I kept experimenting with this. It seems that the -PackageDir arg can be used for this, though it takes an actual file path as opposed to a package mount path (i.e. “d:\projectworkspace\projectname\Content\assetpath” as opposed to “/ProjectName/AssetPath/”).

I think I can work with this, but I’ve encountered another interesting issue. It seems like it will only resolve the p4 client spec correctly if “-Changelist” is provided (which expects a changelist of files to process, not an empty changelist). If I just use “UnrealVirtualizationTool.exe -Mode=Virtualize -Checkout -PackageDir=\path\to\packages” then I’ll get Source Control errors about an ambiguous client. This makes sense as I have multiple client specs that it could match, but I do have a valid .p4config file that shows correctly in “p4 set”, so I’m not sure why it won’t pick up on that. I tried passing the -P4Client arg to it, but it doesn’t seem to respect that arg.

but honor the VirtualizationFilterSettings

UnrealVirtualizationTool (UVT) should respect the VirtualizationFilterSettings for your project. To make this work, the tool runs in multiple passes: the first pass finds the files you request and sorts them by project, then we launch a new instance of the tool per project (in serial) with the .uproject file as the first parameter on the command line, so that the child process loads the full config file tree for that project. Technically you can skip invoking the child process, if you are sure that the files all belong to the same project, by passing in the project file yourself, but personally I don’t usually bother.
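For illustration only (the project path here is hypothetical), skipping the child process in that scenario would look something like:

UnrealVirtualizationTool.exe -Mode=Virtualize d:\projectworkspace\ProjectName\ProjectName.uproject -Changelist=12345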

but is there a way to tell it to attempt to virtualize the entire project (while only actually virtualizing files that are in the included filter paths above and need to be virtualized)?

As you noted in your update, the UVT can do this. In UE5 we store the payloads that can be virtualized outside of the package file format, in a trailer that is appended to the end of the .umap/.uasset file. This means that we don’t have to load the UPackage or UObjects to work out where the data is; we can just parse the trailer’s header, which is relatively fast. See Engine\Source\Runtime\CoreUObject\Public\UObject\PackageTrailer.h for the details on this.

We also provide a commandlet, found at Engine\Source\Editor\VirtualizationEditor\Private\VirtualizeProjectCommandlet.h, which uses the asset registry to determine which package files are in the project and which aren’t. There is also a UAT command, Engine\Source\Programs\AutomationTool\Scripts\Virtualization.Automation.cs, which builds the editor and calls this commandlet.
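As a rough sketch of what running the commandlet directly might look like (the commandlet name is my assumption based on the usual -run=<Name> naming convention, and the project path is hypothetical):

UnrealEditor-Cmd.exe d:\projectworkspace\ProjectName\ProjectName.uproject -run=VirtualizeProject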

At the lower level, both UVT and the editor commandlet run the exact same code when actually virtualizing. UVT was originally intended for quick workflows, fixing up issues, etc., as well as being invoked by our SubmitTool and/or p4v menu extensions, while the commandlet/UAT script was intended for CIS/automation, but both should do what you want.

It seems that the -PackageDir arg can be used for this, though it takes an actual file path as opposed to a package mount path

At this point the tool does not know about the project’s mount points, as it hasn’t loaded the configuration for a project, so it needs to work with absolute file paths instead. I have been meaning to do a pass on UVT’s help documentation, and I will try to make this clearer.
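To illustrate the mapping with hypothetical paths (assuming the standard layout where the project’s mount point corresponds to its Content folder):

Package mount path: /OurProjectName/Some/Package/Path/
File system path: d:\projectworkspace\OurProjectName\Content\Some\Package\Path
Resulting argument: -PackageDir="d:\projectworkspace\OurProjectName\Content\Some\Package\Path"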

Unfortunately, fully mounting all the mount points tends to be quite slow, and by that point you might as well just run the commandlet instead.

but I do have a valid .p4config file that shows correctly in “p4 set”

When I started VA I reused the source control module from the UE editor, as I hoped at the time that it would let us get the system working easily for all of the source control types we support. Unfortunately, I did not realize how closely tied to the editor and the content browser that module was, and it required so much custom functionality to be added to the Perforce implementation that it doesn’t work with the other source control types.

The UE source control module has a quirk where the p4config file is read from the root of the exe, not the project working directory or the content working directory, which can cause weird results, especially if the project is not stored in the same directory structure as the engine. I have brought up the possibility of changing this, but there is concern that it might break existing workflows. My first suspicion is that this is the same problem here.

You could confirm this by running Sysinternals Process Monitor and seeing where UVT is trying to access the p4config file on your system.
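If the exe-root lookup does turn out to be the cause, one possible workaround (just a sketch with made-up values, and assuming your P4CONFIG environment variable already points at the file name, which it presumably does since it shows up in “p4 set”) would be to place a copy of your .p4config next to UnrealVirtualizationTool.exe, e.g. under Engine\Binaries\Win64:

P4PORT=ssl:perforce.example.com:1666
P4USER=ci.builder
P4CLIENT=ci_virtualization_client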

And more importantly, is this something you suggest doing or are we heading down a path we should avoid?

Unfortunately, we do not have an automated job like this ourselves, due to the number of branches that we typically have in active development, and we generally rely on packages being virtualized as users submit them. We tend to do a yearly clean-up instead. Having said that, I would prefer that we were running something more frequently, and I know that some other licensees are doing this. I certainly don’t think that it’s a bad idea if it doesn’t cause disruption to your workflows.

As a follow up question, is it important to have this CI job do a cook of the assets before submitting them to populate our DDC first?

Virtualizing a package file should have no effect on its cooked outputs. As I noted earlier, virtualizing a package does not actually change the package format itself. So if a fully unvirtualized texture has been cooked and the cooked output is in your shared DDC, then that same cooked output will still be valid for the texture after virtualization.

I tried passing -P4Client arg to it, but it doesn’t seem to respect that arg.

The p4 args on the cmdline should override everything else, so I have no idea why that is not working for you. I can try to have a quick look tomorrow to see if I can reproduce the problem, but it would be helpful if you could provide the exact error message (feel free to scrub the p4 client name and such from it), as well as any other p4-related snippets you might see in the UVT log that you think could be relevant.

I tried passing -P4Client arg to it, but it doesn’t seem to respect that arg.

The checkout of the files is done by the child process, but I am not forwarding the p4 cmdline values, so the child process doesn’t use them.

We don’t use the -checkout flag internally in production, and after poking at this I think there are a few scenarios I need to test against and maybe clean up, in addition to the issue that you reported, so I will try to put some time aside in the next week or so to tackle it.

As a workaround for now, assuming you are only running UVT on a single project, try something like the following:

-Mode=virtualize <path to project's .uproject> -PackageDir="<path>" -P4Client="<client spec name>" -checkout

This should skip the creation of the child process, as the tool will already have the uproject.

Thanks for the reply and follow-up. I was on PTO last week and earlier this week, so I’m just now having a chance to review this. That all makes sense, especially the part about the p4 arguments not being forwarded to the child processes. I hadn’t gotten as far as debugging the spawned processes before my PTO, but I did see the args coming into the parent process. I’ll try to make some time later this week to try this out, and if I run into issues I’ll provide some more logs.

With that said, what you mentioned about the intended use cases for the virtualization tool also makes sense. I didn’t realize there was a commandlet path for invoking virtualization, which very well might be a better option for us. I’ll try that as well.