Request for the data validation framework to support batch validating assets with the same validator

An issue with the current framework for EditorDataValidators is that they can only validate 1 asset at a time, but some validation can be significantly more performant to run all the entire set of assets in a single pass. For example, we have validation to compile Blueprints and check for compilation errors. Most of the BP compilation time is spent reinstancing the BP and running GC, so it is significantly faster to request that all BPs be compiled back to back because that will delay the reinstancing until after they have all been compiled. We currently have several thousand BPs, so validation of all of them was taking 8-9 minutes due to the reinstancing. Batching the compile into a single pass reduces this to less than 30 seconds.

What I’ve prototyped locally is to have our compilation validator just collect the BPs to be validated in its validation function, and then exposed a new overridable function call in UEditorValidatorSubsystem::ValidateAssetsInternal() after the main validation loop so that our subsystem subclass can receive the FValidateAssetsResults into which the validation errors should be logged. That mostly works, except that all of the per-asset result handling was already done in the loop. Some of that could be manually replicated, but the per-asset pass/fail handling becomes a lot more complicated if the main validation of an asset passes but the “batch validation” fails it. Handling that correctly will require refactoring the loop quite significantly. It would be nice to have official support for this ability to run specialized data validation on all assets to be validated.

I’m happy to share any further information about this request, and to talk through possible short term options for implementing this on our end with minimal changes to EditorValidatorSubystem.

-Peter

Hi Peter,

While I think there would be value in this, I can’t promise we are going to look at that request. I will still note it on our side.

I think the minimal engine change to support that type of batching operation would be to add a broadcast before the loop asset where the asset are about to be loaded and then validated. Your validator could hook it self and determine if it need to load some of the asset upfront and then batch compile them. Your could then cache your result in your validator and when validating a blueprint you got grab the cached result. Not Ideal, but this is what I would suggest if you want to reduce the number of engine modification here.

Let me know if this doesn’t help you,

Julien.

If you want to extract some logic of it into it’s own functions so that you could call them in your derived class, I wouldn’t be against it.

You could send us a pull request that does that. It won’t be perfect but should reduce the maintenance a bit on your side.

Hi Peter,

I left a little comment on the pull request, but It looks good to me otherwise.

The only thing you will to do is to make sure the pull request is against our branch of main for us to receive in our system. At the moment it’s against you own fork of UE-Main.

Otherwise thank you for the pull request.

Perfect I have the change on our side and grabbed the task for the pull request. I will poke you once the change re-reviewed and submitted on our side.

That’s an interesting idea to minimize the changes to ValidateAssetsInternal. I actually recently added virtual PreValidateAssets and PostValidateAssets functions that our EditorValidatorSubsystem derived class can hook into to gain access to the final asset list, validation settings, and validation context.

The downside to doing the batch compilation before the main loop is that I would need to duplicate all of the asset filtering & loading code, as well as having to run CanValidate on each blueprint a second time. The latter isn’t much of a concern because it’s relatively cheap, but the former is a concerning for the maintenance headache (in case future UE versions change the filtering at the beginning of the main validation loop or to how asset loading is handled). I’ll have to see which is more of a headache, the maintenance of duplicated code or the maintenance of refactoring most of the validation result reporting in the loop.

That’s be great! Being able push back at least some of the changes would help with long term maintenance. I’ll see what I can about getting myself set up to be able to do a PR and get back to you.

Hi [mention removed]​, I’ve finally gotten time to set up a GitHub fork and learn Git well enough to start sending PRs. I actually have a backlog of changes I’d like to send over, and I’ve started with this PR to refactor the DataValidationCommandlet:

https://github.com/peterschusteratms/UnrealEngine\-Main/pull/1

We have a derived commandlet that adds support for -AssetDir and -AssetName filtering arguments (more for local testing than general use), as well as customized filtering of Engine assets because we need to avoid stripping out the content from our Engine\Plugins\ (I’ve had discussions with other devs at Epic about the need to support studio shared plugins outside that directory). I’ve also recently added support for using the commandlet to validate changelists via the commandline, so that we can trigger standard changelist validation from Swarm Reviews.

I’m completely open to how the refactoring of the DataValidationCommandlet is done, so I’m happy to iterate on the specifics in the PR. The main thing is that we need to be able to customize pretty much every aspect of ValidateData(), which in turn required making that an instance function rather than a static function. My preference would be to rename the static function called by the Editor’s “Validate Data” function to something like “RunDataValidation”, but I understand that may be an issue for back-compat for you while supporting other companies. At the moment I’ve left that function name alone and simply created a “ValidateDataImpl” instance function. I went with that name because for our needs it’s fine as a protected function, but it might be more flexible to make it public; we’d just need a better name in that case.

Happy to talk about it here or in the PR.

Cheers,

Peter

Thank you for the review Julien! Where can I find the comment? I’m new to PRs so this is probably user error but I can’t find any comments.

I’ll see if I can figure out how to retarget the PR against your main branch. The GitHub Desktop said it was targeting that but I guess something went wrong. If I can’t figure that out I’ll sent out another PR.

Sure on github, pull requests comment are generally against the full change. You can see then in the modified file section or from the conversation thread.

[Image Removed]

[mention removed]​ do you need to publish that comment? Neither I nor my coworkers can see that:

[Image Removed]

Yes, I did forget to do that sorry.

The change looks good otherwise the last thing you will have to do on your side is to do the a pull request against the ue5-main from the EpicGames/UnrealEngine repository.

It should look like something like this:

[Image Removed]

Once this done can you send me a link to the pull request. This way I will able to grab the internal ticket to actually commit the change on our side.

Ah now I see. I’ve recreated the PR in my main branch against the actual repo: https://github.com/EpicGames/UnrealEngine/pull/13987

wonderful! I did notice one last thing which is that I swapped the ordering of UE_API and static on the new WaitForAssetRegistry() function, compared to the existing ValidateData function. I can either update the PR or let you fix that on your end.