Editor crash post loading control rig

Hello,

We are experiencing this random crash in the editor during PIE.

It looks like the issue comes from the fact that the same URigVMCompiler object is compiling two control rig blueprints in the same callstack:

1) The first control rig A references another one, B, used as a function library, and the function FRigVMGraphFunctionHeader::GetFunctionHost tries to load B

2) Flushing async loading processes the pending loaded packages, which include another control rig C

3) The URigVMBlueprint::HandlePackageDone function triggers another recompile on C

4) URigVMCompiler::CurrentCompilationFunction seems to point to a function of A

Are these recursive invocations supported by the URigVMCompiler? We cannot upgrade the engine version at the moment, and we tried creating a new URigVMCompiler for every invocation as a potential fix. Do you see any issue in doing that?
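
To make the suspected problem concrete, here is a minimal standalone C++ sketch of it. It is a toy model and not engine code: FToyCompiler, its CurrentFunction member and the two helper functions are hypothetical names that only imitate a compiler object keeping per-compilation state (like CurrentCompilationFunction) in a member while a nested compile is triggered from inside a load.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiler that stores per-compilation state in a member.
struct FToyCompiler
{
    std::string CurrentFunction;              // empty while idle
    std::vector<std::string> StaleStateSeen;  // records compiles that started with leftover state

    // LoadDependencies models a load that may flush other pending packages
    // and, as a side effect, start another compile.
    void Compile(const std::string& RigName, const std::function<void()>& LoadDependencies)
    {
        assert(!RigName.empty());
        if (!CurrentFunction.empty())
        {
            // A compile is already in flight on this object: state is shared.
            StaleStateSeen.push_back(RigName + " saw " + CurrentFunction);
        }
        CurrentFunction = RigName + "::Func";
        if (LoadDependencies)
        {
            LoadDependencies();
        }
        CurrentFunction.clear();
    }
};

// One shared compiler: compiling A triggers a nested compile of C on the same object.
inline std::size_t CountStaleStateWithSharedCompiler()
{
    FToyCompiler Shared;
    Shared.Compile("A", [&Shared]() { Shared.Compile("C", nullptr); });
    return Shared.StaleStateSeen.size();
}

// Workaround under discussion: a fresh compiler object for every invocation.
inline std::size_t CountStaleStateWithFreshCompilers()
{
    FToyCompiler Outer;
    std::size_t InnerStale = 0;
    Outer.Compile("A", [&InnerStale]() {
        FToyCompiler Inner;
        Inner.Compile("C", nullptr);
        InnerStale = Inner.StaleStateSeen.size();
    });
    return Outer.StaleStateSeen.size() + InnerStale;
}
```

In this model the shared compiler records one compile that started with leftover state from A, while the per-invocation compilers record none. What we cannot tell from the outside is whether the real URigVMCompiler has other shared state that a fresh instance would not isolate.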

Thank you

Hi, can you give me more info on how you’re loading the control rigs when you run into the crash (i.e. is it a soft object reference on the first control rig being loaded that triggers the issue)? Or is there any chance that you could attach the assets that repro the issue? I’ve been trying to get a repro of this to debug it, but so far I haven’t had any luck.

I’ve also spent a bit of time looking through the differences in the callstack between 5.7 and 5.4 but it doesn’t seem like anything obvious has changed since then, so it’s possible that we still have the same issue that you’re running into.

In terms of whether it’s intentional that the rig vm compiler can be called recursively, from looking at the code that does seem to be the case since FRigVMGraphFunctionHeader::GetFunctionHost will explicitly try to load the other rig and we compile as part of that process. But I’m going to double-check this with the dev team.

Hi,

We async load an AActor blueprint class that has 5 skeletal mesh components with distinct animation blueprints, each having a Control Rig node pointing to a distinct control rig. 4 of the 5 control rigs reference another control rig as a function library.

About the recursion, I want to stress the fact that the URigVMBlueprint::HandlePackageDone is not called on the rig vm that is loaded by FRigVMGraphFunctionHeader::GetFunctionHost but another one that was in the async loading queue.

I attached the callstack in the original post but I don’t see it anymore, could you confirm you have received it?

Thank you

> I attached the callstack in the original post but I don’t see it anymore, could you confirm you have received it?

Yeah, I have it. The front end UI for EPS doesn’t seem to display it but I’m able to view it from the backend.

> About the recursion, I want to stress the fact that the URigVMBlueprint::HandlePackageDone is not called on the rig vm that is loaded by FRigVMGraphFunctionHeader::GetFunctionHost but another one that was in the async loading queue.

I think that this is fine, in theory. HandlePackageDone will be called every time a package has finished loading. The LoadedPackages array can contain multiple assets, since multiple assets could have been loaded, including potentially more than one rig. So it doesn’t necessarily need to be the same rig that FRigVMGraphFunctionHeader::GetFunctionHost was called on (whether URigVMCompiler can cope with this is a different question). You could add another check to HandlePackageDone in the form of:

	if (!GetPackage()->GetHasBeenEndLoaded())
	{
		return;
	}

But if the current rig is within the LoadedPackages array, I don’t see how the package could not have been loaded by that point, so I don’t expect that would fix anything.

I think we really need to try and get a repro for this. When I was trying again today, I see that the async loader will automatically load the dependent assets when I attempt to perform an async load on an actor similar to the one you described. So I’m not sure how you end up in a situation where FRigVMGraphFunctionHeader::GetFunctionHost has to force the load.

Maybe it’s that the async loading thread is busy loading other assets, so the dependent rig hasn’t been loaded yet. But in a simpler setup like mine where I have fewer assets to load, the async loader has loaded all the assets by the point we call FRigVMGraphFunctionHeader::GetFunctionHost. If that’s the case, I would expect that you only run into this crash intermittently, or is it something you see consistently with every PIE?

Is there any chance you could supply the assets in a repro project? If not, could you attach a few screenshots to show how you’re performing the async load of the actor, and also how you’re referencing the functions within the control rig(s)?

Hi, just following up on this since I’ve been looking at some code that you could try as a workaround. This is speculative since I can’t repro it, but can you test changing URigVMBlueprint::HandlePackageDone to the following:

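void URigVMBlueprint::HandlePackageDone(const FEndLoadPackageContext& Context)
{
	// Only react to load events that include this blueprint's own package.
	if (!Context.LoadedPackages.Contains(GetPackage()))
	{
		return;
	}

	// Walks every graph of a blueprint and attempts to load the blueprints that
	// host any referenced functions, so they are not first loaded inside RecompileVM.
	struct FNodeGraphTraverser
	{
		// Reference nodes already processed, so each one is handled only once.
		TSet<URigVMFunctionReferenceNode*> VisitedRefNodes;

		// Returns false if a dependent package has not finished loading yet.
		bool LoadDependentRigBlueprints(URigVMBlueprint* Blueprint)
		{
			TArray<URigVMGraph*> Graphs = Blueprint->RigVMClient.GetAllModels(true, true);
			for (URigVMGraph* Graph : Graphs)
			{
				TArray<URigVMNode*> Nodes = Graph->GetNodes();
				for (int32 i = 0; i < Nodes.Num(); ++i)
				{
					URigVMFunctionReferenceNode* ReferenceNode = Cast<URigVMFunctionReferenceNode>(Nodes[i]);
					if (ReferenceNode && ReferenceNode->GetReferencedFunctionHeader().IsValid() && !VisitedRefNodes.Contains(ReferenceNode))
					{
						VisitedRefNodes.Add(ReferenceNode);

						// GetFunctionHost will try to load the blueprint that owns the referenced function.
						if (URigVMBlueprintGeneratedClass* HostClass = Cast<URigVMBlueprintGeneratedClass>(ReferenceNode->GetReferencedFunctionHeader().GetFunctionHost()))
						{
							if (URigVMLibraryNode* LibraryNode = ReferenceNode->LoadReferencedNode())
							{
								if (URigVMBlueprint* FunctionBlueprint = LibraryNode->GetTypedOuter<URigVMBlueprint>())
								{
									// Recurse so that nested function libraries are handled as well.
									if (!LoadDependentRigBlueprints(FunctionBlueprint))
									{
										return false;
									}
								}
							}

							// The host's package must have fully finished loading before we continue.
							if (!HostClass->GetPackage()->GetHasBeenEndLoaded())
							{
								return false;
							}
						}
					}
				}
			}

			return true;
		}
	};

	// Only call the no-argument overload once all dependent rigs are loaded.
	FNodeGraphTraverser GraphTraverser;
	if (GraphTraverser.LoadDependentRigBlueprints(this))
	{
		HandlePackageDone();
	}
}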

The idea with that change is to traverse the control rig graph looking for function nodes that reference other graphs. When we find one, we attempt to load the sub-graph if it hasn’t already been loaded by that point. So in theory, any compilation should be done individually at that point rather than recursively within RecompileVM.
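
As a rough illustration of that idea outside the engine, here is a small standalone C++ sketch of a "dependencies first" check. It is a toy model under my own assumptions: FToyRig, FToyRigMap, AreDependenciesLoaded and MakeExampleRigs are hypothetical names, the bEndLoaded flag only stands in for GetHasBeenEndLoaded(), and nothing here actually loads assets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a control rig asset.
struct FToyRig
{
    std::vector<std::string> ReferencedRigs; // rigs whose functions this rig calls
    bool bEndLoaded = false;                 // models "package has finished loading"
};

using FToyRigMap = std::map<std::string, FToyRig>;

// Returns true only if every rig reachable from Name has finished loading.
// Visited keeps the walk from looping forever on cyclic references.
inline bool AreDependenciesLoaded(const FToyRigMap& Rigs, const std::string& Name,
                                  std::set<std::string>& Visited)
{
    const auto It = Rigs.find(Name);
    assert(It != Rigs.end() && "unknown rig name");
    for (const std::string& Dep : It->second.ReferencedRigs)
    {
        if (!Visited.insert(Dep).second)
        {
            continue; // already checked this dependency
        }
        const auto DepIt = Rigs.find(Dep);
        if (DepIt == Rigs.end() || !DepIt->second.bEndLoaded)
        {
            return false; // dependency missing or still loading
        }
        if (!AreDependenciesLoaded(Rigs, Dep, Visited))
        {
            return false; // something further down the chain is not ready
        }
    }
    return true;
}

// Main -> CR_FunctionLibrary -> SplineFunctionLibrary, with the last one optionally unloaded.
inline FToyRigMap MakeExampleRigs(bool bSplineLoaded)
{
    FToyRigMap Rigs;
    Rigs["Main"].ReferencedRigs.push_back("CR_FunctionLibrary");
    Rigs["Main"].bEndLoaded = true;
    Rigs["CR_FunctionLibrary"].ReferencedRigs.push_back("SplineFunctionLibrary");
    Rigs["CR_FunctionLibrary"].bEndLoaded = true;
    Rigs["SplineFunctionLibrary"].bEndLoaded = bSplineLoaded;
    return Rigs;
}
```

With every rig in the chain loaded the check passes and the main rig would go on to compile. If the last library in the chain is still loading, the check fails and the compile is deferred rather than started recursively.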

I haven’t done much testing on this, so it’s not something to commit at the moment, but I’d be interested to know whether it resolves the crash for you or not.

In terms of tracking down/repro’ing the bug, I’d also be interested to see what your reference viewer looks like for the different control rigs (along with the other info I asked for previously). The reason that I can’t get to a point where the sub-graphs are compiled from RecompileVM is that they’re in the package exports for the main control rig I’m testing with. That’s why they’re loaded up front in all my test cases.

If the references between the control rigs aren’t shown in the reference viewer, that would suggest they aren’t in the package exports and so aren’t loaded up front. The example below is what I’d expect to see in the reference viewer (in this case I have a main graph that references a function in a sub-graph, which references a different function in another sub-graph - you can also click the eye button to show hard/soft references):

[Image Removed]

> The crash rarely happens but when it shows up for a developer it usually happens to multiple developers as well and it goes away after some launches of PIE. I suspect it is related to changes in other assets that require a DDC update.

Yeah, I’m not surprised it’s difficult to repro since it’s related to async loading. I expect sometimes the assets have been loaded in time, other times not. It’s interesting if it’s linked with a DDC update, though.

> I’m attaching a part of the reference viewer. These nodes shown are directly connected to the actor blueprint class and there are more similar, all pointing to the CR_FunctionLibrary you see

A few questions on this:

  • Are the anim blueprints that are referencing the top level control rig linked anim graphs/layers?
  • Does CR_FunctionLibrary have a hard reference to SplineFunctionLibrary anywhere? I can see in the screenshot that the main control rig has a soft reference to it, but not whether CR_FunctionLibrary has a hard reference
  • Related to that, can you show me how you’re referencing/using the functions in SplineFunctionLibrary from the main control rig, as well as CR_FunctionLibrary?

Also, a quick follow-up on the code that I sent over. I was looking at it again today, and there’s a small change I’d recommend making. At the top of the function, this code:

	if (!Context.LoadedPackages.Contains(GetPackage()))
	{
		return;
	}

Should be replaced with this:

	if (!GetPackage()->GetHasBeenEndLoaded())
	{
		return;
	}

That should make it less likely that we’ll never call through to the other implementation of HandlePackageDone (the one without arguments) on this control rig.
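
To show why I think the second guard is safer, here is a standalone C++ toy model of the two checks. It reflects my reading of the callback flow rather than the engine code: FToyRigListener, EGuard and FinishesWhenDependencyArrivesLater are hypothetical names, and each OnEndLoad call stands for one end-load notification with its own LoadedPackages list.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class EGuard { LoadedPackagesContains, HasBeenEndLoaded };

// Hypothetical model of one rig listening to "end load" notifications.
struct FToyRigListener
{
    std::string Package;                  // this rig's own package
    std::string Dependency;               // a package it needs before it can finish
    EGuard Guard = EGuard::HasBeenEndLoaded;
    bool bFinished = false;               // true once the no-argument handler would have run
    std::vector<std::string> EndLoaded;   // every package that has finished loading so far

    bool HasBeenEndLoaded(const std::string& Name) const
    {
        return std::find(EndLoaded.begin(), EndLoaded.end(), Name) != EndLoaded.end();
    }

    // Called once per batch of packages that finished loading.
    void OnEndLoad(const std::vector<std::string>& LoadedPackages)
    {
        assert(!Package.empty());
        EndLoaded.insert(EndLoaded.end(), LoadedPackages.begin(), LoadedPackages.end());
        if (bFinished)
        {
            return;
        }
        const bool bInBatch =
            std::find(LoadedPackages.begin(), LoadedPackages.end(), Package) != LoadedPackages.end();
        const bool bPassesGuard =
            (Guard == EGuard::LoadedPackagesContains) ? bInBatch : HasBeenEndLoaded(Package);
        if (!bPassesGuard)
        {
            return;
        }
        // Only finish once the dependency has also finished loading.
        if (HasBeenEndLoaded(Dependency))
        {
            bFinished = true;
        }
    }
};

// The rig's package arrives in the first batch, its dependency in a later one.
inline bool FinishesWhenDependencyArrivesLater(EGuard Guard)
{
    FToyRigListener Rig;
    Rig.Package = "Main";
    Rig.Dependency = "CR_FunctionLibrary";
    Rig.Guard = Guard;
    Rig.OnEndLoad({"Main"});
    Rig.OnEndLoad({"CR_FunctionLibrary"});
    return Rig.bFinished;
}
```

With the Contains-style guard, the rig only gets one chance (the batch that contains its own package), so it never finishes if the dependency arrives later. With the end-loaded guard, any later notification gives it another chance.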

Hello,

Have you tried applying the attempted fix that [mention removed]​ suggested?

Lots of things have changed between 5.4 and 5.7, so it is hard to give you a specific fix for this issue, especially without a repro case.

If Euan’s fix unblocks you, that is great.

Please let us know, thanks.

Sara

Great! Thank you [mention removed]​.

Hello,

Unfortunately I am not able to get a consistent repro myself. The crash rarely happens but when it shows up for a developer it usually happens to multiple developers as well and it goes away after some launches of PIE. I suspect it is related to changes in other assets that require a DDC update.

The async loading is performed using the actor blueprint class soft object path passed to UAssetManager::GetStreamableManager().RequestAsyncLoad.

I’m attaching a part of the reference viewer. These nodes shown are directly connected to the actor blueprint class and there are more similar, all pointing to the CR_FunctionLibrary you see

[Image Removed]

Thanks a lot

Hello,

Thanks a lot for the code, I will try it whenever the crash shows up.

I attached a part of the reference viewer in my previous answer

Thank you for your time

Apologies for the late reply, we are close to a deadline.

> Are the anim blueprints that are referencing the top level control rig linked anim graphs/layers?

Yes, some of them are plain anim blueprints but others are anim layers.

> Does CR_FunctionLibrary have a hard reference to SplineFunctionLibrary anywhere? I can see in the screenshot that the main control rig has a soft reference to it, but not whether CR_FunctionLibrary has a hard reference

Yes, CR_FunctionLibrary has a hard reference to SplineFunctionLibrary.

> Related to that, can you show me how you’re referencing/using the functions in SplineFunctionLibrary from the main control rig, as well as CR_FunctionLibrary?

Sure,

This one from CR_FunctionLibrary:

[Image Removed]

I cannot find where the main control rigs use SplineFunctionLibrary, despite the soft references. I can only find usages of CR_FunctionLibrary:

[Image Removed]

Thanks for your time

Hello,

I can’t say whether it fixed the issue, since the crash hasn’t happened again since I posted here. We can close this.

Thank you