Excessive Anim Init Time - Very large iteration of cached pose hierarchies

Hello,

We are experiencing very large spikes when initialising a specific animation graph. This graph makes heavy use of saved cached poses. The animators have run a bit wild and used them in multiple places within the graph and in some state machines too.

The main issue we are seeing is an excessively deep traversal of the animation graph during the setup and initialisation phase. You can see this in the trace below.

[Image Removed]

I added some trace arguments to all the animation nodes to see what was happening and it showed that there was some very deep recursion stemming from FAnimNode_Root_Initialize. Save cached poses are being initialised many many times.

That can be seen here..

[Image Removed]

There is an insane number of saved cached poses, and you can clearly see the recursion.

I added some temp code to limit the number of calls to both the `Initialize_AnyThread` and `CacheBones_AnyThread` functions of the SaveCachedPose node and did a re-trace.

Here I only allow a one-time init and cache until at least one update is done. You can see both the code snippet of the change and the Insights snippet below. This prevents the recursion but causes issues with some poses after the first init.

The problem is almost certainly our graph. Regardless, I need to find an appropriate and safe way to prevent this kind of recursive behaviour.

  1. Are there any current mechanisms to prevent this kind of recursion or better debug it?
  2. Should we be adding code similar to the snippet below into the animation system to limit this kind of recursion?
  3. Is there a way to nicely warn at edit time when an animator creates this kind of recursive loop?

Thanks for your time

-Chris

[Image Removed]

`void FAnimNode_SaveCachedPose::Initialize_AnyThread(const FAnimationInitializeContext& Context)
{
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - Temp test code
    if(bInitOnce)
    {
        return;
    }
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - Temp test code

    // StateMachines cause reinitialization on state changes.
    // we only want to let them through if we’re not relevant as to not create a pop.
    if (!InitializationCounter.IsSynchronized_Counter(Context.AnimInstanceProxy->GetInitializationCounter())
        || (UpdateCounter.HasEverBeenUpdated() && !UpdateCounter.WasSynchronizedCounter(Context.AnimInstanceProxy->GetUpdateCounter())))
    {
        InitializationCounter.SynchronizeWith(Context.AnimInstanceProxy->GetInitializationCounter());

        FAnimNode_Base::Initialize_AnyThread(Context);

        // Initialize the subgraph
        Pose.Initialize(Context);

        // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - Temp test code
        bInitOnce = true;
        // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - End

    }
}

void FAnimNode_SaveCachedPose::CacheBones_AnyThread(const FAnimationCacheBonesContext& Context)
{
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - Temp test code
    if(bInitOnce)
    {
        return;
    }
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - End


}

void FAnimNode_SaveCachedPose::Update_AnyThread(const FAnimationUpdateContext& Context)
{
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - Temp test code
    bInitOnce = false;
    // #BULKHEAD_CHANGE(chris.eaves): [02/06/25] - End

…`

Steps to Reproduce
Almost certainly this is specific to our animation graph..

The graph uses a LOT of cached poses, sometimes reusing the same one in multiple places in the graph

A bit more info on this. The salient points that were missed are that we are making use of `UAnimationSettings::bTickAnimationOnSkeletalMeshInit = False` and linked animation layers.

We are close to having an engine fix and will post some more details here soon, but we have found an issue with the synchronisation counters.

Hi, sorry for the delay in getting back to you on this. We’ve had a number of people out at UE Fest and are just catching up now.

I’ll need to talk with the dev team about this one to get more information. But as far as I’m aware, the synchronisation counters are intended to prevent the kind of behaviour that you’re seeing with multiple reinitializations of the saved pose happening within a given frame.

It would be interesting to get more information on why those checks are failing. As far as I can see, FAnimNode_SaveCachedPose::UpdateCounter is never actually updated, so it seems like the following check must be failing to prevent reinitialization:

`!InitializationCounter.IsSynchronized_Counter(Context.AnimInstanceProxy->GetInitializationCounter())`

Are you able to debug this to look more closely at what’s happening here? FAnimNode_SaveCachedPose::InitializationCounter is only modified from within that if statement in FAnimNode_SaveCachedPose::Initialize_AnyThread, so I’m assuming that the problem is with the FAnimInstanceProxy::InitializationCounter changing within the frame.

You also mentioned that you’re using linked animation layers. Do these cached pose nodes exist within the linked graphs or only in the main graph?

The change that you’ve made looks reasonable since you’re just preventing reinitialization within the frame, but I’ll double check this with the dev team as well.

Hello Euan,

We have resolved this by now, and it is indeed the `InitializationCounter` that fails to prevent AnimNodes from initializing over and over again.

The reason why it looks so “infinite loop”-like is due to the SaveCachedPose AnimNode being used very early in the AnimGraph.

When this is combined with AnimNodes like “LayeredBlendPerBone” that use the same cached pose a second time (with, e.g., just a MontageSlot in front of it), the SaveCachedPose AnimNode initializes multiple times, with just a slight offset in the trace (creating the visual pattern).

So now the question of “Why does the Counter not stop this?” comes up.

First of all, I have to mention that the trace posted by Chris is of an AnimGraph (or AnimInstance) that is only used as a LinkedLayer, and a LinkedLayer will use the InitializationCounter of its outer AnimGraph to sync itself with.

So why is the outer InitializationCounter invalid? Because, in our case, the outer AnimGraph hasn’t actually initialized yet when the LinkedLayer gets applied.
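To make the failure mode concrete, here is a minimal standalone sketch of a counter guard like the one described above. It is a simplified model, not engine code: `TraversalCounter`, `CachedPoseNode` and `CountInits` are hypothetical names, and the sketch assumes that a counter which has never been incremented holds a sentinel value that never counts as synchronized.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a graph traversal counter (hypothetical, not the engine type).
struct TraversalCounter
{
    // Assumption: a sentinel marks "never incremented / never synchronized".
    static constexpr std::int16_t Never = -1;
    std::int16_t Value = Never;

    void Increment()
    {
        // Wrap back to 0 so the sentinel is never produced by incrementing.
        Value = (Value == INT16_MAX) ? std::int16_t{0} : static_cast<std::int16_t>(Value + 1);
    }

    bool HasEverBeenUpdated() const { return Value != Never; }

    bool IsSynchronizedWith(const TraversalCounter& Outer) const
    {
        return HasEverBeenUpdated() && Value == Outer.Value;
    }

    void SynchronizeWith(const TraversalCounter& Outer) { Value = Outer.Value; }
};

// Simplified stand-in for a cached pose node guarded by the counter.
struct CachedPoseNode
{
    TraversalCounter InitCounter;
    int InitCalls = 0;

    void Initialize(const TraversalCounter& OuterInitCounter)
    {
        if (!InitCounter.IsSynchronizedWith(OuterInitCounter))
        {
            InitCounter.SynchronizeWith(OuterInitCounter);
            ++InitCalls; // stands in for initializing the whole subgraph behind the node
        }
    }
};

// Reaches the same node `Uses` times and returns how often the subgraph init actually ran.
int CountInits(bool bOuterInitialized, int Uses)
{
    assert(Uses >= 0);
    TraversalCounter Outer;
    if (bOuterInitialized)
    {
        Outer.Increment();
    }
    CachedPoseNode Node;
    for (int Use = 0; Use < Uses; ++Use)
    {
        Node.Initialize(Outer);
    }
    return Node.InitCalls;
}
```

In this model, `CountInits(true, 5)` runs the subgraph init once, while `CountInits(false, 5)` runs it on every use, because synchronizing against a never-incremented outer counter only copies the sentinel.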

I can’t say if these are the correct repro steps, but this is roughly how our setup works:

  1. We have a main ThirdPerson Character AnimInstance with a LinkedLayerSlot for a ThirdPerson Weapon AnimInstance.
  2. The ThirdPerson Weapon AnimInstance has lots of SaveCachedPose AnimNodes that are used with AnimNodes that initialize multiple “paths” (like the LayeredBlendPerBone one).
  3. We set `bTickAnimationOnSkeletalMeshInit` to false within our DefaultEngine.ini, to allow our AnimGraphs to initialize deferred.
  4. We apply the ThirdPerson Weapon AnimInstance as a linked layer through an OnRep.

Due to bTickAnimationOnSkeletalMeshInit being false, the init of the main AnimGraph will be deferred and will only happen once the AnimationProxy calls Update on a worker thread.

We call `USkeletalMeshComponent::LinkAnimClassLayers` when the weapon item replicates and triggers an OnRep, which will lead to `UAnimInstance::LinkAnimClassLayers` and finally to `UAnimInstance::PerformLinkedLayerOverlayOperation`. And while `PerformLinkedLayerOverlayOperation` does have a parameter called `bInDeferSubGraphInitialization`, it defaults to false in this case.

So now we have the linked AnimGraph going through the initialization flow without the outer/main AnimGraph having done the same.

Since there is no call to increment the counter in that code path, this results in the expensive initialize call, where the SaveCachedPose node is triggered over and over again, once for every time it is used in the graph (or at least every time it is encountered when walking through the AnimGraph on initialization).

Since circular dependencies between SaveCachedPose AnimNodes are blocked (so you can’t have cached pose A reference cached pose B and the other way round), this will still eventually finish, but depending on how many AnimNodes sit behind a given SaveCachedPose AnimNode and depending on how often that SaveCachedPose AnimNode is used, this can take ages to initialize.
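As a rough illustration of why this finishes but can take so long, the sketch below counts init calls in a made-up graph shape: a chain of cached poses where each one feeds a blend that reads the next cached pose through several inputs. `CachedPoseChain` and `CountInitCalls` are hypothetical names for this model only, and the shape is an assumption, not our actual graph.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: cached pose i sits in front of a blend that reads
// cached pose i + 1 through `FanOut` separate inputs. The chain is acyclic.
struct CachedPoseChain
{
    int Depth;
    int FanOut;
    std::vector<bool> Initialized;
    std::uint64_t InitCalls = 0;

    CachedPoseChain(int InDepth, int InFanOut)
        : Depth(InDepth), FanOut(InFanOut), Initialized(InDepth, false)
    {
        assert(InDepth >= 0 && InFanOut >= 0);
    }

    void Initialize(int Index, bool bGuardWorks)
    {
        if (Index >= Depth)
        {
            return; // end of the chain, so the walk always terminates
        }
        if (bGuardWorks && Initialized[Index])
        {
            return; // a working counter guard skips repeated inits
        }
        Initialized[Index] = true;
        ++InitCalls;
        for (int Input = 0; Input < FanOut; ++Input)
        {
            Initialize(Index + 1, bGuardWorks);
        }
    }
};

// Returns the number of subgraph inits performed when walking from the first cached pose.
std::uint64_t CountInitCalls(int Depth, int FanOut, bool bGuardWorks)
{
    CachedPoseChain Chain(Depth, FanOut);
    Chain.Initialize(0, bGuardWorks);
    return Chain.InitCalls;
}
```

With a working guard, a chain of 10 cached poses read through 2 inputs each costs 10 inits. With the guard failing, the same chain costs 1 + 2 + 4 + ... + 512 = 1023 inits, so the cost grows with every reuse rather than with the number of nodes.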

How did we resolve this?

So, first things first, I’m not that used to AnimGraph code, so this could be the completely wrong approach, just keep that in mind.

The first thing I did was copy the `bTickAnimationNow` boolean, which can be found in `USkeletalMeshComponent::InitAnim`, to two call places of `PerformLinkedLayerOverlayOperation` within `UAnimInstance` and use the boolean for the otherwise defaulted param.

Those two places are `UAnimInstance::LinkAnimClassLayers` and `UAnimInstance::UnlinkAnimClassLayers`. The third one is the standard call coming from `UAnimInstance::InitializeGroupedLayers`, which already uses the boolean.

`const bool bTickAnimationNow = (((GetWorld()->WorldType == EWorldType::Editor) && !SkelMeshComp->bForceRefpose)
    || UAnimationSettings::Get()->bTickAnimationOnSkeletalMeshInit)
    && !SkelMeshComp->bUseRefPoseOnInitAnim;

PerformLinkedLayerOverlayOperation(InClass, SelectResolvedClassIfValid, !bTickAnimationNow);`

I did adjust this a bit more because our Dedicated Server doesn’t tick AnimInstances, which means I always set that boolean to true for it, but that’s irrelevant to the problem.

This already solves the majority of the issue, because now the linked AnimGraph is also initializing deferred and will be taken care of alongside the main AnimGraph in `FAnimInstanceProxy::UpdateAnimation_WithRoot`.

This does, however, cause a problem if the main AnimGraph is already initialized, so I also added a boolean called `bDeferSubGraphInitialization` to the `UAnimInstance`, which I set to true inside `PerformLinkedLayerOverlayOperation` when `bInDeferSubGraphInitialization` is also true.

I then use this inside `FAnimInstanceProxy::UpdateAnimation_WithRoot`, to initialize just the linked layers. A small improvement was made by adding the same boolean to `FAnimInstanceProxy` and also setting it to true, so that we aren’t looping over the linked AnimInstances every time for no reason.

That now allows linked AnimGraphs to initialize deferred “later”, such as when changing weapons during gameplay.

Anything else?

Yes, we noticed that the order in which `FAnimInstanceProxy::InitializeRootNode_WithRoot`, `FAnimInstanceProxy::CacheBones`, and the initialization and caching of bones of sub graphs is handled causes a similar expensive call to `FAnimNode_LinkedAnimGraph::CacheBonesSubGraph_AnyThread`.

`CacheBones` is called after the linked AnimGraphs are looped, and `CacheBones` is the function that increments the `CachedBonesCounter`, so the linked AnimGraphs can still end up caching bones with an invalid outer/main AnimGraph `CachedBonesCounter`. We ended up moving the `CacheBones` call upwards, between `UpdateAnimation_WithRoot` and the linked AnimGraph loop. This doesn’t seem to have broken anything and resolves the additional expensive call to `CacheBonesSubGraph_AnyThread`.

There is one additional change that I added. Inside `PerformLinkedLayerOverlayOperation`, the following code can be found:

`UAnimInstance* LinkedInstance = SharedLinkedAnimLayers->AddLinkedFunction(this, ClassToSet, FunctionToLink, bIsNewInstance);`

And while, a bit further down, I handle setting the `bDeferSubGraphInitialization` boolean to true for the `LinkedInstance`, the `FAnimSubsystem_SharedLinkedAnimLayers::AddLinkedFunction` results in the Initialize flow being called instantly.

Inside `FLinkedAnimLayerClassData::FindOrAddInstanceForLinking`, it calls `NewAnimInstance->InitializeAnimation()`, which doesn’t pass any bDefer boolean in.

So my change added another parameter to `AddLinkedFunction` and `FindOrAddInstanceForLinking` called `bInDeferSubGraphInitialization`, which I use to pass the same parameter of `PerformLinkedLayerOverlayOperation` along. And inside `FindOrAddInstanceForLinking`, I then pass this into `InitializeAnimation` and set the `bDeferSubGraphInitialization` boolean of the `NewAnimInstance` to that same param. This then handles deferring that last puzzle piece.

--

And that’s kinda it. Feel free to provide a “proper” solution in case there are hidden problems with ours.

Cheers

Cedric

Hi, thanks for following up with the detailed info about your repro and the fixes that you’ve made. I managed to get a simple repro based on that.

There are multiple ways that this issue could potentially be resolved. The first is the approach that you’ve taken, where you’re effectively deferring the initialisation of the linked graph until the main graph is initialised. That’s a sensible approach to take, and if it works for you then I don’t see any obvious issues with it. It is relatively higher risk, though, since the changes are quite invasive. So it’s always possible there’s a workflow that we’re not aware of that’s relying on that initialisation happening during layer linking. But I think you should be ok, and you can always revert the changes if you do run into problems in future.

Because of the risk, we wouldn’t want to integrate that kind of change into the engine. As you’ve seen, all of the code around initialising the linked graphs and caching the bones is fragile, so the risk of there being some edge case that we’re not aware of that breaks is too high. Instead, if we go for a fix, it would need to catch the specific case that you’ve run into - that the traversal counters aren’t set up correctly for the sub-graph.

I’ve been looking at making changes to catch this and force the counters on the linked graph to be initialized. I’ve attached a patch with some in-progress changes in case you’re curious or if you want to test these changes. When I run these changes, with the simple repro that I have, it stops Initialize_AnyThread and CacheBones_AnyThread on the SaveCachedPose node from recursing any more than once into the sub-graph.

Another possible solution to this would be to just defer the call to link the anim layer. Depending on where you’re doing the linking from, this could mean waiting for another frame, or adding a tick dependency to make sure that the skeletal mesh has ticked (and the anim instance has initialized) before the linking operation happens. You have a solution, so this likely doesn’t make sense for you, but I wanted to add it for completeness and in case someone else runs into the same issue and finds this thread.
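For anyone who wants to try the deferral route from gameplay code, the idea can be sketched as a small queue that holds link requests until the main anim instance has initialized. This is a standalone sketch under assumptions: `DeferredLayerLinker` and its methods are hypothetical names, not an engine API, and how you detect that the main instance has initialized (waiting a frame, a tick dependency, or something else) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not an engine API): holds link requests until the
// owner reports that the main anim instance has finished initializing.
class DeferredLayerLinker
{
public:
    // Runs the link operation immediately if the main instance is ready,
    // otherwise queues it for later.
    void RequestLink(std::function<void()> LinkOperation)
    {
        assert(LinkOperation != nullptr);
        if (bMainInstanceInitialized)
        {
            LinkOperation();
        }
        else
        {
            Pending.push_back(std::move(LinkOperation));
        }
    }

    // Call once the main instance has initialized, for example after its first update.
    void NotifyMainInstanceInitialized()
    {
        bMainInstanceInitialized = true;

        // Swap into a local so operations that request further links do not
        // modify the container while it is being iterated.
        std::vector<std::function<void()>> ToRun;
        ToRun.swap(Pending);
        for (std::function<void()>& Operation : ToRun)
        {
            Operation();
        }
    }

    std::size_t NumPending() const { return Pending.size(); }

private:
    bool bMainInstanceInitialized = false;
    std::vector<std::function<void()>> Pending;
};
```

In a real project the queued operation would wrap the call to `USkeletalMeshComponent::LinkAnimClassLayers` made from the OnRep, so the linked graph only initializes once the outer counters are valid.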

Forgot to attach the patch so adding it now

Sure, I have a shelf in Release-5.5 that I think you should be able to view - 43555226. Let me know if you can’t see it.

Thank you for looking into this issue. Are you able to share the code changes with us more directly as raw text?