SnapshotPose crashes when ComponentSpaceTransformsArray[read] is empty

I am running into a situation where USkeletalMeshComponent::SnapshotPose is crashing, because ComponentSpaceTransformsArray[CurrentReadComponentTransforms] is empty.

First off, I edited USkeletalMeshComponent::SnapshotPose to prevent the crash and return an invalid snapshot instead, which it’s already set up to do.

[Image Removed]
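Since the screenshot of the change is gone, here is a minimal standalone sketch of the kind of guard described above. It is not the engine code: FTransformModel, FPoseSnapshotModel and SnapshotPoseGuarded are hypothetical stand-ins, and the real function converts component-space transforms to local space where this model simply copies them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; the engine's FTransform and FPoseSnapshot carry more data.
struct FTransformModel
{
    float X = 0.0f, Y = 0.0f, Z = 0.0f;
};

struct FPoseSnapshotModel
{
    std::vector<FTransformModel> LocalTransforms;
    bool bIsValid = false;
};

// Fills Snapshot from the read buffer, or leaves it marked invalid when the buffer is empty.
void SnapshotPoseGuarded(const std::vector<FTransformModel>& ReadTransforms, FPoseSnapshotModel& Snapshot)
{
    Snapshot.LocalTransforms.clear();
    Snapshot.bIsValid = false;

    // The guard: with zero transforms there is nothing safe to index, so bail out early.
    if (ReadTransforms.empty())
    {
        return;
    }

    // The real function converts component space to local space; this model just copies.
    Snapshot.LocalTransforms = ReadTransforms;
    Snapshot.bIsValid = true;
}
```

Callers then only need to check bIsValid before blending from the snapshot.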

Here is the state of the relevant variables during this SnapshotPose call, captured when it was still crashing, before I added the check that there is at least one transform.

[Image Removed]

But next I am trying to get to the bottom of why the transform array is coming up empty sometimes.

The situation is that the actor is bound to a sequencer and has ragdoll triggered. The crash occurs during my get-up-from-ragdoll code, where a snapshot is taken to capture the settled, physics-enabled state of the mesh before blending from it into a get-up animation.

I haven’t gotten this with a standalone character with physics enabled ragdoll, as far as I can tell, so I’m wondering if something could be not playing nice between the sequencer and the character mesh.

While I continue to debug it, I wanted to ask if anyone might have any ideas.

What I’ve observed so far.

  • Printing out the state of the ComponentSpaceTransformsArray shows that there is often one of them empty, but the one indexed by CurrentReadComponentTransforms seems to always have transforms
  • USkinnedMeshComponent::FlipEditableSpaceBases seems to be the sole place the render index is altered
  • ParallelAnimationEvaluationTask is null (I was thinking it might be in the middle of an update)

I’m at a loss so far about how to track this bug down. Any help is appreciated.

Hi, can you show me the code that you’re using to set up the snapshot? Also, where is that code being called from? It’d also be useful to get a callstack for the crash.

As far as I’m aware, GetComponentSpaceTransforms should always give you back something that is at least valid, except potentially on the first frame after the mesh is registered, since FillComponentSpaceTransforms won’t have been called by that point. GetEditableComponentSpaceTransforms, on the other hand, can return an empty array if it’s called during an animation evaluation task, since that buffer is swapped into the anim evaluation context in USkeletalMeshComponent::SwapEvaluationContextBuffers. So my guess is that’s why you often see one of the arrays being empty.
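To make the buffer swap concrete, here is a small standalone model of it. This is a sketch of the idea only, not the engine implementation: FMeshBuffersModel is a hypothetical type, ints stand in for transforms, and the member names just mirror the ones mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the double-buffered transforms plus the evaluation context array.
struct FMeshBuffersModel
{
    std::vector<int> ComponentSpaceTransformsArray[2];
    int CurrentReadComponentTransforms = 0;
    int CurrentEditableComponentTransforms = 1;
    std::vector<int> EvalContextTransforms; // empty while no evaluation is in flight

    explicit FMeshBuffersModel(std::size_t NumBones)
    {
        ComponentSpaceTransformsArray[0].assign(NumBones, 0);
        ComponentSpaceTransformsArray[1].assign(NumBones, 0);
    }

    const std::vector<int>& GetRead() const
    {
        return ComponentSpaceTransformsArray[CurrentReadComponentTransforms];
    }

    std::vector<int>& GetEditable()
    {
        return ComponentSpaceTransformsArray[CurrentEditableComponentTransforms];
    }

    // Exchanges whole arrays: whatever the context held ends up in the editable slot.
    void SwapEvaluationContextBuffers()
    {
        std::swap(EvalContextTransforms, GetEditable());
    }
};
```

After one swap the editable array is empty while the read array is still populated, which matches the "one of them is often empty" observation. A second swap hands the transforms back.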

Do you have a consistent repro for this, or does it just seem to happen randomly at any point? Also is the actor set to a possessable or spawnable in Sequencer?

Thanks for the extra info on this, that helps to narrow things down. What it sounds like is happening here is that the main skeletal mesh component tick function isn’t running in the PrePhysics group, as would usually be expected. Or it’s running long so that it is still in flight by the point that you get to the EndPhysics tick group.

What should happen is that TickComponent on the skeletal mesh component should start in PrePhysics, and that should queue FParallelAnimationEvaluationTask and FParallelAnimationCompletionTask. Those tasks should then be completed within PrePhysics (in an ideal world). Then, in EndPhysics, USkeletalMeshComponent::EndPhysicsTickFunction should run. That should queue ParallelBlendPhysics and CompleteParallelBlendPhysics. It sounds like you’re getting these sets of tasks being interspersed, and that’s what’s causing the issue.

I would double-check that the USkeletalMeshComponent::PrimaryComponentTick is still set to use TG_PrePhysics. More likely is that a tick prerequisite has been set on the mesh (either explicitly or via an attachment to another actor or component) that pushes the TickComponent function into a later tick group. Or it may be that the evaluation tasks are just running long, like I mentioned.

One thing you could do is log out the tick group that USkeletalMeshComponent::ParallelAnimationEvaluation (and the completion task) is running within by calling GetWorld()->TickGroup. The other thing that you could try is to add an explicit dependency to the EndPhysicsTickFunction so that the parallel physics tasks are forced to wait for the parallel animation tasks to complete. You could add that in USkeletalMeshComponent::RegisterEndPhysicsTick (if you look in USkeletalMeshComponent::RegisterClothTick you’ll see how we do that for the cloth tick function).
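As a rough illustration of what a tick prerequisite buys you, here is a standalone toy scheduler. FTickModel and RunFrame are hypothetical and far simpler than the engine's FTickFunction and task graph. In engine terms I would expect the change to look something like EndPhysicsTickFunction.AddPrerequisite(this, PrimaryComponentTick) inside RegisterEndPhysicsTick, but that exact line is my assumption and is not taken from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tick function; not the engine's FTickFunction.
struct FTickModel
{
    std::string Name;
    std::vector<const FTickModel*> Prerequisites;
    bool bDone = false;

    void AddPrerequisite(const FTickModel& Other)
    {
        Prerequisites.push_back(&Other);
    }

    bool CanRun() const
    {
        for (const FTickModel* Prerequisite : Prerequisites)
        {
            if (!Prerequisite->bDone)
            {
                return false;
            }
        }
        return true;
    }
};

// Runs each tick once, always picking the first pending tick whose prerequisites are done.
std::vector<std::string> RunFrame(std::vector<FTickModel*> Pending)
{
    std::vector<std::string> Order;
    while (!Pending.empty())
    {
        bool bProgress = false;
        for (std::size_t Index = 0; Index < Pending.size(); ++Index)
        {
            if (Pending[Index]->CanRun())
            {
                Pending[Index]->bDone = true;
                Order.push_back(Pending[Index]->Name);
                Pending.erase(Pending.begin() + static_cast<std::ptrdiff_t>(Index));
                bProgress = true;
                break;
            }
        }
        if (!bProgress)
        {
            break; // cyclic prerequisites; nothing else can run
        }
    }
    return Order;
}
```

Without the prerequisite, the toy scheduler runs the ticks in whatever order they were handed in. With it, the end-physics tick always waits for the primary tick, regardless of submission order.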

> I assume this line is meant to prevent a new frame from starting before these jobs complete
>
> ThisTickFunction.GetCompletionHandle()->DontCompleteUntil(ParallelBlendPhysicsCompletionTask);
>
> but apparently it’s not doing the job?

This just specifies that FParallelBlendPhysicsCompletionTask should run before EndPhysicsTickFunction is marked as complete, but since there’s no explicit dependency on PrimaryComponentTick the parallel anim eval tasks could potentially still run before or after this.

I wonder if you’re running into a bug with the following code in USkeletalMeshComponent::TickComponent:

		/** Update the end group and tick priority */
		const bool bDoLateEnd = CVarAnimationDelaysEndGroup.GetValueOnGameThread() > 0;
		const bool bRequiresPhysics = EndPhysicsTickFunction.IsTickFunctionRegistered();
		const ETickingGroup EndTickGroup = bDoLateEnd && !bRequiresPhysics ? TG_PostPhysics : TG_PrePhysics;
		if (ThisTickFunction)
		{
			ThisTickFunction->EndTickGroup = EndTickGroup;
 
			const bool bDoHiPri = CVarHiPriSkinnedMeshesTicks.GetValueOnGameThread() > 0;
			check(PrimaryComponentTick.bHighPriority == bDoHiPri)
		}

If the logic there were to incorrectly set EndTickGroup to TG_PostPhysics (even though your mesh is using physics) then it would be possible for the parallel anim eval tasks to still be in flight when the parallel blend physics tasks are dispatched. Either that, or there must be something that’s dragging the primary tick function into a later tick group somehow.
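For reference, the end-group selection above boils down to a two-input decision. Here it is as a standalone function so each case is easy to read off. ETickGroupModel and SelectEndTickGroup are hypothetical names for this sketch. Note that && binds tighter than ?:, so the condition is (bDoLateEnd && !bRequiresPhysics).

```cpp
#include <cassert>

// Hypothetical two-value stand-in for the relevant ETickingGroup values.
enum class ETickGroupModel
{
    PrePhysics,
    PostPhysics
};

// Mirrors the ternary in the quoted snippet: a late end is only allowed when no physics tick is registered.
constexpr ETickGroupModel SelectEndTickGroup(bool bDoLateEnd, bool bRequiresPhysics)
{
    return (bDoLateEnd && !bRequiresPhysics) ? ETickGroupModel::PostPhysics
                                             : ETickGroupModel::PrePhysics;
}
```

So with a registered end-physics tick function, the expected end group is the pre-physics one regardless of the CVar. A mesh that uses physics and still ends in the post-physics group would point at a problem in how the inputs are computed.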

In terms of the fix/workaround, it does seem wrong to me that EndPhysicsTickFunction doesn’t have a prerequisite on PrimaryComponentTick. I’ll need to confirm that with the physics team, but I think you should be good to run with that change.

It’s a possessable actor in the sequencer, assigned during play. This is part of a sync attack interaction between the player and an AI.

It’s being called from the game thread. Callstack attached. It’s part of the ragdoll recovery code.

Here is the utility function on the native anim instance that is calling the snapshot function.

[Image Removed]

I also added this to my character tick to try and find frames of empty component space(read) transforms.

[Image Removed]

And sure enough, it spews out 38 frames of this empty transform log, prior to the call into the pose snapshot where it fails. It continues to spam this afterward, endlessly. So whatever is breaking appears to be permanently broken going forward.

For some more context, here is what we are working with, and what I’ve learned from debugging so far

  • The ACharacter::Mesh is the animated skeletal mesh running the authoritative ABP. It has valid component arrays
  • The AI are running BodyMesh SKMCs that use CopyPoseFromMesh or RetargetPoseFromMesh, in ParentSkeletalMeshComponent mode.
  • The issue occurs in both situations.

In debugging in FAnimNode_CopyPoseFromMesh, I noticed that in FAnimNode_CopyPoseFromMesh::PreUpdate,

  • const TArray<FTransform>& CachedComponentSpaceTransforms = CurrentMeshComponent->GetCachedComponentSpaceTransforms(); (not sure if this is an issue)
  • ComponentSpaceTransformsArray is not empty (both are populated)
  • Despite this, SourceMeshTransformArray is populated, because it pulls from CurrentMeshComponent->GetComponentSpaceTransforms()
  • FAnimNode_CopyPoseFromMesh::Evaluate_AnyThread isn’t being called once the state is broken, only the PreUpdate. Prior to breaking, Evaluate_AnyThread does get called, when things are working properly.
  • Once busted, USkeletalMeshComponent::RefreshBoneTransforms early-outs due to GetNumComponentSpaceTransforms() == 0, which I think is why Evaluate_AnyThread stops getting called (the animation job doesn’t get dispatched anymore)
  • In debugging USkinnedMeshComponent::FlipEditableSpaceBases, where the read/write indices are updated, I can see that the CurrentReadComponentTransforms is being updated to index an empty ComponentSpaceTransformsArray
  • [Image Removed]
  • Based on this callstack, originating from USkeletalMeshComponent::CompleteParallelBlendPhysics, if I save copies of the 2 arrays being swapped by SwapEvaluationContextBuffers, I can see that when it breaks, it’s because AnimEvaluationContext.ComponentSpaceTransforms is empty and is being swapped indiscriminately into the editable index of ComponentSpaceTransformsArray, which then becomes the read index inside FinalizeAnimationUpdate, called from that same function.
  • I see that in USkeletalMeshComponent::BlendInPhysicsInternal, if bParallelBlend is enabled, it creates an FParallelBlendPhysicsTask, which calls USkeletalMeshComponent::PerformBlendPhysicsBones. It also makes this task a dependency of the FParallelBlendPhysicsCompletionTask, which is where the bad swap is occurring. So it stands to reason that whatever is delivering a bogus AnimEvaluationContext.ComponentSpaceTransforms should be coming from this task, since it does this
  • [Image Removed]
  • Interestingly, I don’t get any empty arrays passed into PerformBlendPhysicsBones.
  • If I disable a.ParallelBlendPhysics, the problem doesn’t occur. So it would appear something is stomping AnimEvaluationContext.ComponentSpaceTransforms, and the parallelism brought on by a.ParallelBlendPhysics leaves a window for that to occur.
  • It’s not super clear how to debug it from here. Hoping y’all might have an idea how to proceed. I don’t know how you keep track of what is going on with the async stuff in this system, considering that USkeletalMeshComponent::SwapEvaluationContextBuffers is called at the beginning and end of a frame, and sometimes in the midst of a frame update, such as in BlendInPhysicsInternal, where it’s called prior to dispatching the async physics task. That is especially true given that SwapEvaluationContextBuffers swaps the entire transform arrays between the animation context and the editable transform list, and that one of these arrays is routinely empty.

Hey [mention removed],

I think I may have found a culprit.

I’ve instrumented the SwapEvaluationContextBuffers function with an integer value to assist in logging the flow of calls through this function, since it does the job of preparing the animation context prior to async operations, and also swapping the results back into the editable transform arrays. This pattern is quite dangerous if you ask me, for reasons I think I am about to show.

Here is a legend first off, to make sense of the logging

1 - DispatchParallelEvaluationTasks

3 - just prior to kicking off FParallelBlendPhysicsTask task

11 - CompleteParallelBlendPhysics

6 - DoParallelEvaluationTasks_OnGameThread(first)

10 - DoParallelEvaluationTasks_OnGameThread(second)

10 - also in CompleteParallelAnimationEvaluation

Normally the flow during physics blending is

1->10->3->11 then repeats

However, when things break, we can see why in the log

[Image Removed]

These 3->1 transitions are erroneous, and represent a new DispatchParallelEvaluationTasks firing off while the physics blend tasks are still in flight. These invalid 3->1 transitions precede every log from CompleteParallelBlendPhysics reporting an unexpectedly empty AnimEvaluationContext.ComponentSpaceTransforms, when it expects to be swapping valid AnimEvaluationContext.ComponentSpaceTransforms back into the editable transform buffer.
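To show why a 3->1 transition ends with an empty read buffer, here is a standalone replay of the call-site IDs from the legend. It is a simplified model and not the engine: ReplayCallSites and FReplayResult are hypothetical, ints stand in for transforms, every site is treated as one whole-array swap, and only site 11 flips the read/editable indices (my simplification of the physics blend path).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct FReplayResult
{
    std::size_t NumReadTransforms = 0; // size of the read buffer after the replay
    int NumBadTransitions = 0;         // number of 3 -> 1 transitions seen
};

// Replays instrumented SwapEvaluationContextBuffers call sites against a two-buffer model.
FReplayResult ReplayCallSites(const std::vector<int>& Sites, std::size_t NumBones)
{
    std::vector<int> Buffers[2] = { std::vector<int>(NumBones, 0), std::vector<int>(NumBones, 0) };
    std::vector<int> Context; // empty while nothing is in flight
    int Read = 0;
    int Editable = 1;
    int Previous = 0;

    FReplayResult Result;
    for (int Site : Sites)
    {
        // A dispatch (1) directly after the physics blend kickoff (3) is the suspicious pattern.
        if (Previous == 3 && Site == 1)
        {
            ++Result.NumBadTransitions;
        }

        // Every instrumented site swaps the context array with the editable buffer.
        std::swap(Context, Buffers[Editable]);

        // In this sketch only the physics blend completion flips the buffers.
        if (Site == 11)
        {
            Read = Editable;
            Editable = 1 - Editable;
        }
        Previous = Site;
    }

    Result.NumReadTransforms = Buffers[Read].size();
    return Result;
}
```

Replaying 1, 10, 3, 11 leaves the read buffer populated. Replaying 1, 10, 3, 1, 11 leaves the context array empty at the moment site 11 swaps it back, so the flip exposes an empty read buffer. That is the same shape as the failure in the log.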

We can also see from the log that the physics task completes

I assume this line is meant to prevent a new frame from starting before these jobs complete

ThisTickFunction.GetCompletionHandle()->DontCompleteUntil(ParallelBlendPhysicsCompletionTask);

but apparently it’s not doing the job?

To verify this premature DispatchParallelEvaluationTasks is indeed coming from the tick, I put some conditional logging to catch the improper 3->1 transition, and sure enough

[Image Removed]

Any ideas for a proper fix here would be appreciated. Obviously I can disable parallel physics update as a temporary measure but I’m hoping to avoid that.

Thanks for the feedback. I have a deadline this week, but I will investigate the leads you mentioned, probably next week, and get back to you with my findings. At a glance, I know that the body mesh (the SKMC the ragdoll is running off of) uses AddTickPrerequisiteComponent with the GetMesh() from ACharacter, but they both use TG_PrePhysics.

This is the only usage of AddTickPrerequisiteComponent at the project level. There’s also no reference to any of the tick group enums in project code outside of this, so if a tick group is being changed, it’s part of the implementation of something else. I will debug it at the call sites of the tasks to make sure it is correct when the issue occurs.

[Image Removed]

> The other thing that you could try is to add an explicit dependency to the EndPhysicsTickFunction so that the parallel physics tasks are forced to wait for the parallel animation tasks to complete. You could add that in USkeletalMeshComponent::RegisterEndPhysicsTick (if you look in USkeletalMeshComponent::RegisterClothTick you’ll see how we do that for the cloth tick function).

Are you saying basically this?

[Image Removed]

Is that because, without this dependency, EndPhysicsTickFunction isn’t restricted to run 1:1 with the PrimaryComponentTick?

First off, nothing ever seems to be anything other than TG_PrePhysics, so I don’t think any tick groups are being changed from what I can tell.

However, adding the highlighted change suggested above seems to be an effective fix. Just to make sure, I commented it out, and it didn’t take long to run into the bug again, so this seems legit.

What is baffling to me is why this isn’t a bigger issue for more users. I don’t think we’re doing anything particularly novel here. Any ideas about that?

Ragdoll isn’t uncommon in Unreal games, and with a.ParallelBlendPhysics defaulting to enabled, I would think there’d be more reports of this sort of thing.