Race condition in FSkeletonRemapping usage

anonymous-edc · October 24, 2025, 9:11pm

To catch this race condition, I added an `MTAccessDetector` to `FSkeletonRemapping` to mark up all its reads and writes, then did a cold DDC cook to force DDC data to be regenerated.

This caught the following scenario:

The Game Thread was:

Serializing a Skeletal Mesh
Rebuilding Ref Skeleton from our SK_CharacterTemplate_Template_Skeleton
Calling UE::Anim::FSkeletonRemappingRegistry::Get().RefreshMappings(Skeleton)
- ExistingMapping = SK_CharacterTemplate_Template_Skeleton -> SK_Mannequin
Calling ExistingMapping->RegenerateMapping()
!! Write: ExistingMapping.RetargetingTable.Reset()
!! Write: ExistingMapping.GenerateMapping()

While a task thread was:

Calling UAnimSequence::GetAnimationPose() on SK_Mannequin
Calling UE::Anim::Decompression::DecompressPose()
DecompContext:
- SourceSkeleton = SK_CharacterTemplate_Template_Skeleton
- SkeletonRemapping = SK_CharacterTemplate_Template_Skeleton -> SK_Mannequin
  - (same object being used on game thread)
!! Read: SkeletonRemapping.GetTargetSkeletonBoneIndex()
!! Read: SkeletonRemapping.RetargetBoneRotationToTargetSkeleton()

Nothing prevents the game thread from regenerating the skeleton remapping while the task thread is trying to read from it.

anonymous-edc · October 24, 2025, 9:12pm

Steps to Reproduce

Euan_Carmichael · October 27, 2025, 5:30pm

Hi, thanks for reporting this. I took a look in the 5.6 codebase, and I see that there should already be locks around the access of the skeleton remapping data from RefreshMappings. Does the implementation of the function in your version of the engine look like this?

void FSkeletonRemappingRegistry::RefreshMappings(const USkeleton* InSkeleton)
{
	TArray<TSharedPtr<FSkeletonRemapping>> ExistingMappings;
	{
		UE::TReadScopeLock ReadLock(MappingsLock);
		PerSkeletonMappings.MultiFind(InSkeleton, ExistingMappings);
	}
	{
		UE::TWriteScopeLock WriteLock(MappingsLock);
		for(const TSharedPtr<FSkeletonRemapping>& ExistingMapping : ExistingMappings)
		{
			ExistingMapping->RegenerateMapping();
		}
	}
}

I think that this should probably be covered entirely by a single write lock rather than read/write, but it doesn’t sound like that’s the same issue you found.

Euan_Carmichael · November 7, 2025, 11:17pm

Great, thanks for letting me know. I’m going to go ahead and get these changes integrated. They’ll go into the 5.8 release. I’m going to close out this thread, but if we happen to run into any problems with the change once it goes into Fortnite, I’ll reopen it and let you know.

Euan_Carmichael · November 10, 2025, 7:11pm

Hi, just reopening this since I did find a potential problem with the code that I sent over previously. Turns out that we need to null check the Skeleton as the cloth tools can call RebuildRefSkeleton with a null skeleton. So the code change I sent you previously should actually look like this:

	if (Skeleton && (this == &Skeleton->GetReferenceSkeleton()))
	{
		// Full rebuild of all compatible with this and with ones we are compatible with.
		UE::Anim::FSkeletonRemappingRegistry::Get().RefreshMappings(Skeleton);
	}

Euan_Carmichael · November 26, 2025, 12:44am

No problem, thanks again for flagging it!

anonymous-edc · October 27, 2025, 9:48pm

Yes it looks like that. The race is not in constructing or handing out a FSkeletonRemapping, it’s that _after_ that nothing is stopping two competing threads from writing to the same mapping while another reads from it.

We have a temporary fix for this on our end, but it’s pretty extensive. Everywhere that calls `UE::Anim::FSkeletonRemappingRegistry::Get().GetRemapping()` now has to take a scope lock depending on whether they plan on reading or writing.

Examples:

const FSkeletonRemapping& SkeletonRemapping = UE::Anim::FSkeletonRemappingRegistry::Get().GetRemapping(BoneIndexSkeleton, LastUpdateSkeleton);
UE::TReadScopeLock SkeletonRemappingReadLock = SkeletonRemapping.LockForReading();

UE::TWriteScopeLock WriteLock(MappingsLock);
for(const TSharedPtr<FSkeletonRemapping>& ExistingMapping : ExistingMappings)
{
    UE::TWriteScopeLock ExistingMappingWriteLock = ExistingMapping->LockForWriting();
    ExistingMapping->RegenerateMapping();
}

Euan_Carmichael · October 28, 2025, 6:29pm

Thanks, yeah I see what you mean. The option that you mentioned of locking whenever we get the remapping makes sense. As you say though, it means quite a few changes since that functionality is used extensively throughout the animation code wherever we get a bone index.

I think there is another option that we can look at. You mentioned originally that this happened when rebuilding the ref skeleton while the mesh was being loaded. But the actual skeleton (rather than the mesh’s ref skeleton) shouldn’t be affected by that. (If the actual skeleton was affected, we would have many more issues decompressing the animation against a skeleton that had changed.)

Looking at the code, it seems that FReferenceSkeleton::RebuildRefSkeleton is being overzealous about when it regenerates the skeleton remapping. We shouldn’t need to update it when just the ref skeleton on the mesh has been updated. So we could make the following change to FReferenceSkeleton::RebuildRefSkeleton:

				RequiredVirtualBones.Add(NewBoneIndex);
				UsedVirtualBoneData.Add(FVirtualBoneRefData(NewBoneIndex, SourceIndex, TargetIndex));
			}
		}
	}
 
        // new branch to only refresh mapping if the skeleton asset has changed
	if (this == &Skeleton->GetReferenceSkeleton())
	{
	    // Full rebuild of all compatible with this and with ones we are compatible with.
	    UE::Anim::FSkeletonRemappingRegistry::Get().RefreshMappings(Skeleton);
        }
}

That would ensure that we only refresh the mappings when it’s the skeleton that is being regenerated, not the mesh ref skeleton. And we shouldn’t ever end up in a situation where the decompression code is being run at the same time as the skeleton is regenerated.

Would you be able to test that change against your repro?

anonymous-edc · October 29, 2025, 3:25pm

Excellent, this makes much more sense. I’ll give it a try.

anonymous-edc · October 30, 2025, 4:36pm

So far so good, our overnight ColdDDC cook passed successfully. We won’t really know until we get a few solid days of success in a row, but this is looking like it worked.

Euan_Carmichael · October 30, 2025, 5:28pm

Thanks for following up, sounds promising. I’ll keep an eye out for any updates next week.

anonymous-edc · November 4, 2025, 8:21pm

We’re looking really good, haven’t run into an issue since.

anonymous-edc · November 25, 2025, 11:39pm

I see that this was committed to //UE5/Main in CL# 48060137. Thank you!