[Deformer Graph] Advanced Skeleton node: crash due to (USkeletalMeshComponent*)SkeletalMesh->MeshObject memory corruption?

Hello,

This crash happens in the Skeletal Mesh Editor (desktop) as well as on PS5 (runtime).

The random nature of the crash points toward some kind of memory corruption.

Below is the excerpt of code that crashes with the AdvancedSkeleton node:

```cpp
// OptimusDataInterfaceAdvancedSkeleton.cpp

FSkeletalMeshObject* SkeletalMeshObject = SkeletalMesh->MeshObject;
const int32 LodIndex = SkeletalMeshObject->GetLOD(); // When the exception triggers, inspecting SkeletalMeshObject shows the correct value for SkeletalMesh->MeshObject->DynamicData->LODIndex
FSkeletalMeshRenderData const& SkeletalMeshRenderData = SkeletalMeshObject->GetSkeletalMeshRenderData();
FSkeletalMeshLODRenderData const& LodRenderData = SkeletalMeshRenderData.LODRenderData[LodIndex]; // Crashes here because LodIndex is corrupted
```

Fiddling around, I added:

```cpp
FSkeletalMeshObject* SkeletalMeshObject = SkeletalMesh->MeshObject;
const int32 LodIndex = SkeletalMeshObject->GetLOD();
FSkeletalMeshRenderData const& SkeletalMeshRenderData = SkeletalMeshObject->GetSkeletalMeshRenderData();
check(LodIndex == SkeletalMesh->MeshObject->GetLOD());
FResourceSizeEx CumulativeResourceSize;
SkeletalMesh->MeshObject->GetResourceSizeEx(CumulativeResourceSize); // Now crashes inside this method with DynamicData = 0xFFFF (see screenshot)
FSkeletalMeshLODRenderData const& LodRenderData = SkeletalMeshRenderData.LODRenderData[LodIndex];
```

Other minor changes to the code that don't alter functionality will also sometimes cause SkeletalMesh->MeshObject->DynamicData to be null. I suspect the pointer is freed on another thread, which would explain both the random nature of the bug and the slight changes in crash behavior.
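To illustrate the suspicion above: a raw pointer such as `SkeletalMesh->MeshObject` gives the reader no way to tell whether the object behind it has already been freed, so a bounds check on `LodIndex` alone cannot make a dangling pointer safe. The following is a minimal standalone sketch, assuming shared ownership via `std::shared_ptr`/`std::weak_ptr`. The types `MeshObject`, `LodRenderData` and the function `TryGetCurrentLodVertexCount` are hypothetical stand-ins, not engine types, and this does not imply the engine manages `FSkeletalMeshObject` this way.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the engine's
// FSkeletalMeshObject / FSkeletalMeshLODRenderData types.
struct LodRenderData {
    int numVertices;
};

struct MeshObject {
    int lodIndex;                     // current LOD index
    std::vector<LodRenderData> lods;  // one entry per LOD
};

// Returns the vertex count of the current LOD, or std::nullopt if the mesh
// object has already been destroyed or the LOD index is out of range.
std::optional<int> TryGetCurrentLodVertexCount(const std::weak_ptr<MeshObject>& weakMesh) {
    // lock() yields either an owning pointer that keeps the object alive for
    // the rest of this function, or an empty pointer if it was already freed.
    std::shared_ptr<MeshObject> mesh = weakMesh.lock();
    if (!mesh) {
        return std::nullopt;
    }
    // Read the index once and range-check it before indexing.
    const int lod = mesh->lodIndex;
    if (lod < 0 || lod >= static_cast<int>(mesh->lods.size())) {
        return std::nullopt;
    }
    return mesh->lods[lod].numVertices;
}
```

The weak reference is what detects the freed object; the range check only catches a bad index on an object that is still alive. In a multithreaded setting, writes to `lodIndex` from another thread would still need synchronization.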

Since this seems to be fixed in mainline, I was hoping someone might know which change I should integrate to avoid this in 5.5.

Steps to Reproduce
Debug Editor Build.

This crash happens with the attached FBX model.

shinbi_base_model.fbx is the imported model.

shinbi_sns_skin_weights.fbx is a skin weight profile imported for LOD0.

The skeletal mesh should use a deformer that relies on the Advanced Skeleton node (for instance, reusing the built-in LBS deformer asset while making sure it does not use the deprecated Skeleton node).

I could not find an exact sequence of steps that triggers the crash; however, I can consistently trigger it after fiddling around for several seconds or minutes.

(see the video)

Roughly speaking, the crash may happen in the following sequence:

  • opening the Skeletal Mesh Editor
  • changing the current LOD level in the Details panel
  • scrolling down the list of mesh sections and back up
  • changing the LOD level again
  • repeating the above

Other actions that you can mix with the above:

  • resizing the viewport
  • making a skeletal animation play
  • playing and pausing the animation
  • zooming in and out inside the viewport (whether LOD is set to auto or a fixed value)
  • minimizing and maximizing the editor window

At some point, one of the above actions should trigger an out-of-bounds array access in DataInterfaces\OptimusDataInterfaceAdvancedSkeleton.cpp:

```cpp
FSkeletalMeshLODRenderData const& LodRenderData = SkeletalMeshRenderData.LODRenderData[LodIndex];
```

where LodIndex holds a negative garbage value

(however, inspecting SkeletalMesh->MeshObject->DynamicData->LODIndex shows a valid value).

The above error with LodIndex also shows up more easily on PS5 (debug build), just by playing a small test map with characters that use the AdvancedSkeleton node.

I was not able to reproduce the crash on the main branch (from GitHub), which may indicate it is fixed in 5.6 (so far I have only tested the desktop editor for this claim).

Hi! Thanks for another detailed report! I will get back to you as soon as possible. Meanwhile, I am curious: did you see this type of corruption with the old Skeleton DI? In theory, if the MeshObject is bad, both the new and old Skeleton DIs should fail. And if you are not using anim attributes or layered skinning, both DIs should go through almost the same code logic.

Thanks,

Jack

Just an update: I am still investigating the issue but will be traveling next week. I will resume work once I get back.

Jack

So there is a fix, CL 40202308, that we put in after 5.5 that might address this issue; could you give it a try? Basically, the problem was that in some cases enqueued work wasn't being properly retracted when the proxy was recreated, so work was executed against the dead proxy.

Let me know how it goes, thanks!

Jack
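The failure mode described above (work enqueued for a proxy still running after that proxy was recreated) can be sketched in isolation. This is a minimal single-threaded sketch, assuming each queued item is tagged with the identity of the proxy it targets; `WorkQueue`, `Enqueue`, `Retract` and `Flush` are hypothetical names, not the contents of CL 40202308 or the engine's `UMeshDeformerInstance` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical: one unit of deferred work, tagged with the proxy it targets.
struct QueuedWork {
    std::uint64_t ownerId;          // identity of the proxy the work was enqueued for
    std::function<void()> execute;  // work to run later
};

// Hypothetical single-threaded queue; a queue shared between threads
// would also need synchronization.
class WorkQueue {
public:
    void Enqueue(std::uint64_t ownerId, std::function<void()> fn) {
        pending_.push_back(QueuedWork{ownerId, std::move(fn)});
    }

    // Removes every pending item that targets ownerId and returns how many
    // were removed. Call this before the old proxy is destroyed so that a
    // later Flush() never runs work against it.
    std::size_t Retract(std::uint64_t ownerId) {
        auto firstRemoved = std::remove_if(
            pending_.begin(), pending_.end(),
            [ownerId](const QueuedWork& w) { return w.ownerId == ownerId; });
        const auto removed = static_cast<std::size_t>(pending_.end() - firstRemoved);
        pending_.erase(firstRemoved, pending_.end());
        return removed;
    }

    // Runs all pending work and returns how many items ran. The list is
    // swapped out first, so work that enqueues more work cannot invalidate
    // the iteration; newly enqueued items wait for the next Flush().
    std::size_t Flush() {
        std::vector<QueuedWork> work;
        work.swap(pending_);
        for (QueuedWork& w : work) {
            w.execute();
        }
        return work.size();
    }

private:
    std::vector<QueuedWork> pending_;
};
```

On a proxy recreate, the owner would call `Retract(oldProxyId)` before releasing the old proxy; any work skipped this way would have to be enqueued again for the new proxy if it is still needed.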

Could you check if this github link works for you?

I see. I will check with the rendering engineers to see if there are other related changes you can try, but yes, quite a few changes went into this area in 5.6 that may have indirectly addressed the issue. I will keep an eye out for it.

Thanks

Jack

Hi! Thank you for looking into this!

I tested again on 5.5 desktop with the old Skeleton DI (simply setting the deformer of the shinbi model to the built-in DG_linearBlend).

I tried my best for 5 minutes or so to make it crash, and it did not.

Then I added the following code to the old Skeleton node:

```cpp
FComputeDataProviderRenderProxy* UOptimusSkeletonDataProvider::GetRenderProxy()
{
    FSkeletalMeshObject* SkeletalMeshObject = SkinnedMesh != nullptr ? SkinnedMesh->MeshObject : nullptr;

    // Diagnostic checks only; skip them when there is no mesh object to inspect.
    if (SkeletalMeshObject != nullptr)
    {
        const int32 LodIndex = SkeletalMeshObject->GetLOD();
        FSkeletalMeshRenderData const& SkeletalMeshRenderData = SkeletalMeshObject->GetSkeletalMeshRenderData();

        check(LodIndex == SkeletalMeshObject->GetLOD());
        if (!SkeletalMeshRenderData.LODRenderData.IsValidIndex(LodIndex))
        {
            //UE_LOG(LogOptimus, Error, TEXT("SkeletalMeshRenderData.LODRenderData is not valid for LOD %d"), LodIndex);
            check(false);
        }

        FResourceSizeEx CumulativeResourceSize;
        SkeletalMeshObject->GetResourceSizeEx(CumulativeResourceSize);

        // Same access that crashes in the Advanced Skeleton DI.
        FSkeletalMeshLODRenderData const& LodRenderData = SkeletalMeshRenderData.LODRenderData[LodIndex];
    }

    return new FOptimusSkeletonDataProviderProxy(SkinnedMesh);
}
```

Testing again for a few minutes, I was able to make it crash again in GetResourceSizeEx.

I commented out GetResourceSizeEx and struggled to make it crash (it took longer, and I had to open a test map with shinbi and go back and forth to the Skeletal Mesh Editor), but it did eventually crash on the LOD index.

By the way, starting tomorrow I'll be on vacation for two and a half weeks. I'll try my best to keep an eye on this but might not be able to provide feedback (one of my colleagues may take over during that period).

As always, thank you for the amazing support! :folded_hands:

Thanks for the input! I don't have access to Perforce, though (I asked a colleague and he wasn't able to find it). Could you tell me the Git commit so that I can look at it on GitHub? (Just to be sure this fix is present in 5.6.)

Thank you :folded_hands:

So I merged the lines regarding `ReleaseMeshDeformerInstances();` (but did not merge lines 1185-1191, `UMeshDeformerInstance::FEnqueueWorkDesc Desc;`, since they were already present in 5.5), and it did not fix it :cry:

Well, since it was working in 5.6 the last time I checked, I'm not sure it's worth investigating this issue too thoroughly.