Rare crash accessing null DynamicData in GetBaseSkinVertexFactory on RenderThread

Our team is seeing a random crash on the RenderThread. The crash dump suggests a variable synchronization issue. In the screenshot, the crashing instruction tries to access the r8 register (the dereferenced DynamicData), which is null, yet the memory dump shows that DynamicData has a valid value.

This is a typical crash dump pattern for a variable synchronization issue, so I assume that is what we are hitting.
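The pattern can be illustrated with a minimal, single-threaded sketch that replays the suspected interleaving in a fixed order. Everything here is an assumption for illustration: `FrameData`, `SharedObject`, `Observation`, and `ReplayInterleaving` are hypothetical names and are not engine code, and the actual ordering in the crash is unknown.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the data the reader dereferences.
struct FrameData { int LODIndex = 0; };

// Hypothetical stand-in for the owning object; the pointer is shared between threads.
struct SharedObject {
    std::atomic<FrameData*> DynamicData{nullptr};
};

// What the faulting thread saw versus what a later memory dump shows.
struct Observation {
    const FrameData* SeenByReader; // copy taken before the fault (plays the role of r8)
    const FrameData* SeenInDump;   // value in memory when the dump is captured
};

// Replays the suspected interleaving step by step on one thread.
Observation ReplayInterleaving(SharedObject& Object, FrameData& NewData) {
    assert(Object.DynamicData.load() == nullptr && "model starts with nothing published");
    // 1. The reader loads the pointer into a local copy.
    const FrameData* Snapshot = Object.DynamicData.load();
    // 2. The writer publishes new data after the reader's load.
    Object.DynamicData.store(&NewData);
    // 3. The dump reflects memory as it is now, not as the reader saw it.
    return Observation{Snapshot, Object.DynamicData.load()};
}
```

With this ordering the reader's copy is null while memory already holds a valid pointer, which matches what the register and the memory dump appear to show.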

Unfortunately, I can’t provide a repro; the crash comes from random PCs.

We are wondering whether this is a known issue and whether it is tracked and/or fixed. We are also looking for guidance on how to avoid it.

[Image Removed]

Steps to Reproduce
Our team is having a random crash on the RenderThread. We can’t reproduce it locally; the crash dumps come from random PCs during playtests.

Hi there,

As you noted, this crash seems quite difficult to reproduce and is very rare. It does look like a race condition to me, but I’m unsure as to where it might be coming from. Checking the code, it looks like parallel updates / reads to DynamicData should be protected by mutually exclusive scope guards. See my detailed breakdown below for my reasoning here, and possible additional debugging steps.

It does look like this code has been majorly refactored from UE 5.5 onwards, so this should no longer be an issue in 5.5+. The following commit should fix the issue, as it removes all dependencies on DynamicData from the GetBaseSkinVertexFactory method:

https://github.com/epicgames/UnrealEngine/commit/a9ca63cf10614cc446572c880459c58e4156dd76

However, it’s also quite a large refactor, so it might be difficult to backport if you need to do that.

Detailed breakdown:

This crash is coming from a deformer graph containing a cloth node, which triggers a call to FSkeletalMeshObjectGPUSkin::GetBaseSkinVertexFactory, where the race condition and crash occur. The DynamicData variable it reads is updated in FSkeletalMeshObjectGPUSkin::UpdateDynamicData_RenderThread, which FSkeletalMeshObjectGPUSkin::Update enqueues to the UE::RenderCommandPipe::SkeletalMesh command pipe. This command pipe DOES run on its own thread (NOT the render thread); however, it should not be possible for this pipe to be running when FSkeletalMeshObjectGPUSkin::GetBaseSkinVertexFactory is called. In your call stack there is a call to ComputeFramework::FlushWork, which contains the line:

UE::RenderCommandPipe::FSyncScope SyncScope({ &UE::RenderCommandPipe::SkeletalMesh });

This should prevent the pipe from running while this render thread task is executing, but perhaps something isn’t working as expected here. You could try adding `check(!UE::RenderCommandPipe::SkeletalMesh.IsReplaying())` to the top of the FSkeletalMeshObjectGPUSkin::GetBaseSkinVertexFactory function to ensure that this condition is being met.
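As a rough illustration of that guard, here is a standalone model of the idea rather than engine code. `PipeModel`, `GuardedData`, `GuardedObject`, and their methods are hypothetical names invented for this sketch, and standard `assert` stands in for the engine's `check`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a command pipe; not the engine's class.
class PipeModel {
public:
    void BeginReplay() { bReplaying.store(true); }
    void EndReplay() { bReplaying.store(false); }
    bool IsReplaying() const { return bReplaying.load(); }
private:
    std::atomic<bool> bReplaying{false};
};

// Hypothetical stand-in for the data updated through the pipe.
struct GuardedData { int LODIndex = 0; };

class GuardedObject {
public:
    explicit GuardedObject(PipeModel& InPipe) : Pipe(InPipe) {}

    // Writer side: in this model it is only called while the pipe is replaying.
    void UpdateDynamicData(GuardedData* NewData) { DynamicData = NewData; }

    // Reader side: the guard runs first, so a violated assumption fails here
    // with a clear message instead of later at a null dereference.
    int GetLODIndexChecked() const {
        assert(!Pipe.IsReplaying() && "pipe must be idle while the reader runs");
        return DynamicData ? DynamicData->LODIndex : -1;
    }
private:
    PipeModel& Pipe;
    GuardedData* DynamicData = nullptr;
};
```

If the guard fires, the call stack at that point shows which reader ran while the pipe was still active, which is usually easier to interpret than the dump from the later crash.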

It might also be interesting to see the state of the Parallel Stacks window in Visual Studio when you get the crash to occur again. This might tell you where the modification to DynamicData is coming from, assuming the thread modifying it hasn’t moved on very far since the crash occurred.

Hopefully this can help you debug and solve your issue,

Regards,

Lance Chaney

In the crash dump, whichever thread it was had long since moved on. I can add the check and see if it gets hit. Otherwise, it sounds like we should move to 5.5 or later before we put more time into it.

Thank you for the advice, and have a nice day!