VertexDeltaModel for meshes with multiple sections

I am building a custom ML Deformer model whose runtime part is identical to that of the VertexDeltaModel, so I will base my question on the source of that model. The function FVertexDeltaGraphDataProviderProxy::GatherDispatchData reads as follows:

    void FVertexDeltaGraphDataProviderProxy::GatherDispatchData(FDispatchData const& InDispatchData)
    {
        const FSkeletalMeshRenderData& SkeletalMeshRenderData = SkeletalMeshObject->GetSkeletalMeshRenderData();
        const FSkeletalMeshLODRenderData* LodRenderData = SkeletalMeshRenderData.GetPendingFirstLOD(0);
        const TStridedView<FVertexDeltaGraphDataInterfaceParameters> ParameterArray = MakeStridedParameterView<FVertexDeltaGraphDataInterfaceParameters>(InDispatchData);
        for (int32 InvocationIndex = 0; InvocationIndex < ParameterArray.Num(); ++InvocationIndex)
        {
            const FSkelMeshRenderSection& RenderSection = LodRenderData->RenderSections[InvocationIndex];
            FVertexDeltaGraphDataInterfaceParameters& Parameters = ParameterArray[InvocationIndex];
            Parameters.NumVertices = InDispatchData.bUnifiedDispatch ? LodRenderData->GetNumVertices() : RenderSection.GetNumVertices();
            Parameters.InputStreamStart = InDispatchData.bUnifiedDispatch ? 0 : RenderSection.BaseVertexIndex;
            Parameters.Weight = Weight;
            Parameters.PositionDeltaBuffer = BufferSRV;
            Parameters.VertexMapBuffer = VertexMapBufferSRV;
        }
    }

The problem is with the bUnifiedDispatch parameter. I have found that, in this case, this flag is always true. The reason is that this data interface has a secondary binding to the MLDeformer component in the deformer graph (because the skeletal mesh component is the primary binding). According to FComputeGraphTaskWorker::SubmitWork:

    // 1. Data interfaces sharing the same binding (primary) as the kernel should present its data in a way that
    // matches the kernel dispatch method, which can be either unified(full buffer) or non-unified (per invocation window into the full buffer)
    // 2. Data interfaces not sharing the same binding (secondary) should always provide a full view to its data (unified)
    // Note: In case of non-unified kernel, extra work maybe needed to read from secondary buffers.
    // When kernel is non-unified, index = 0...section.max for each invocation/section,
    // so user may want to consider using a dummy buffer that maps section index to the indices of secondary buffers
    // for example, given a non-unified kernel, primary and secondary components sharing the same vertex count, we might want to create a buffer
    // in the primary group that is simply [0,1,2...,NumVerts-1], which we can then index into to map section vert index to the global vert index

My understanding is that FVertexDeltaGraphDataProviderProxy::GatherDispatchData should then prepare a single unified invocation for the whole skeletal mesh, even if it has more than one section. The code of the function does that. However, it does not work correctly. When the skeletal mesh has more than one section:

  • InDispatchData.NumInvocations is not 1, but rather the number of sections in the skeletal mesh, as determined by UMLDeformerComponentSource::GetDefaultNumInvocations.
  • Each invocation will process a single section of the mesh, with its render vertex indexing starting at 0.

What this means is that FVertexDeltaGraphDataProviderProxy::GatherDispatchData will incorrectly set Parameters.InputStreamStart to 0 for all invocations, which produces incorrect results for all but the first section (see VertexDeltaModel.ush). My solution has been to define the following:

    const bool bIsUnified = InDispatchData.bUnifiedDispatch && InDispatchData.NumInvocations <= 1;

And use that instead of InDispatchData.bUnifiedDispatch in the loop, effectively treating the dispatch as non-unified for meshes with multiple sections. This seems to work.
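To make the effect of the flag concrete, here is a minimal standalone C++ sketch of the per-invocation parameter logic, with and without the workaround. It does not use engine types: FSectionSketch, FParamsSketch and GatherParamsSketch are hypothetical stand-ins that only carry the fields discussed above, and the section sizes in the usage note are made-up values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a render section: only the two fields used here.
struct FSectionSketch
{
    int32_t BaseVertexIndex;
    int32_t NumVertices;
};

// Hypothetical stand-in for the data interface parameters.
struct FParamsSketch
{
    int32_t NumVertices;
    int32_t InputStreamStart;
};

// Mirrors the per-invocation logic of GatherDispatchData shown earlier.
// With bApplyWorkaround set, a "unified" dispatch that has more than one
// invocation is treated as non-unified (assumption: one invocation per section).
std::vector<FParamsSketch> GatherParamsSketch(
    const std::vector<FSectionSketch>& Sections,
    bool bUnifiedDispatch,
    int32_t NumInvocations,
    bool bApplyWorkaround)
{
    assert(NumInvocations >= 0);
    assert(static_cast<std::size_t>(NumInvocations) <= Sections.size());

    // Total vertex count of the LOD, used for a genuinely unified dispatch.
    int32_t TotalVertices = 0;
    for (const FSectionSketch& Section : Sections)
    {
        TotalVertices += Section.NumVertices;
    }

    const bool bIsUnified = bApplyWorkaround
        ? (bUnifiedDispatch && NumInvocations <= 1)
        : bUnifiedDispatch;

    std::vector<FParamsSketch> Result;
    for (int32_t InvocationIndex = 0; InvocationIndex < NumInvocations; ++InvocationIndex)
    {
        const FSectionSketch& Section = Sections[InvocationIndex];
        Result.push_back({
            bIsUnified ? TotalVertices : Section.NumVertices,
            bIsUnified ? 0 : Section.BaseVertexIndex });
    }
    return Result;
}
```

For two sections of 100 and 50 vertices with bUnifiedDispatch set and two invocations, the unmodified logic gives both invocations an InputStreamStart of 0, while the workaround gives the second invocation an InputStreamStart of 100.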

Is this a bug or am I misunderstanding something? Is it possible to have unified dispatches with more than one invocation? Should the fix be different?

Steps to Reproduce

  1. Train a VertexDeltaModel ML Deformer on a skeletal mesh with multiple sections.
  2. The deformation calculated by the deformer graph is incorrect.

I have to add a deformer graph system expert on this, so adding [mention removed] to this thread.

Jack, do you know more about this? I just tested training an internal model with 9 sections (the Rampage model) using VDM but that seems to work fine somehow.

But perhaps Javier’s model has some differences.

Thank you John for your reply and for doing the test.

Sorry, I just realised I didn’t specify that in my case the mesh I am using has two sections, but each section uses a different material slot. I am not sure if it makes a difference (or if your test also uses multiple material slots).

Hi! Thanks for the detailed report! This particular issue has been fixed in 5.6 with CL40995710, but it is a pretty big refactor plus a version change, so it is not something we can backport to 5.5 easily. However, there are two potential workarounds.

The first one: could you make some changes to DG_VertexDeltaModel and see if the deformer would work for you? Basically we have to split the logic that was in the LBS_MLDelta kernel into two kernels: the first kernel only applies the delta, and the second kernel does the bone skinning. First remove the blend matrix pin from the LBS_MLDelta kernel and rename it to just MLDelta. Then change the kernel code to the following:

```
// Accumulation resource should be NumVertices * 8.
#define ACCUMULATION_BUFFER_NUM_INTS 8

if (Index >= ReadNumThreads().x) return;

float3 LocalPosition = ReadPosition(Index);
float3 PositionDelta = MLDeformer::ReadPositionDelta(Index);

// Jack's change: just apply the delta, skip any skinning work
WriteOutPosition(Index, LocalPosition + PositionDelta);

// Clear the accumulation buffer before use.
for (int AccumulationIndex = 0; AccumulationIndex < ACCUMULATION_BUFFER_NUM_INTS; ++AccumulationIndex)
{
    WriteOutAccumulation(Index * ACCUMULATION_BUFFER_NUM_INTS + AccumulationIndex, 0);
}
```

After that, add a LinearBlendSkin_PositionOnly deformer function between the MLDelta kernel and the ComputePerTriangleNormals kernel.
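For anyone who wants to sanity-check the modified kernel on the CPU, the following is a rough C++ sketch of what it computes per vertex: the delta is added to the local position and the eight accumulation integers for that vertex are zeroed. FVec3Sketch and ApplyMLDeltaSketch are hypothetical names for illustration, not engine API, and the sketch assumes the delta array already holds one final delta per render vertex.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal vector type, standing in for float3.
struct FVec3Sketch
{
    float X;
    float Y;
    float Z;
};

// Matches ACCUMULATION_BUFFER_NUM_INTS in the kernel above.
constexpr std::size_t AccumulationBufferNumInts = 8;

// CPU reference of the delta-only kernel: no skinning is performed here.
// Assumes Deltas has at least as many entries as Positions.
void ApplyMLDeltaSketch(
    const std::vector<FVec3Sketch>& Positions,
    const std::vector<FVec3Sketch>& Deltas,
    std::vector<FVec3Sketch>& OutPositions,
    std::vector<int32_t>& OutAccumulation)
{
    const std::size_t NumVertices = Positions.size();
    OutPositions.resize(NumVertices);
    OutAccumulation.resize(NumVertices * AccumulationBufferNumInts);

    for (std::size_t Index = 0; Index < NumVertices; ++Index)
    {
        // Apply the delta to the unskinned local position.
        OutPositions[Index] = {
            Positions[Index].X + Deltas[Index].X,
            Positions[Index].Y + Deltas[Index].Y,
            Positions[Index].Z + Deltas[Index].Z };

        // Clear this vertex's slice of the accumulation buffer before use.
        for (std::size_t i = 0; i < AccumulationBufferNumInts; ++i)
        {
            OutAccumulation[Index * AccumulationBufferNumInts + i] = 0;
        }
    }
}
```

Skinning would then run as a separate pass over OutPositions, which is the role of the LinearBlendSkin_PositionOnly function in the graph.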

The second option is that you can simply remove the following line from FComputeGraphTaskWorker::SubmitWork, which should allow the original DG_VertexDeltaModel deformer graph to run in the same way as when it was first created.

    DispatchData.bUnifiedDispatch = KernelInvocation.BoundProviderIsPrimary[MemberIndex]? DispatchData.bUnifiedDispatch : true;

The only downside of removing this line in UE 5.5 is that if the secondary binding to a kernel does not have the same number of invocations (sections) as the primary binding, it can crash. But I don't think there is a real use case where you will actually run into that if you are just using it for MLDeformer, as the MLD component and the skeletal mesh component always match.

Hope this helps!

Jack

Thank you Jack, modifying the deformer graph as you suggested worked.

I made a task for myself to revisit this again after we upgrade to 5.6 (probably will just copy the updated Vertex Delta Model code again).