Mutable runtime GT performance

Hello!

We are heavily utilizing Mutable for building characters at runtime and we are currently looking into ways of optimizing our setup.

One of the main issues that we are trying to tackle right now is game thread spikes when updating a customizable object instance.

I downloaded the Mutable Sample project to check whether this behavior was observable there, and it was. When running the infinite level benchmark in a packaged game (development configuration), Mutable_UCustomizableObjectSystem::TickInternal showed a lot of spikes of ~5ms on a Threadripper CPU:

[Image Removed]

With a lot of these spikes coming from ConvertResources:

[Image Removed]

I tried packaging a 5.7 version of the project, but had issues with that, so I couldn’t confirm if it performs better. However, I’ve seen a similar timing pattern in the editor as in 5.6.

From what I understand, all the non-passthrough resources requested by the current graph need to be converted to the Mutable representation for the CO to be built. I have tried using states to run ConstructMesh only for selected body parts, which (a) was very inconvenient and convoluted the setup and (b) would still leave the potential for timing spikes, e.g. a 1.2ms one here:

[Image Removed]

  1. What is the recommended way of tackling these issues? Is there a good way to split this work into multiple frames or ensure that it respects the timing budget?
    1. I have tried configuring the LOD streaming, but I haven’t had success with reducing the spikes from the generation. Is this the expected result, or is this something that should also help and my configuration was incorrect?
  2. We are usually generating a set of character variations that get reused during the game. Does it make sense to have an off-screen character “source” that simply gets duplicated whenever we need a character on-screen and then establish the budgeting around those source actors?

Thanks!

[Attachment Removed]

Hey there,

To double-check, have you turned on Mesh Streaming in your customizable object graph?

[Image Removed]

This was added in 5.6, should help with these scenarios, and is different from the LOD streaming settings.

What is the recommended way of tackling these issues? Is there a good way to split this work into multiple frames or ensure that it respects the timing budget?

Unfortunately, at this point, we don’t have any tools besides Mesh Streaming to split work across multiple frames. That is the method for making a character asynchronously.
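Since nothing beyond Mesh Streaming is available inside Mutable for this, one thing a project can still control is how many instance updates it asks for in a single frame. Below is a minimal, language-agnostic sketch of that idea in Python. `UpdateThrottle`, `request`, and `tick` are hypothetical names, not Mutable or Unreal API, and this does not split one update across frames; it only spreads separate update requests out, so a single expensive update can still spike.

```python
from collections import deque


class UpdateThrottle:
    """Game-side queue that releases at most `max_per_frame` update requests per tick.

    Hypothetical helper for illustration; not part of Mutable or Unreal.
    """

    def __init__(self, max_per_frame):
        self.max_per_frame = max_per_frame
        self._pending = deque()   # ids waiting for their update to start
        self._queued = set()      # same ids, for cheap duplicate checks

    def request(self, instance_id):
        # Ignore duplicates so one character cannot flood the queue.
        if instance_id not in self._queued:
            self._queued.add(instance_id)
            self._pending.append(instance_id)

    def tick(self):
        # Called once per frame; returns the ids whose update should start now.
        released = []
        while self._pending and len(released) < self.max_per_frame:
            instance_id = self._pending.popleft()
            self._queued.discard(instance_id)
            released.append(instance_id)
        return released
```

The caller would start the actual instance update for each id returned by `tick()`; how that update is triggered is left out on purpose.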

We are usually generating a set of character variations that get reused during the game. Does it make sense to have an off-screen character “source” that simply gets duplicated whenever we need a character on-screen and then establish the budgeting around those source actors?

That is not something that we’ve used before, but it could be a good strategy to handle specific scenarios. Also, if you do this often based on a limited set of data, you might benefit from baking out a few characters to disk and loading in your baked characters. Folks sometimes forget that you can do both runtime generation and baking from the same asset.
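To make the "generate once, reuse many times" idea concrete, here is a small sketch of a cache keyed by the parameter set of a variation. Everything in it is an assumption for illustration: `VariationCache` and `build_fn` are hypothetical names, and `build_fn` stands in for whatever actually produces the character (runtime generation or loading a baked one).

```python
class VariationCache:
    """Builds each distinct character variation once and reuses the result.

    Hypothetical helper for illustration; not part of Mutable or Unreal.
    """

    def __init__(self, build_fn):
        self._build = build_fn    # expensive step, e.g. generating the character
        self._built = {}          # parameter key -> built result
        self.build_count = 0      # how many times the expensive step actually ran

    def get(self, params):
        # Sort the items so the same parameters in a different order share a key.
        key = tuple(sorted(params.items()))
        if key not in self._built:
            self._built[key] = self._build(params)
            self.build_count += 1
        return self._built[key]
```

With this shape, any budgeting only has to care about cache misses, since on-screen characters that hit the cache skip the expensive step entirely.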

Dustin

[Attachment Removed]

Hey there,

This might be a further configuration problem, caused by how LOD mesh streaming is hooked up to Unreal’s mesh streaming system. All of the following need to be true:

  1. Mesh streaming must be enabled in the engine (done in your project settings).
  2. If you have a state, it must allow mesh streaming (DisableMeshStreaming = false).
  3. It has to be enabled in the CO (EnableMeshStreaming=true).
  4. It has to be enabled in Mutable (StreamMeshLODsEnabled=true).

It’s also a bit tough to test: if you spawn the character up close, it will load the lowest resident LOD and then everything will be the same. That step you are seeing is the last step in the process of transferring from the Mutable buffers into the skeletal mesh buffers, and we don’t do it over multiple frames.
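Because all four conditions have to hold at once, it can help to think of them as a single checklist. The sketch below only models that logic; the field names are illustrative stand-ins for the four settings in the list, not real Mutable or Unreal API, and the values would have to be read from the project by hand.

```python
from dataclasses import dataclass


@dataclass
class MeshStreamingSettings:
    # Illustrative stand-ins for the four settings in the list above.
    engine_mesh_streaming_enabled: bool   # 1. project settings
    state_disable_mesh_streaming: bool    # 2. DisableMeshStreaming on the active state
    co_enable_mesh_streaming: bool        # 3. EnableMeshStreaming on the CO
    stream_mesh_lods_enabled: bool        # 4. StreamMeshLODsEnabled in Mutable


def blocking_settings(s):
    """Return a description of every setting that currently prevents mesh streaming."""
    blockers = []
    if not s.engine_mesh_streaming_enabled:
        blockers.append("engine mesh streaming (project settings)")
    if s.state_disable_mesh_streaming:
        blockers.append("state: DisableMeshStreaming")
    if not s.co_enable_mesh_streaming:
        blockers.append("CO: EnableMeshStreaming")
    if not s.stream_mesh_lods_enabled:
        blockers.append("Mutable: StreamMeshLODsEnabled")
    return blockers
```

An empty list means nothing in the checklist is blocking streaming; a single missing item is enough to make the streamed and non-streamed runs look the same.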

But is it possible to combine some prebaked parts with runtime generation for the rest of the CO?

Yes, though it’s the same as the current setup in that as soon as they become their own skeletal meshes, they are run through the system like normal.

Dustin

[Attachment Removed]

Thanks for the call stack. Unfortunately, it’s not very revealing about what the issue would be. Would you be able to repro with -onethread and -rdgimmediate enabled in the build, and send back the call stack and the log?

[Attachment Removed]

Also, can you verify that you’re on 5.6.2?

[Attachment Removed]

We were wondering: what are the specs of the PC that is running into the crash?

[Attachment Removed]

Thanks for the info.

The team has been trying to repro and hasn’t hit the crash. Can you share a log from your crash? Most likely this is mesh related, but it could also be related to the grooms. What we’re looking for in the log is a line like “Start %u, Count %u, Type %u, Buffer Size %u, Buffer stride %u”, so that we can see which asset is causing the crash.

Dustin

[Attachment Removed]

Thank you! Just a heads up, much of Epic is going on holiday for 2 weeks for the end of the year and will be back on January 5th. I wouldn’t expect an answer back until after then.

[Attachment Removed]

Hey there,

I just reached out to the team yesterday and they are still digging into it. So unfortunately no updates at the moment.

Dustin

[Attachment Removed]

Hey, just wanted to let you know that the dev team is still working on this; they’ve had to drop it and pick it back up a few times.

[Attachment Removed]

Hi Fedor Matveev,

You are right. Enabling both Mesh and Texture Streaming is the best approach. Although Mesh Streaming is implemented in 5.6, it is not stable enough there to be production ready. If you need it, we strongly recommend using 5.7. Texture streaming works in 5.6 without issues. Also, in 5.8 we plan to move the remaining GT copies to other threads and to increase update throughput.

In any case, from the trace, it looks like you are generating uncompressed Textures, hence the Mutable_UpdateResource scope. You can easily see which Textures are not compressed using the Texture Analyzer.

[Attachment Removed]

Hello!

Thank you for the answer :blush:

I noticed that in our project the mesh streaming was disabled for subobject COs and decided to re-verify in the MutableSample project for a clean test. I ran the packaged game in 2 configurations: (A) was using Mutable.StreamMeshLODsEnabled 1 and Mutable.ForceStreamMeshLODs 1, and (B) had mesh streaming disabled through Mutable.StreamMeshLODsEnabled 0. I’m still seeing very comparable performance levels and similar spikes in ConstructMesh().

(A, forced mesh streaming):

[Image Removed]

(B, disabled mesh streaming):

[Image Removed]

Am I missing something else that should be done? I am expecting different LODs to consistently end up in different frames; is that the correct result to look for here?

Regarding baking the characters: as we are doing a lot of variations and often adjusting meshes for clipping, we would probably rather save on disk space and not bake down the variations. But is it possible to combine some prebaked parts with runtime generation for the rest of the CO?

[Attachment Removed]

Hey Dustin,

Thank you for the answer and sorry for the slow response!

It was the project setting that was missing, I enabled it and finally started seeing the improvements :grinning_face:

The only thing is that now I seem to be getting this crash both in our project and in the Mutable sample project on the infinite level, using 5.6 with Mutable.ForceStreamMeshLODs 1.

Is this a known crash? Could this have been fixed in 5.7 already?

If not, could this be tied to using the experimental ForceStreamMeshLODs cvar?

Assertion failed: (StartIndex + IndexCount) * IndexBuffer->GetStride() <= IndexBuffer->GetSize() [File:D:\build\++UE5\Sync\Engine\Source\Runtime\D3D12RHI\Private\D3D12Commands.cpp] [Line: 1862]
Start 0, Count 23532, Type 0, Buffer Size 80064, Buffer stride 4

0x00007ff607aa9088 MutableSample.exe!FDebug::CheckVerifyFailedImpl2() []
0x00007ff60c354d6a MutableSample.exe!FD3D12CommandContext::RHIDrawIndexedPrimitive() []
0x00007ff60c65c501 MutableSample.exe!FRHICommandDrawIndexedPrimitive::Execute() []
0x00007ff60ae60ce1 MutableSample.exe!FRHICommand<FRHICommandDrawIndexedPrimitive,FRHICommandDrawIndexedPrimitiveString1840>::ExecuteAndDestruct() []
0x00007ff60c65cd81 MutableSample.exe!FRHICommandListBase::Execute() []
0x00007ff60c68a39e MutableSample.exe!FRHICommandListExecutor::FTranslateState::Translate() []
0x00007ff60c646147 MutableSample.exe!`TSharedPipelineStateCache<FWorkGraphPipelineStateInitializer,FWorkGraphPipelineState * __ptr64>::DiscardAndSwap'::`5'::<lambda_1>::operator()() []
0x00007ff60c65e741 MutableSample.exe!FRHICommandListExecutor::FTaskPipe::Execute() []
0x00007ff607dd77a6 MutableSample.exe!TGraphTask<TFunctionGraphTaskImpl<void __cdecl(enum ENamedThreads::Type,TRefCountPtr<FBaseGraphTask> const & __ptr64),0> >::ExecuteTask() []
0x00007ff6077aefe3 MutableSample.exe!UE::Tasks::Private::FTaskBase::TryExecuteTask() []
0x00007ff60777f04f MutableSample.exe!LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __ptr64 __cdecl(bool),48>::TTaskDelegateImpl<`LowLevelTasks::FTask::Init<`UE::Tasks::Private::FTaskBase::Init'::`2'::<lambda_1> >'::`13'::<lambda_1>,0>::CallAndMove() []
0x00007ff60778d1de MutableSample.exe!LowLevelTasks::FTask::ExecuteTask() []
0x00007ff60778cfc5 MutableSample.exe!LowLevelTasks::FScheduler::ExecuteTask() []
0x00007ff6077b2ad1 MutableSample.exe!LowLevelTasks::FScheduler::WorkerLoop() []
0x00007ff607776a4a MutableSample.exe!operator<<() []
0x00007ff607984b93 MutableSample.exe!FThreadImpl::Run() []
0x00007ff607de3448 MutableSample.exe!FRunnableThreadWin::Run() []
0x00007ff607ddc0df MutableSample.exe!FRunnableThreadWin::GuardedRun() []
0x00007ffd6bef259d KERNEL32.DLL!UnknownFunction []

Crash in runnable thread Background Worker #9
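For reference, the numbers in the assertion message can be checked by hand. The sketch below just re-evaluates the asserted condition with the values from the "Start 0, Count 23532 ..." line; the function names are my own and the code only mirrors the arithmetic in the message, not the engine implementation.

```python
def index_range_fits(start_index, index_count, stride, buffer_size):
    """Same inequality as in the assertion text: the drawn range must fit in the buffer."""
    return (start_index + index_count) * stride <= buffer_size


def overrun_bytes(start_index, index_count, stride, buffer_size):
    """How many bytes past the end of the index buffer the draw would read (0 if it fits)."""
    return max(0, (start_index + index_count) * stride - buffer_size)


# Values from the crash message: Start 0, Count 23532, Buffer Size 80064, Buffer stride 4
print(index_range_fits(0, 23532, 4, 80064))   # False
print(overrun_bytes(0, 23532, 4, 80064))      # 14064
```

So the draw asks for 23532 indices (94128 bytes), while the bound buffer only holds 80064 / 4 = 20016 of them.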

[Attachment Removed]

Answering both of your questions:

  1. I am on 5.6.1 (the latest launcher version), I don’t think that 5.6.2 exists?
  2. I was able to repro the crash with -onethread and -rdgimmediate! It took a longer time for it to happen while running the benchmark in the infinite level, here’s the callstack:
 	KERNELBASE.dll!00007ffd6b31055c()	Unknown
 	MutableSample.exe!FWindowsErrorOutputDevice::Serialize(wchar_t const *,enum ELogVerbosity::Type,class FName const &)	C++
 	MutableSample.exe!FOutputDevice::LogfImpl(wchar_t const *,...)	C++
 	MutableSample.exe!FDebug::AssertFailed(char const *,char const *,int,wchar_t const *,...)	C++
 	MutableSample.exe!FDebug::CheckVerifyFailedImpl2V(char const *,char const *,int,wchar_t const *,char *)	C++
 	MutableSample.exe!FDebug::CheckVerifyFailedImpl2(char const *,char const *,int,wchar_t const *,...)	C++
>	MutableSample.exe!FD3D12CommandContext::RHIDrawIndexedPrimitive(class FRHIBuffer *,int,unsigned int,unsigned int,unsigned int,unsigned int,unsigned int)	C++
 	MutableSample.exe!FMeshDrawCommand::SubmitDrawEnd(class FMeshDrawCommand const &,struct FMeshDrawCommandSceneArgs const &,unsigned int,class FRHICommandList &)	C++
 	MutableSample.exe!FInstanceCullingContext::SubmitDrawCommands(class TArray<class FVisibleMeshDrawCommand,class TLinearArrayAllocatorBase<struct FSceneRenderingBlockAllocationTag,0> > const &,class Experimental::TRobinHoodHashSet<class FGraphicsMinimalPipelineStateInitializer,struct DefaultKeyFuncs<class FGraphicsMinimalPipelineStateInitializer,0>,class TSizedDefaultAllocator<32> > const &,struct FMeshDrawCommandOverrideArgs const &,int,int,unsigned int,class FRHICommandList &)	C++
 	MutableSample.exe!FParallelMeshDrawCommandPass::Draw(class FRHICommandList &,class FInstanceCullingDrawParams const *)	C++
 	MutableSample.exe!TRDGLambdaPass<class FVirtualSmBuildHZBPerPageCS::FParameters,class `FComputeShaderUtils::AddPass<class FVirtualSmBuildHZBPerPageCS>(class FRDGBuilder &,class FRDGEventName &&,enum ERDGPassFlags,class TShaderRefBase<class FVirtualSmBuildHZBPerPageCS,class FShaderMapPointerTable> const &,class FVirtualSmBuildHZBPerPageCS::FParameters *,class FRDGBuffer *,unsigned int,class TFunction<void (void)> &&)'::`2'::<lambda_1> >::Execute(class FRHIComputeCommandList &)	C++
 	MutableSample.exe!FRDGBuilder::ExecutePass(class FRHIComputeCommandList &,class FRDGPass *)	C++
 	MutableSample.exe!FRDGBuilder::ExecuteSerialPass(class FRHIComputeCommandList &,class FRDGPass *)	C++
 	MutableSample.exe!FRDGBuilder::SetupAuxiliaryPasses(class FRDGPass *)	C++
 	MutableSample.exe!FRDGBuilder::SetupParameterPass(class FRDGPass *)	C++
 	MutableSample.exe!FSceneRenderer::RenderVelocities(class FRDGBuilder &,class TArrayView<class FViewInfo,int>,struct FSceneTextures const &,enum EVelocityPass,bool,bool)	C++
 	MutableSample.exe!`FRHIBreadcrumbAllocator::AllocBreadcrumb<class UE::RHI::Breadcrumbs::Private::TRHIBreadcrumbDesc<0> >(class std::tuple<class UE::RHI::Breadcrumbs::Private::TRHIBreadcrumbDesc<0> const *,class std::tuple<> > const &)'::`2'::<lambda_1>::operator()<>(void)	C++
 	MutableSample.exe!FDeferredShadingSceneRenderer::Render(class FRDGBuilder &,struct FSceneRenderUpdateInputs const *)	C++
 	MutableSample.exe!FRendererModule::RenderPostResolvedSceneColorExtension(class FRDGBuilder &,struct FSceneTextures const &)	C++
 	MutableSample.exe!TGlobalResource<class FOcclusionQueryIndexBuffer,1>::`vector deleting destructor'(unsigned int)	C++
 	MutableSample.exe!`FComputeShaderUtils::AddPass<class FSceneCullingDebugRender_CS>(class FRDGBuilder &,class FRDGEventName &&,enum ERDGPassFlags,class TShaderRefBase<class FSceneCullingDebugRender_CS,class FShaderMapPointerTable> const &,class FShaderParametersMetadata const *,class FSceneCullingDebugRender_CS::FParameters *,struct UE::Math::TIntVector3<int>)'::`2'::<lambda_1>::operator()(struct FRDGAsyncTask,class FRHIComputeCommandList &)	C++
 	MutableSample.exe!FSceneRenderProcessor::Execute(void)	C++
 	MutableSample.exe!FRendererModule::BeginRenderingViewFamilies(class FCanvas *,class TArrayView<class FSceneViewFamily * const,int>)	C++
 	MutableSample.exe!FRendererModule::BeginRenderingViewFamily(class FCanvas *,class FSceneViewFamily *)	C++
 	MutableSample.exe!UGameViewportClient::Draw(class FViewport *,class FCanvas *)	C++
 	MutableSample.exe!FViewport::Draw(bool)	C++
 	MutableSample.exe!UGameEngine::RedrawViewports(bool)	C++
 	MutableSample.exe!UGameEngine::Tick(float,bool)	C++
 	MutableSample.exe!FEngineLoop::Tick(void)	C++
 	MutableSample.exe!GuardedMain(wchar_t const *)	C++
 	MutableSample.exe!GuardedMainWrapper(wchar_t const *)	C++
 	MutableSample.exe!LaunchWindowsStartup(struct HINSTANCE__ *,struct HINSTANCE__ *,char *,int,wchar_t const *)	C++
 	MutableSample.exe!WinMain()	C++
 	[Inline Frame] MutableSample.exe!invoke_main() Line 102	C++
 	MutableSample.exe!__scrt_common_main_seh() Line 288	C++
 	kernel32.dll!00007ffd6bef259d()	Unknown
 	ntdll.dll!00007ffd6db6af78()	Unknown

I can also provide the dump if that’ll be of any help.

[Attachment Removed]

I am running a Threadripper PRO 7945WX, 64 GB of RAM, and an RTX 4080.

[Attachment Removed]

Of course, here you go!

And I can see the log saying exactly what you were looking for at the very end:

[2025.12.03-09.33.44:741][462]LogWindows: Error: appError called: Assertion failed: (StartIndex + IndexCount) * IndexBuffer->GetStride() <= IndexBuffer->GetSize() [File:D:\build\++UE5\Sync\Engine\Source\Runtime\D3D12RHI\Private\D3D12Commands.cpp] [Line: 1862] 
Start 23532, Count 13614, Type 0, Buffer Size 103032, Buffer stride 4
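In case it helps when going through longer logs, here is a small sketch that scans log text for lines in the "Start %u, Count %u, Type %u, Buffer Size %u, Buffer stride %u" format and reports how far each draw runs past its index buffer. The helper name `find_bad_draws` and the result keys are my own; only the line format comes from this thread.

```python
import re

# Matches the draw-argument line that follows the assertion in the log.
DRAW_ARGS = re.compile(
    r"Start (\d+), Count (\d+), Type (\d+), Buffer Size (\d+), Buffer stride (\d+)"
)


def find_bad_draws(log_text):
    """Return one dict per matching log line, with the byte range it needs and the overrun."""
    results = []
    for match in DRAW_ARGS.finditer(log_text):
        start, count, _prim_type, size, stride = map(int, match.groups())
        needed = (start + count) * stride
        results.append({
            "start": start,
            "count": count,
            "buffer_size": size,
            "stride": stride,
            "bytes_needed": needed,
            "overrun_bytes": needed - size,   # positive means the draw reads past the buffer
        })
    return results


line = "Start 23532, Count 13614, Type 0, Buffer Size 103032, Buffer stride 4"
print(find_bad_draws(line)[0]["overrun_bytes"])   # 45552
```

For the line above, the draw needs (23532 + 13614) * 4 = 148584 bytes against a 103032-byte buffer.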

[Attachment Removed]

Ah, for some reason the log got removed, reattaching it.

[Attachment Removed]

Hello there, and Happy New Year :grinning_face_with_smiling_eyes:

Do you have any updates on this from your end by any chance?

[Attachment Removed]

Got it, thank you! Let me know if you need more information from me :slightly_smiling_face:

[Attachment Removed]

Thank you, appreciate the update!

[Attachment Removed]