Renderthread always starts with a wait

Whenever I profile, I find my render thread time is inflated by a wait of a couple of milliseconds at the very beginning. If I look at parallel tasks in Insights, the most suspicious one is DeleteSceneRenderer. The wait seems to end when this task completes.

What am I specifically waiting on here? Is there something that can be done to reduce this?

[Image Removed]

Hi there,

If you enable Task tracing for Insights, you should be able to better see how events are linked / scheduled in the task graph, to debug what is causing stalls.

You can enable Task tracing from the UI in the editor as shown below. If you are running from a build, you can set the trace channels like this: `Trace.Send <insight_host_ip> -trace=default,task` or `Trace.File <Path> -trace=default,task` to enable the Default + Task trace channels.

[Image Removed]

In the below example, you can see that (at least in editor) BeginFrame is launched by the Render_BeginFrame event.

[Image Removed]

You can also right-click on timer events in the Insights Timers panel to open the source code locations of most events. This can be helpful when trying to figure out why an event is stalling on something outside of the task graph.

Doing this, I was able to debug the event before Render_BeginFrame, and found out that, only in editor, there is frame rate smoothing applied that causes a consistent wait.

[Image Removed]

Doing the above should give you a lot more information on what is actually starting / blocking your BeginFrame.

My guess is that either the engine is trying to do framerate smoothing (maybe you have this enabled? or are in editor?), or this is a result of your frame syncing setup (see the documentation here on setting up various frame syncing strategies to manage CPU->GPU latency).

Regards,

Lance Chaney

Taking another look at the 5.4 code, I see what you mean now. I think the first thing to try here is to profile this deletion code in a bit more detail to see if we can narrow down which parts are taking so long to complete. I would recommend using Visual Studio's built-in sampling profiler for this. You can access this from the Debug menu:

[Image Removed]

Make sure your content is cooked, set your solution configuration to whatever you last cooked (Development, Test, or Shipping), enable the CPU usage profiling and click start.

[Image Removed]

Since this profiler uses sampling to determine how long is spent in each section of the codebase, running it for longer will produce more accurate results. You can also increase the sample rate in the settings (cog wheel in the above screenshot). Once you’ve finished the profile and Visual Studio has generated the report, click on the `Open Details` link.

[Image Removed]

Select Functions from the current view dropdown.

[Image Removed]

Type DeleteSceneRenderers into the search box in the top right corner and hit enter.

[Image Removed]

Once you’ve found the function in the list, right-click on it and select: View in Caller/Callee.

[Image Removed]

This should give you a view like the following that you can then use to drill down into which parts of the deletion code are taking the longest. Here I see that in my blank project example the actual FDeferredShadingSceneRenderer destructor is taking the longest (I only have 24 samples in this profile, so this might not be super accurate).

[Image Removed]

I can then click on this function to drill down further. Drilling down the hot path far enough, I can see that, for me, the cleanup of RayTracingMeshBatchTaskPages is taking the longest. This is just a blank project though, and my DeleteSceneRenderer time measured in Insights is only 85 µs, so I would expect your results to be quite different.

[Image Removed]
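As a complement to the sampling profiler, the individual phases of the deletion code can also be timed separately. In engine code this would most likely mean adding further `SCOPED_NAMED_EVENT_TEXT` scopes like the one already at the top of DeleteSceneRenderers so that each phase shows up in Insights. The sketch below only illustrates the idea in standalone C++: `PhaseTimer`, `PhaseTotals`, `RunTimedPhases` and the phase names are hypothetical, not engine APIs, and the three callables stand in for the shadow pass waits, the view waits, and the deletes.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical: phase name -> accumulated wall-clock time in microseconds.
using PhaseTotals = std::map<std::string, double>;

// Hypothetical RAII timer: adds the lifetime of the object to its phase total.
class PhaseTimer {
public:
    PhaseTimer(PhaseTotals& totals, std::string name)
        : Totals(totals), Name(std::move(name)),
          Start(std::chrono::steady_clock::now()) {}

    ~PhaseTimer() {
        const std::chrono::duration<double, std::micro> elapsed =
            std::chrono::steady_clock::now() - Start;
        Totals[Name] += elapsed.count();
    }

    PhaseTimer(const PhaseTimer&) = delete;
    PhaseTimer& operator=(const PhaseTimer&) = delete;

private:
    PhaseTotals& Totals;
    std::string Name;
    std::chrono::steady_clock::time_point Start;
};

// Runs three placeholder phases in order, each inside its own timed scope.
template <typename F1, typename F2, typename F3>
void RunTimedPhases(PhaseTotals& totals, F1 waitShadowPasses, F2 waitViews,
                    F3 deleteRenderers) {
    { PhaseTimer t(totals, "WaitShadowPasses"); waitShadowPasses(); }
    { PhaseTimer t(totals, "WaitViews");        waitViews(); }
    { PhaseTimer t(totals, "DeleteRenderers");  deleteRenderers(); }
}
```

Comparing the three totals over many frames should show whether the time goes into waiting on outstanding tasks or into the destructors themselves, which in turn tells you whether moving the delete off the critical path would help.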

Hopefully this gives you a path forward to try to debug this issue.

Regards,

Lance Chaney

In UE 5.6 this deletion code has been refactored to make the FSceneRenderer destructor part run asynchronously so this doesn’t block (see: ENQUEUE_RENDER_COMMAND(SceneRenderBuilder_End) in SceneRenderBuilder.cpp in UE 5.6). You could try implementing similar logic to this in 5.4. I don’t see any reason why you would really need to wait for the old FSceneRenderer to be destroyed before creating / starting to use a new one.

Regards,

Lance Chaney

Hey Lance,

Frame smoothing is disabled on our project. I’m not sure where you got Render_BeginFrame from, because for me the issue is specifically with DeleteSceneRenderer taking quite a bit of time and forcing the next render thread frame to wait before it can start.

```cpp
static void DeleteSceneRenderers(const TArray<FSceneRenderer*>& SceneRenderers)
{
	SCOPED_NAMED_EVENT_TEXT("DeleteSceneRenderer", FColor::Red);

	for (FSceneRenderer* SceneRenderer : SceneRenderers)
	{
		// Wait for all dispatched shadow mesh draw tasks.
		for (int32 PassIndex = 0; PassIndex < SceneRenderer->DispatchedShadowDepthPasses.Num(); ++PassIndex)
		{
			SceneRenderer->DispatchedShadowDepthPasses[PassIndex]->WaitForTasksAndEmpty();
		}

		for (FViewInfo* View : SceneRenderer->AllViews)
		{
			View->WaitForTasks();
		}
	}

	FViewInfo::DestroyAllSnapshots();

	for (FSceneRenderer* SceneRenderer : SceneRenderers)
	{
		delete SceneRenderer;
	}
}
```

I can see it’s called from a couple of places: FSceneRenderer::RenderThreadEnd and WaitForTasksAndDeleteSceneRenderers. So it seems to mostly be the cleanup process associated with the previous frame finishing. It makes sense that it wouldn’t be able to start a new frame until that process ends. My question is: what can I do to minimize this? It’s spending a lot of time doing it.

Hi Lance,

From what I can tell it appears to be related to foliage, dynamic mesh elements (also foliage) and shadowmeshcollector (probably also foliage).

We have a high HISM count in the scene and I guess these allocators are created and destroyed every render thread frame?

[Image Removed]

I’m sure there’s a good reason for this, and I know Epic has largely moved on from HISM in favour of ISM with Nanite, but I find it surprising that HISMs are so expensive on the render thread. Everything is dynamic and nothing is cached. Also, even leaving Nanite aside, all of the engine’s foliage (including grass) uses HISM.

Hi Lance, do you have a CL I could cherry pick for this?

These changes are unfortunately part of a large refactor that you probably wouldn’t want to try to backport to 5.4. The main gist is that the DeleteSceneRenderers logic has been split into two separate async tasks. One for the cleanup that needs to happen before `delete SceneRenderer` is called, and one for the actual delete. All the waits then only wait on the cleanup task, not the delete task.
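To make the shape of that split concrete, here is a minimal standalone sketch of the pattern. It is not the 5.6 engine code: `Renderer`, `RendererList`, `PendingDeletion` and `DeleteRenderersAsync` are hypothetical names, and `std::async` stands in for the engine's task system. The point is only that the delete task is chained after the cleanup task, while the caller keeps a handle to each and waits only on the cleanup one per frame.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for FSceneRenderer; not engine code.
struct Renderer {
    explicit Renderer(std::atomic<int>& destroyedCount) : Destroyed(destroyedCount) {}
    ~Renderer() {
        assert(bTasksFinished && "cleanup must run before delete");
        ++Destroyed;
    }
    // Stands in for the WaitForTasksAndEmpty() / WaitForTasks() calls.
    void WaitForTasks() { bTasksFinished = true; }

    bool bTasksFinished = false;
    std::atomic<int>& Destroyed;
};

using RendererList = std::vector<std::unique_ptr<Renderer>>;

struct PendingDeletion {
    std::shared_future<void> CleanupDone; // the next frame waits on this only
    std::future<void> DeleteDone;         // waited on at shutdown, not per frame
};

PendingDeletion DeleteRenderersAsync(RendererList Renderers) {
    // Both tasks share ownership of the list until the delete task empties it.
    auto Shared = std::make_shared<RendererList>(std::move(Renderers));

    // Task 1: the waits that must finish before anything is destroyed.
    std::shared_future<void> Cleanup =
        std::async(std::launch::async, [Shared] {
            for (auto& R : *Shared) { R->WaitForTasks(); }
        }).share();

    // Task 2: the destructors, chained after task 1.
    std::future<void> Delete =
        std::async(std::launch::async, [Shared, Cleanup] {
            Cleanup.wait();
            Shared->clear(); // runs ~Renderer for every element
        });

    return PendingDeletion{Cleanup, std::move(Delete)};
}
```

In this sketch the frame loop would call `CleanupDone.wait()` before starting the next frame and leave `DeleteDone` to finish in the background. Whether that is safe in 5.4 depends on nothing in the new frame touching state that the old renderer's destructor still frees, which is worth verifying before relying on it.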

Here are the commits that affect this code in 5.6 for your information, though, again, you probably don’t want to try to backport or cherry-pick these:

CL 38379574

CL 38603314

CL 38645032

CL 41112969

I’d recommend just looking at the final 5.6 codebase as reference, and manually implementing something like this yourself.

Regards,

Lance Chaney