I noticed that in Development builds the render thread times are suspiciously similar to the GPU times, and the same thing happens when enabling FORCE_USE_STATS in the Test configuration. I also tested this in Development Editor on an empty level on today’s //UE5/Main (CL 1025452).
Looking at Unreal Insights (I attached a picture), it seems that the render thread waits for a task that waits on GPUOcclusion (though I can’t find the exact GPU event that it waits on). In my case of an empty level, this wait is roughly 75% of the frame time.
I understand that this doesn’t happen at all in actual Shipping builds (since STATS won’t be enabled there) and that the wait doesn’t actually slow down the game (at least not in the case of an empty level), but it skews the frame times (GRenderThreadTime) shown by, for example, “stat unit”, making it hard to see the impact of changes without looking into e.g. Unreal Insights.
We were thinking of enabling STATS in our Test configuration and profiling with it enabled, and this is one of the reasons we haven’t done that yet.
That GPU occlusion task is waiting for the previous frame’s occlusion culling task to complete before it can continue. You should be able to view the task graph dependencies if you take a capture with the -task channel flag set. Unfortunately, this dependency is not always explicit, but we are also reworking how we sync the game, render, and RHI threads to make it more straightforward. I don’t know when those changes will land, but we know that folks would like to have some improvements in how we visualize these events. I hope this helps clear up your questions, but please let me know if anything is unclear.
You are right that the wait is there regardless of whether STATS is enabled. I’m sorry, I should have checked that it is not the reason for the discrepancy in frame times I’m seeing.
What I should have put more emphasis on is the difference in the GRenderThreadTime value. This is also visible in “stat unit”.
I’m attaching two images: one is from Test with STATS and the other is from Test without STATS. You can see that the “Draw” (render thread) time is much higher when STATS is enabled, because with STATS the wait is counted as render thread time.
[Image Removed][Image Removed]
Looking further into it, it seems that enabling STATS causes RenderThread.Waits to not be recorded for the render thread, so they are not subtracted in FSlateRHIRenderer::PresentWindow_RenderThread. If I’m reading it correctly, this happens because the STATS ifdef in FNamedTaskThread::ProcessTasksNamedThread causes bIgnoreThreadIdleStats to be true in FEventWin::Wait:
[Image Removed]
I would expect this to be the other way around: enabling a profiling macro like STATS should add more detailed information about stalls, not less.
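To make the suspected mechanism concrete, here is a minimal, self-contained C++ sketch of the accounting described above. It is not engine code: `ThreadIdleAccumulator`, `SimulateWait`, and `ReportedRenderThreadMs` are hypothetical names invented for illustration, and the only assumption is the one stated in this thread, namely that the reported render thread time is the frame span minus whatever waits were recorded.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread record of idle waits.
// This is NOT an Unreal Engine type.
struct ThreadIdleAccumulator {
    double WaitMs = 0.0;
};

// Models a blocking wait on the render thread. The wait always "happens",
// but when bIgnoreThreadIdleStats is true it is not recorded as idle time.
inline void SimulateWait(ThreadIdleAccumulator& Idle, double WaitMs,
                         bool bIgnoreThreadIdleStats) {
    assert(WaitMs >= 0.0);
    if (!bIgnoreThreadIdleStats) {
        Idle.WaitMs += WaitMs;
    }
}

// Assumed reporting rule: frame span minus the recorded waits.
inline double ReportedRenderThreadMs(double FrameSpanMs,
                                     const ThreadIdleAccumulator& Idle) {
    return FrameSpanMs - Idle.WaitMs;
}
```

With a 16 ms frame span and a 12 ms wait (about 75% of the frame, as in the empty-level case), this model reports 4 ms when the wait is recorded and the full 16 ms when it is ignored, which matches the inflated “Draw” value seen with STATS enabled.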
Sorry for the long wait. I have been digging into this further to understand your observations, and I wanted to give you a quick status update. You likely discovered a bug in how we tracked wait blocks on the render thread after we performed a larger rework of our render and RHI thread architecture. Having spoken to one of our engineers, we suspect that FNamedTaskThread::ProcessTasksNamedThread used to wait on the GPU via an explicit RHI query, but since the refactor, we are using task graph threads, which are going through a different code path to do the querying. The bCountAsStall should be triggered in all cases, regardless of STATS being present or not. I still need to figure out how that all works together, so I will get back to you once I find out more. In the meantime, please don’t hesitate to ask if you have any more questions or would like to add some findings.
I tried chasing down some more information for you, but I have hit a bit of a dead end in my investigation. I have escalated the ticket to the dev team, who will investigate this issue further to find a fix for this regression. You can follow along the progress here:
On our end, setting r.HZBOcclusion 1 in a dev build (so STATS enabled and large RenderThread timings) will “fix” the issue and bring the RenderThread timing back to its expected value.
Strangely, on a Test build without STATS enabled (so regular RenderThread timings), setting r.HZBOcclusion 1 does the opposite and makes the RenderThread report large timings…
We have not had time to investigate yet, but it looks like two issues are mixed up here, since the second case triggers the problem without STATS enabled.
Hi Lucas, I appreciate the extra information. I will pass it along to the dev team to help them in their investigation. If you end up finding any other details, please feel free to share them here.