Example in Insights with NumBufferedOcclusionQueries = 0
[Image Removed]
Example in Insights with NumBufferedOcclusionQueries = 3
[Image Removed]
Steps to Reproduce
We noticed a long wait on occlusion queries in the render thread, specifically around WaitForGatherDynamicMeshElements. This spawned an OcclusionCullPipe job that blocks the render thread and waits until the GPU has finished occlusion tests.
To get around this, we increased r.NumBufferedOcclusionQueries. As soon as we did this, RHI and GPU times went up. What I found is that there is still a sync point in the RHI thread that causes a stall across all threads.
The behavior I was expecting was no wait. I expected the current frame to use the previous frame's occlusion results, which should be ready and available. It seems there still might be a wait or fence here that prevents the game, RHI, and render threads from moving on until the GPU finishes.
Vsync is off.
Hi there,
Thanks for sharing your Insights trace screenshots and your observations.
From the provided screenshots, it appears that the GPU workload is significantly higher than that of the GameThread, RenderThread, or RHIThread. This indicates the project is GPU-bound, and the CPU is frequently waiting for the GPU to catch up.
The syncpoint_wait events you’re seeing are expected in this scenario. Even with VSync disabled and r.NumBufferedOcclusionQueries set to a high value (max 4), the CPU can only run so far ahead before it must synchronize with the GPU. When using the D3D12 RHI, there is currently a forced sync point in the viewport implementation that waits for the previous frame to complete immediately after the submission of the subsequent frame. This effectively limits the RHI thread to running a maximum of 1 frame ahead of the GPU.
Even if this restriction were lifted, an eventual sync point would still be necessary to prevent the CPU from running too far ahead of the GPU. Without such synchronization, memory consumption would grow unbounded due to excessive buffering (probably on both the CPU and GPU) and would eventually crash.
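To make the reasoning above concrete, here is a minimal toy model of frame pacing. It is only a sketch under stated assumptions, not Unreal's actual implementation: the function name `simulate` and its parameters are hypothetical, CPU and GPU frame costs are treated as constant, and the sync point is modeled as "after submitting frame i, wait until the GPU has finished frame i - max_frames_ahead".

```python
def simulate(cpu_ms, gpu_ms, max_frames_ahead, num_frames):
    """Return the CPU wait (ms) at the sync point for each frame.

    Toy model (assumption, not engine code): constant per-frame costs,
    one CPU timeline, one GPU timeline that runs frames in order.
    """
    gpu_done = []   # GPU completion time of each submitted frame
    waits = []      # time the CPU spends blocked after each submission
    cpu_time = 0.0  # current time on the CPU (submission) timeline

    for i in range(num_frames):
        # CPU records and submits frame i.
        cpu_time += cpu_ms

        # GPU starts frame i once it is submitted and the previous frame is done.
        gpu_start = max(cpu_time, gpu_done[-1] if gpu_done else 0.0)
        gpu_done.append(gpu_start + gpu_ms)

        # Sync point: block until frame (i - max_frames_ahead) has completed.
        wait = 0.0
        j = i - max_frames_ahead
        if j >= 0:
            wait = max(0.0, gpu_done[j] - cpu_time)
            cpu_time += wait
        waits.append(wait)

    return waits


if __name__ == "__main__":
    # GPU-bound: 4 ms of CPU work per frame, 10 ms of GPU work per frame.
    print(simulate(4, 10, max_frames_ahead=1, num_frames=8))
    print(simulate(4, 10, max_frames_ahead=3, num_frames=8))
    # CPU-bound: the sync point never blocks.
    print(simulate(10, 4, max_frames_ahead=1, num_frames=8))
```

In this model, a GPU-bound workload settles into a per-frame wait equal to the difference between GPU and CPU frame cost (6 ms in the example) whether the CPU may run 1 or 3 frames ahead. Allowing more frames in flight only delays the first wait, while a CPU-bound workload never waits at all.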
While the syncpoint_wait events might appear as CPU stalls, they are not indicative of CPU performance issues in this case. They simply indicate that the frame time is currently GPU bound.
Regarding your observation that increasing r.NumBufferedOcclusionQueries causes both RHI and GPU times to increase: while I was able to reproduce the RHIThread time increase caused by the syncpoint_wait events when r.NumBufferedOcclusionQueries = 3, I was unable to reproduce a corresponding increase in GPU time. This suggests the GPU time increase you’re seeing is probably due to project-specific factors. If you’re able to provide a minimal reproduction isolating this behavior (preferably starting from a blank project), I’d be happy to investigate further.
Let me know if you have any follow-up questions or additional info to share.
Best regards,
Thomas