This is under Render Thread, FDrawSceneCommand, RenderViewFamily, InitViews, View Visibility, Occlusion Cull, Fetch Visibility for primitives, and there it is: “RenderQuery Result”, costing me 8 ms.
What is it, and how do I go about optimizing it? My scene is fairly simple: a small landscape with a material, and grass as foliage. I assume it has something to do with the grass.
When you disable HZB occlusion, the engine falls back to GPU occlusion queries, which “render” the bounding box of each primitive in the scene through the GPU pipeline to see how many of its pixels would end up on screen. If the pixel count is 0, the primitive is treated as occluded. There may be some stalls related to the queries.
Whether HZB is used is decided from GHZBOcclusion (the r.HZBOcclusion console variable) and the platform, by this line
bool bHZBOcclusion = (!IsOpenGLPlatform(GShaderPlatformForFeatureLevel[Scene->GetFeatureLevel()]) && GHZBOcclusion) || (GHZBOcclusion == 2);
in the function
static int32 OcclusionCull(FRHICommandListImmediate& RHICmdList, const FScene* Scene, FViewInfo& View)
located in SceneVisibility.cpp
RenderQuery Result is the time the render thread spends stalled, waiting for the GPU to finish the occlusion queries and return the results, so that it knows what to render.
At the same time, the game thread is stalled waiting for the render thread.
This can be turned on or off with the console command
But why does it take so long? We have a basically empty scene: a cube to stand on, two lights, no background, two quite complex widgets placed in widget components (which should only render as planes with a material on them), and two VR controllers. There isn't much in the scene, and none of these objects are complex. Still, we see 3 to 6 ms.
In 4.19 everything was fine, even with 3,000,000 vertices of mesh data. Since updating to 4.23 (we were forced to, because of a huge memory leak when switching maps), we have had trouble with it.
Edit: r.AllowOcclusionCulling=0, by the way (as shown by the console). I did notice that the engine seems to have trouble reading our config files, since something else doesn't work either, but even so, I would assume the console shows the true value.
Edit 2: After more investigation in UE 4.23, the stat appears in D3D11Query.cpp, in FD3D11DynamicRHI::GetQueryData, as STAT_RenderQueryResultTime.
Between Unreal 4.19 and 4.23 they made some changes to that file, I guess because of the new Render Graph.
Changing r.AllowOcclusionQueries
to 0: causes FD3D11RenderQueryBatcher::PerFrameFlush to call GetQueryData with bWait=true
to 1: causes FD3D11RenderQueryBatcher::PollQueryResults to call GetQueryData with bWait=false
So setting it to 0 causes waits, which is the huge time shown by STAT_RenderQueryResultTime (RenderQuery Result). Setting it to 1 causes no waits, and RenderQuery Result does not appear. This seems to be the wrong way round! I'll file a bug report and see where it goes.
For comparison, in Unreal 4.19 bWait=true in both cases.
bool FD3D11DynamicRHI::GetQueryData(ID3D11Query* Query, void* Data, SIZE_T DataSize, bool bWait, ERenderQueryType QueryType)
{
    // Request the data from the query.
    HRESULT Result = Direct3DDeviceIMContext->GetData(Query, Data, DataSize, 0);
    // Isn't the query finished yet, and can we wait for it?
    if (Result == S_FALSE && bWait)
    {
        ...
The difference is that in 4.19, Result is always S_OK (in my small test map), so the wait loop is never entered.
As far as I can tell, the issue has been targeted for 4.26, but what if you can't wait for that (i.e., your game is scheduled for release within a few months)? Is there an easy, hacky fix?
This isn't a bug. You're asking the graphics API to push all the geometry that has been rendered so far out to the GPU, wait for the GPU to finish (at least up to this point), and then read the result back. The call itself doesn't take any time; waiting for the GPU (which generally runs asynchronously, behind the CPU) to “catch up” to this point in the stream is what's “taking the time.”
You should measure the actual frame rate. If it doesn't change, then a stall that would otherwise happen somewhere else (waiting on vsync, for example) has simply moved through the pipeline and now happens, at least partially, at your render query. If the frame rate really does go down, then you are asking for the visibility result too soon after the query was issued, causing an actual pipeline stall rather than just a synchronization point.
@jwatte I realize running with r.AllowOcclusionQueries=0 is bad, and that's not the solution I'm looking for personally. Your suggestion of a stall that ends up getting “bubbled” around sounds plausible, especially since it doesn't always stall for me, only about every 30 to 40 frames. In the profiler I can see an FDrawSceneCommand spike coming from a call to DeleteResources, which normally takes about 0.1 ms but ends up at around 3 to 7 ms during the spike. When I'm home in front of my PC, I will post a screenshot of my scenario/profile.
EDIT: I just reviewed the picture from @Rumbleball below, and it's the exact same scenario I have.
Setting r.HZBOcclusion=1 in DefaultEngine.ini fixes the issue, but may cause other issues. For now, I can see that it solves the hitches, though. Let me know if it solves the issue for you.
I'm seeing the same issue. It appears regularly, but not every frame. I'm not sure it's caused by the occlusion queries themselves, since turning them off seems to just move the problem. My guess is that something is stalling the GPU and delaying whatever command the thread is waiting on.
It increases the render time enough to throw off the running start rendering in SteamVR, which is probably why it causes hitching.