Occlusion Culling Sync Points cause poor performance in even simple scenes

In my scene with 5k objects, sync points during occlusion culling take up most of the frame time. I have made no modifications to the culling, so everything should be at its defaults.

This issue also occurs in my main menu, which has very few objects. We get bizarre RHI times:
[image: screenshot of the RHI timings]

What is happening here? I have a GTX 1070, and VRAM usage seems fine. The issue reproduces on multiple machines.

The project is on 5.1, Nanite enabled, Windows. The issue reproduces both in a packaged build and in the editor.

We’re also struggling with this, and there is seemingly no way to solve it so far.

Just noticed it as well. Occlusion culling is extremely slow; please help!

Chiming in to say I’m facing the same problem, especially when I have a lot of landscape grass or foliage in the map.

One option is to use Hierarchical Z-Buffer (HZB) occlusion. It uses a compute pass rather than hardware queries, so it can perform better. However, it is more conservative, so more may end up being drawn, which can offset any savings. You can enable it through the console (it’s not available in the settings window, but it can be placed in the render settings block of your DefaultEngine.ini):

r.HZBOcclusion 1
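
For the ini route, a minimal sketch could look like this. The section name below is the standard renderer settings block as far as I know; it is not spelled out in this thread, so treat it as an assumption and check it against your own Config/DefaultEngine.ini.

```ini
; Config/DefaultEngine.ini (section name assumed, see note above)
[/Script/Engine.RendererSettings]
; 1 = use the hierarchical Z-buffer test instead of hardware occlusion queries
r.HZBOcclusion=1
```

Setting it from the console first is a quick way to check whether it helps before committing it to the ini.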

The usual reasons for occlusion taking up time are that the scene is rendering a lot, or simply the sheer number of occlusion queries. If you use:

stat initviews

You can see a breakdown of what is going on. The Unreal docs cover it all pretty well, so they are worth going through if you need to delve more deeply. Which approaches to use really comes down to the nature of your project.

HZBOcclusion is in fact worse in my case; turning it on adds 2 ms to the culling cost.

In which case it’s probably the sheer number of primitives going through the culling. The stats and the debug views should highlight what’s going on.

As you say you are using foliage and grass, setting up a cull distance volume, and also setting cull distances for the smaller meshes in foliage mode, will probably help. Distance culling is done before occlusion culling, so it reduces the number of primitives going through the occlusion pass.

What’s interesting is that when I converted a whole bunch of individual static mesh actors into an ISM/HISM actor, the culling time got worse! Time spent on occlusion culling went from sub-millisecond to above 30 ms, with exactly the same visuals. This leads me to think that ISM is perhaps the culprit here, at least in my case.
To add to this, I unticked the “Use as Occluder” box on the ISM component, but that made no difference to the time spent on occlusion culling.

That does sound like a chunky increase, which definitely sounds odd. With HISM there is an extra cvar to try for the occlusion. Foliage also works as an HISM, so it will affect that as well.

The cvar to try is (it defaults to 1):
r.AllowSubPrimitiveQueries

The comment for it is:
Enables sub primitive queries, currently only used by hierarchical instanced static meshes. 1: Enable, 0 Disabled. When disabled, one query is used for the entire proxy.
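
A quick way to test this is to flip the cvar at runtime and watch the stats while the camera stays in the same spot. This is only a sketch of the comparison; both commands are the ones already mentioned in this thread.

```text
stat initviews
r.AllowSubPrimitiveQueries 0
```

Set it back to 1 to restore the default. Going by the comment above, 0 means the whole HISM proxy gets a single query, so expect coarser culling in exchange for fewer queries.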

Also, if you are using masked materials for your foliage etc., you may want to ensure that they go through the prepass (i.e. they write to depth). It’s also worth enabling masking in the early Z pass, as it can reduce the render cost.
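
If you would rather set these in the ini than in the project settings UI, something like the following should be close. The cvar names and values are not confirmed anywhere in this thread, so treat them as assumptions and verify them in your engine version; the masking option also appears to need an editor restart before it takes effect.

```ini
; Config/DefaultEngine.ini (cvar names and values assumed, see note above)
[/Script/Engine.RendererSettings]
; 2 = depth prepass for all opaque geometry, including masked materials
r.EarlyZPass=2
; evaluate the opacity mask only in the early Z pass
r.EarlyZPassOnlyMaterialMasking=True
```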

I would suggest playing with the distance culling though just to reduce the amount getting pushed through the occlusion process.

I tested the early Z pass setting; it did not help. There is another thread about the same issue where a few other possibilities were suggested, but none of them helped. I tried to reproduce the static mesh vs ISM comparison again and can confirm that ISM is causing the problem for me. While the RHIT did not go up by a lot (except for a brief moment when it shoots up), the occlusion culling time goes from 0.5 ms to 23 ms, sometimes more, sometimes less.

The usual reason for occlusion cost spikes is the overall rendering cost spiking. The occlusion queries rely on getting a result back from the GPU, and they run a frame behind. If the results are not ready at the point they need to be read, the CPU will block (you can make the queries run more frames behind, but I don’t recommend doing that). This shows up as a spike on the CPU side, but the problem is actually on the GPU side.
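
For reference, the "more frames behind" option appears to be the cvar below. The name is an assumption, since it is not given in the post, and the advice above still stands: more buffering means the visibility results are more out of date, so objects can pop in late.

```text
r.NumBufferedOcclusionQueries 2
```

If it is tried at all, it is probably best used as a diagnostic: if the stall disappears with extra buffering, that supports the idea that the CPU was waiting on a slow GPU frame.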

It may be that some mesh or material coming into view is more expensive to render than you expect. You can get a reasonable amount of information from the built-in tools, or use external tools like PIX and RenderDoc.

If you run:

r.rhisetgpucaptureoptions 1
profilegpu

In the editor it will throw up a dialog breaking down what is in your scene (note that the capture options disable some of the threading, so it may be a bit slower). It should split the render calls by material so you can identify whether a particular draw is causing the spike. It also dumps a text version to the output log. It’s more detailed than just using ‘stat gpu’, so it should hopefully help in identifying the problem.

This is all fairly generic, as I know very little about your project, but it might shed more light on what is causing the apparent jumps in occlusion cost.

So I gave that profilegpu command a go, and unfortunately it does not reveal anything related to occlusion culling at all. The extra frame time somehow gets distributed evenly among all stages, although the shadow map pass does take a bigger hit than the others.

I also cannot use something like a cull distance volume; I guess the ISM actor would have a very large bounding box due to the scattered nature of the instances.

Occlusion culling cost wouldn’t show up in the GPU profile. The CPU side just ends up waiting for the results, so slower rendering overall would cause the stall. Compare a capture from when the frame is in time with one from when it spikes to 20 ms: if chunks of the GPU work are increasing but there is no clear culprit, it is probably just sheer volume. You should be able to see the number of draws and instances for each (as long as you set the capture options or enable material draw events; otherwise it will just be a block cost per pass).

The cull distance would be per mesh instance, not on the whole object. If you are using foliage, you’ll need to set the distances on the meshes in foliage mode. Also, if shadow cost is noticeably increasing, disabling shadows on the lowest LODs of your foliage may help with little obvious visual difference, as may turning off any WPO and just making the materials cheaper.

The occlusion culling may not be the root cause. If you’ve got a SkyLight in your scene using Real Time Capture, the Cubemap Resolution set on it may be high enough that capturing the cubemap takes a significant amount of time (this occurs every half a second or so). The occlusion culling is then held up by that capture, so it presents as occlusion culling hitching when you profile.

If you have one, try reducing the cubemap resolution and seeing whether this alleviates your hitching.

Cheers,

Alan.