How to find rendering bottlenecks?

If you have a scene and it starts getting slow, how exactly do you find which meshes or materials are causing the bottleneck?

Looking for any general performance tips or tutorials on engine settings, mesh settings, material settings, stat commands, debugging, console commands, or anything else that helps make scenes render faster.

The main factors for performance are draw calls, triangle count, and shader complexity / overdraw.

Draw calls are instructions sent from the CPU to the GPU, telling it which mesh to render where, using which materials. This is a potential bottleneck, because sending the data from RAM to VRAM takes relative ages compared to execution on the GPU. To check if draw calls are an issue, use stat rhi. At the bottom, it will show you the draw call count. On a high-end PC, you can get away with about 2000 to 4000 draw calls. On mobile, you can only afford about 100 to 200.

You reduce draw calls by merging or instancing meshes. Unlike Unity, which will automatically batch static (and small dynamic) meshes together, Unreal needs a new draw call for every mesh, even if it is using the same material. Look around your scene, and wherever the draw call count shoots up, that’s where you have a problem. If you’ve placed a lot of the same mesh, or have a bunch of small meshes together, you should merge them into a single mesh. Select the actors, right click on them, and then click “Merge Actors”. You now have three tabs:

The first tab is a straight-up merge of the meshes. Unreal will generate a new mesh which simply sticks all the selected meshes together. It can even merge materials if you tell it to, to reduce draw calls even further.

The second tab is for proxy mesh generation: this will create a wholly new mesh based on the meshes you have selected. You can think of it as a shrink wrap. The polycount will also be greatly reduced, and materials will also be merged. This is a very powerful tool to tackle poorly performing bits of your scene.

The last tab is instancing: this is only useful if you use a lot of the exact same mesh & material within the same area. It will replace your individual meshes with an instanced static mesh actor. Instanced meshes will be drawn in one draw call, but be careful: it’s possible that your meshes will lose their ability to LOD (at least that’s what I’ve encountered, not looked into it further yet), possibly reducing your FPS even more because of the increased polycount.
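
To get a feel for how much instancing could save before touching the scene, here is a rough back-of-the-envelope sketch in plain C++. It is not Unreal API: PlacedMesh and both functions are made-up names, and it assumes the simplified model described above (one draw call per placed mesh, and one draw call per group of identical mesh + material once instanced).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a placed static mesh actor (not an Unreal type).
struct PlacedMesh {
    std::string mesh;      // mesh asset name
    std::string material;  // material asset name
};

// Simplified assumption: without merging or instancing,
// every placed mesh costs one draw call.
std::size_t DrawCallsUnbatched(const std::vector<PlacedMesh>& scene) {
    return scene.size();
}

// Simplified assumption: with instancing, all placements that share the
// exact same mesh and material collapse into a single draw call.
std::size_t DrawCallsInstanced(const std::vector<PlacedMesh>& scene) {
    std::map<std::pair<std::string, std::string>, std::size_t> groups;
    for (const PlacedMesh& m : scene) {
        ++groups[{m.mesh, m.material}];  // count placements per (mesh, material)
    }
    return groups.size();
}
```

Feeding it a list of your placed meshes shows where instancing pays off: groups with many placements save the most, while groups of one save nothing.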

Epic invented the HLOD system to drastically reduce draw calls, making it possible to run Fortnite on mobile. The HLOD system takes a lot of the manual labor away from you, by analyzing your scene for you, deciding which objects should be merged together in proxy meshes, and creating several levels (hierarchies) of them, to swap in between.

You can also reduce draw calls by culling: simply not rendering objects is always cheapest. Make sure there’s geometry to block intense areas from view, or use cull distance volumes to stop rendering small objects beyond a certain distance. If you have a lot of glass, a good trick is to replace the translucent glass material with an opaque one in a far away LOD. This will then allow Unreal to cull all the objects behind the glass automatically. To smooth out the transition, use a depth fade on the opacity of the translucent material.
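
As an illustration of the cull distance idea, here is a small plain C++ sketch. It is not Unreal code: CullEntry, PickCullDistance and IsDrawn are hypothetical names, and the matching rule (pick the entry whose size is closest to the object's size, with a cull distance of 0 meaning "never cull") is my understanding of how cull distance volumes behave, so treat it as an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical size -> cull distance pairing (not an Unreal type).
struct CullEntry {
    float size;          // object size this entry applies to
    float cullDistance;  // 0 means "never cull" in this sketch
};

// Assumption: the object gets the cull distance of the entry
// whose size is closest to the object's own size.
float PickCullDistance(const std::vector<CullEntry>& entries, float objectSize) {
    assert(!entries.empty());
    const CullEntry* best = &entries.front();
    for (const CullEntry& e : entries) {
        if (std::fabs(e.size - objectSize) < std::fabs(best->size - objectSize)) {
            best = &e;
        }
    }
    return best->cullDistance;
}

// An object is drawn if it is never culled or is within its cull distance.
bool IsDrawn(const std::vector<CullEntry>& entries, float objectSize,
             float distanceToCamera) {
    const float cull = PickCullDistance(entries, objectSize);
    return cull <= 0.0f || distanceToCamera <= cull;
}
```

With entries like {50, 1000}, {200, 5000}, {1000, 0}, small props disappear early, medium objects later, and large structures are never culled by distance.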

The second pillar is polycount: if you start seeing many millions of tris drawn on a high-end PC, or tens of thousands on mobile, pop into wireframe view (alt-2) and see which objects are very detailed. You can also open the statistics window (Window > Statistics) and take a look at the Primitive Stats. This will give you a very clear breakdown of which meshes are poly intense. Unreal can automatically reduce polycount for you. Just open up the mesh’s details window, and select a LOD Group. Unreal will then create a number of LODs for you, and assign screen sizes at which to swap them. You’ll do well to check these though (move back and forth in the viewport, see which LOD is being used), since the screen sizes assigned to LODs in this manner are sometimes not very well spaced out.
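
To make the "not very well spaced out" point concrete, here is a minimal plain C++ sketch of LOD selection by screen size. PickLod is a hypothetical helper, not an engine function, and it assumes the simple rule that LOD i kicks in once the object's screen size drops to or below the i-th threshold, with thresholds listed from largest to smallest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// screenSizes[i] is the screen size at or below which LOD i is used.
// Assumed to be sorted from largest to smallest, e.g. {1.0, 0.5, 0.25}.
std::size_t PickLod(const std::vector<float>& screenSizes, float currentScreenSize) {
    assert(!screenSizes.empty());
    std::size_t lod = 0;  // fall back to the most detailed LOD
    for (std::size_t i = 0; i < screenSizes.size(); ++i) {
        if (currentScreenSize <= screenSizes[i]) {
            lod = i;  // keep the last (least detailed) LOD that still qualifies
        }
    }
    return lod;
}
```

With thresholds {1.0, 0.5, 0.25}, an object covering 0.4 of the screen uses LOD 1. With badly spaced thresholds such as {1.0, 0.9, 0.85}, the same object is already on the last LOD, and anything larger than 0.85 of the screen flips through all LODs within a tiny range of distances.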

Lastly, shader complexity. Dip into shader complexity mode (alt-8), and check whether there’s a lot of red or white around. If so, you either have a lot of overlapping translucent materials, or your shaders are too expensive.

Other useful stat commands: stat gpu breaks down where the GPU time goes, for instance to check if it’s shadowing that’s killing your performance. stat unit will tell you if you’re CPU or GPU bound (but you will almost always be GPU bound).
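
For reading stat unit, the rule of thumb is that whichever of the Game, Draw and GPU timings is the largest is the one holding the frame back. A tiny plain C++ sketch of that reasoning follows. UnitTimes and LikelyBottleneck are made-up names for illustration, and the numbers are meant to be typed in by hand from the on-screen readout, not pulled from any engine API.

```cpp
#include <cassert>
#include <string>

// Timings in milliseconds, as read off the stat unit display.
struct UnitTimes {
    float gameMs;  // game thread
    float drawMs;  // render thread
    float gpuMs;   // GPU
};

// Rule of thumb: the largest timing is the likely bottleneck.
std::string LikelyBottleneck(const UnitTimes& t) {
    if (t.gpuMs >= t.gameMs && t.gpuMs >= t.drawMs) {
        return "GPU";
    }
    if (t.drawMs >= t.gameMs) {
        return "Draw";
    }
    return "Game";
}
```

If this says "Draw", look at draw calls first. If it says "GPU", polycount, shadows and shader complexity are the more likely suspects.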

Wow. Lots of good stuff there. Thanks! :cool:

BTW, was looking at pre-computed visibility. Will that work for virtually any scene? In particular we are doing realistic ship engine rooms - very densely packed rooms with many different meshes/materials. Will precomputed visibility work for that?

Precomputed visibility works best for scenes with multiple rooms or walled-off outdoor sections. If you have a single room which is very densely packed, you won’t get much benefit from precomputed visibility, as most of your meshes will be visible all the time. If you have several rooms, each with many meshes, then yeah, you should be able to benefit from it. You can read all about precomputed visibility here.

In your use case, merging and reducing mesh complexity is definitely the way to go. For instance, if you have lots of bolts all as separate static mesh actors, merging them together to a single mesh would benefit your performance greatly. Just don’t go overboard - only merge meshes which will be visible together at the same time, i.e. in proximity to each other.

Any idea if merging meshes is doable from code? Most of our world objects are C++ with components added via code.