Help me understand the performance numbers I am seeing

Hi, so I am trying to figure out the maximum number of triangles I can have in my scene without killing performance. I’ve been doing some tests.

First, I started out with a huge 1.5-million-triangle mesh, and this was the result (screenshot on Imgur).

Note the GPU frame time of ~72 ms, which is very slow.

So then I took a reduced version of the same mesh at about 23k triangles and made 216 copies of it, for a total of about 5 million triangles! This was the result (screenshot on Imgur).

Note the GPU frame time of ~53 ms, which is faster than the single huge mesh.
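Turning those two measurements into triangle throughput makes the oddity concrete. This is a minimal Python sketch that uses only the figures quoted above (1.5M triangles at ~72 ms, 216 copies of ~23k triangles at ~53 ms); the helper names `fps` and `tris_per_ms` are illustrative, not engine APIs.

```python
def fps(frame_ms: float) -> float:
    """Frames per second for a GPU frame time given in milliseconds."""
    return 1000.0 / frame_ms

def tris_per_ms(triangles: int, frame_ms: float) -> float:
    """Triangles rendered per millisecond of GPU frame time."""
    return triangles / frame_ms

single_mesh_tris = 1_500_000
copies_tris = 23_000 * 216  # 4,968,000 triangles ("about 23k" taken as exactly 23k)

single = tris_per_ms(single_mesh_tris, 72.0)
copies = tris_per_ms(copies_tris, 53.0)
print(f"single mesh: {fps(72.0):.1f} fps, {single:,.0f} tris/ms")
print(f"216 copies:  {fps(53.0):.1f} fps, {copies:,.0f} tris/ms")
print(f"throughput ratio: {copies / single:.1f}x")
```

By this rough measure the 216 copies push about 4.5x as many triangles per millisecond as the single mesh, so raw triangle count alone cannot explain the timings.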

What is going on here? I thought lighting might be to blame: maybe it was degrading the huge mesh’s performance while the smaller meshes benefited from some lighting optimizations. So I deleted all the lights in the scene and created a plain white emissive material, which I thought would isolate any lighting-related performance problems, and then I got this (screenshot on Imgur).

Basically the same GPU time as before.

None of these meshes have any LODs; I just imported a raw mesh with no additional LODs. (Unreal Engine doesn’t automatically reduce meshes and create LODs, right?)

Anyway, if anybody could explain why the first mega mesh is so much slower than the 216 normal meshes despite having a much lower total triangle count, I would greatly appreciate it. This is on an iPhone 6s, if it matters.

Triangle count hasn’t been a problem for game engines for a while now. What is a huge problem is triangles smaller than 2x2 pixels: they cause massive performance drops, because GPUs shade pixels in 2x2 quads and a tiny triangle still pays for the whole quad. That is one of the major reasons for LODs: as a mesh gets smaller in screen space, LODs reduce the polygon count so the remaining polygons get bigger (a lower-resolution mesh has larger polygons), which avoids triangles smaller than 2x2 pixels.
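To see why tiny triangles hurt, here is a simplified Python model of 2x2 quad shading: a pixel counts as covered if its center lies inside the triangle, and every 2x2 quad containing at least one covered pixel costs four pixel-shader invocations. It ignores real rasterizer tie-breaking rules, and `quad_stats` is a hypothetical helper, not an engine API.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign says which side of the edge a->b point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def quad_stats(tri):
    """Return (covered_pixels, shaded_pixels) for one screen-space triangle.

    Simplified model: a pixel is covered if its center is inside the triangle,
    and each 2x2 quad with at least one covered pixel costs 4 shader invocations.
    Assumes non-negative pixel coordinates.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0, 0  # degenerate triangle: nothing is rasterized
    covered = set()
    for py in range(int(min(y0, y1, y2)), int(max(y0, y1, y2)) + 1):
        for px in range(int(min(x0, x1, x2)), int(max(x0, x1, x2)) + 1):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, cx, cy)
            w1 = edge(x2, y2, x0, y0, cx, cy)
            w2 = edge(x0, y0, x1, y1, cx, cy)
            # Inside when every edge term has the same sign as the area
            # (works for either winding order).
            if all(w * area >= 0 for w in (w0, w1, w2)):
                covered.add((px, py))
    quads = {(px // 2, py // 2) for px, py in covered}
    return len(covered), 4 * len(quads)

tiny = ((10.2, 10.2), (11.4, 10.2), (10.2, 11.4))  # smaller than one 2x2 quad
big = ((0.0, 0.0), (64.0, 0.0), (0.0, 64.0))
for name, tri in (("tiny", tiny), ("big", big)):
    covered, shaded = quad_stats(tri)
    print(f"{name}: {covered} px covered, {shaded} px shaded, {covered / shaded:.0%} useful")
```

In this model the one-pixel triangle pays for four pixels (25% useful work), while the large triangle wastes almost nothing. For scale, if the 1.5M-triangle mesh filled the entire 750x1334 iPhone 6s screen (an assumption; it may not, and the engine may render below native resolution), that would average only about 0.67 pixels per triangle.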
The next problems for GPUs are draw call count and overdraw.

UE4 can create LODs automatically, but you have to do it in the Static Mesh Editor :)
And yeah, you can switch to the overdraw optimization view mode to see why the high-poly mesh is trouble :)

My understanding is that the tile-based renderer the iPhone 6s uses completely eliminates overdraw for opaque materials, so there must be something else going on here.
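A simplified model of why a tile-based GPU with hidden surface removal avoids opaque overdraw: an immediate-mode GPU shades every fragment that passes the depth test in draw order, while hidden surface removal resolves visibility for the tile first and shades only the nearest opaque fragment per pixel. This Python sketch models a single pixel; both function names are hypothetical.

```python
def immediate_mode_shades(depths):
    """Fragments shaded at one pixel by an immediate-mode GPU with an early depth test.

    Fragments arrive in draw order; each one nearer than what is already in
    the depth buffer gets shaded (smaller depth = nearer).
    """
    nearest = float("inf")
    shaded = 0
    for d in depths:
        if d < nearest:
            shaded += 1
            nearest = d
    return shaded

def hidden_surface_removal_shades(depths):
    """Simplified tile-based GPU with hidden surface removal: visibility is
    resolved before shading, so only the nearest opaque fragment is shaded."""
    return 1 if depths else 0

back_to_front = [0.9, 0.7, 0.5, 0.3]  # worst case for immediate mode
front_to_back = back_to_front[::-1]
print(immediate_mode_shades(back_to_front), hidden_surface_removal_shades(back_to_front))
print(immediate_mode_shades(front_to_back), hidden_surface_removal_shades(front_to_back))
```

This only removes hidden fragments. Per-triangle work such as vertex shading and binning triangles into tiles still scales with triangle count, and visible sub-pixel triangles still pay the 2x2 quad cost, so eliminating overdraw does not make a 1.5M-triangle draw free.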

These meshes don’t have any LODs defined. Are LODs generated automatically? What is a screen-space LOD, and how are LODs generated?
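A screen-space LOD swaps in a lower-detail version of a mesh as the mesh covers less of the screen. Below is a rough Python sketch of the selection step using a generic pinhole-camera approximation. It is not Unreal's exact formula, and the function names and threshold values are hypothetical.

```python
import math

def projected_screen_size(radius, distance, fov_deg):
    """Approximate fraction of the screen height covered by a bounding sphere.

    Pinhole-camera approximation: the sphere's diameter divided by the
    height of the view frustum at that distance, clamped to 1.0.
    """
    half_height = max(distance, 1e-6) * math.tan(math.radians(fov_deg) / 2)
    return min(1.0, radius / half_height)  # (2 * radius) / (2 * half_height)

def pick_lod(screen_size, min_sizes):
    """Return the LOD index to draw.

    min_sizes[i] is the smallest screen size at which LOD i is still used,
    in descending order; anything smaller falls through to the last LOD.
    """
    for lod, min_size in enumerate(min_sizes):
        if screen_size >= min_size:
            return lod
    return len(min_sizes)

# Hypothetical setup: 1 m bounding radius, 90 degree FOV, four LODs (0-3).
min_sizes = [0.5, 0.25, 0.1]
for distance in (1.5, 3.0, 8.0, 20.0):
    size = projected_screen_size(1.0, distance, 90.0)
    print(f"{distance:>5} m: screen size {size:.3f} -> LOD{pick_lod(size, min_sizes)}")
```

The LOD meshes themselves come from a mesh-simplification step (or are authored by hand); the selection step only decides which one to draw each frame.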

Xcode has quite good performance capture tools. What kind of numbers does Xcode show?
There is a blog post that gives a quick explanation of the iOS GPU tools.

My guess is that, for some reason, the PowerVR GPU cannot reach good utilization when all of its work comes from a single draw call. What performance numbers do you get if you just split that exact high-poly mesh into multiple pieces that cover exactly the same screen area? The iPhone 6s is a bit tricky because its clock rate can be lowered due to battery problems. The Xcode GPU tools should get rid of that variable clock rate, but I am not sure.
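One way to try that split without a modelling tool is to sort the triangles by centroid and cut the list into N spatially coherent groups, each of which would become its own mesh and draw call. This is a hypothetical Python helper, assuming an indexed triangle list; it is not an Unreal API.

```python
import math

def split_by_centroid_x(vertices, triangles, pieces):
    """Split an indexed triangle list into up to `pieces` groups of roughly equal size.

    Triangles are sorted by the x coordinate of their centroid, so each group
    covers a contiguous strip of the mesh (one mesh / draw call per group).
    vertices:  list of (x, y, z) tuples
    triangles: list of (i0, i1, i2) index tuples into `vertices`
    """
    if not triangles:
        return []

    def centroid_x(tri):
        return sum(vertices[i][0] for i in tri) / 3.0

    ordered = sorted(triangles, key=centroid_x)
    size = math.ceil(len(ordered) / pieces)
    return [ordered[k:k + size] for k in range(0, len(ordered), size)]

# Hypothetical strip of 8 triangles, deliberately listed in reverse order.
vertices, triangles = [], []
for k in range(8):
    base = len(vertices)
    vertices += [(k, 0.0, 0.0), (k + 1, 0.0, 0.0), (k, 1.0, 0.0)]
    triangles.append((base, base + 1, base + 2))

for n, group in enumerate(split_by_centroid_x(vertices, triangles[::-1], 4)):
    print(f"piece {n}: {group}")
```

Each group here still indexes the full vertex list; when exported as separate meshes, vertices along the seams would be duplicated.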

Xcode crashes whenever I try to profile the game :( I’m using Xcode 9.2. I didn’t look into it much, though; I will give it another shot and see if I can find something. I’ll also try splitting the mesh into multiple meshes and see what happens.

So I just split the mesh into 4 pieces and got a good performance improvement! Check out the attachments: 18.5 to 23 fps! I’ll try profiling it later to get more detailed numbers, but this is good for now. It looks like the number of polys isn’t really a problem as long as they aren’t all in one huge draw call.
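Frame times are easier to compare than fps, because fps is not linear in the work saved. A minimal Python conversion of the two numbers above (`frame_ms` is just an illustrative helper):

```python
def frame_ms(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

before, after = frame_ms(18.5), frame_ms(23.0)
saved = before - after
print(f"{before:.1f} ms -> {after:.1f} ms, saving {saved:.1f} ms ({saved / before:.0%})")
# prints "54.1 ms -> 43.5 ms, saving 10.6 ms (20%)"
```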

I have a follow-up question, though: how many draw calls are too many? I wonder whether I could keep splitting this mesh and keep seeing performance gains.

Typical desktop GPUs can handle 3,000 to 4,000 draw calls per frame. When you cut the mesh up, any parts hidden behind others can be culled; I’d guess that’s why your frame time decreased.
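Engines generally cull whole objects, so a single mesh that is even partly visible is drawn in full, while split pieces can be skipped individually. The post talks about occlusion culling (pieces hidden behind others); the Python sketch below illustrates the same per-piece principle with the simpler bounding-sphere-versus-plane test used for frustum culling. The scene layout, radii and triangle counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    center: tuple  # (x, y, z) of the piece's bounding sphere
    radius: float
    triangles: int

def sphere_visible(center, radius, planes):
    """Conservative cull test: a sphere is rejected only if it lies entirely
    outside at least one plane. Each plane is (normal, d), and a point p is
    inside when dot(normal, p) + d >= 0."""
    for normal, d in planes:
        signed_dist = sum(n * c for n, c in zip(normal, center)) + d
        if signed_dist < -radius:
            return False
    return True

def submitted_triangles(pieces, planes):
    """Triangles sent to the GPU after per-piece culling."""
    return sum(p.triangles for p in pieces if sphere_visible(p.center, p.radius, planes))

# Hypothetical scene: only z >= 0 is in view (a stand-in for one frustum plane).
planes = [((0.0, 0.0, 1.0), 0.0)]
whole = [Piece((0.0, 0.0, 0.0), 10.0, 1_500_000)]
split = [Piece((0.0, 0.0, z), 2.0, 375_000) for z in (-7.5, -2.5, 2.5, 7.5)]
print(submitted_triangles(whole, planes))  # the single mesh is drawn in full
print(submitted_triangles(split, planes))  # two of the four pieces are culled
```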

The triangle count was actually higher with the cut-up mesh, though, so I don’t think it was rendering less, unless I am misunderstanding the information provided by the stat Engine command.

I noticed that you need to disable the async loading thread and event-driven loading for iOS debugging.

Edit: I really should report this bug to Epic, because this is the default setting and it took some time to figure out what the problem was.