Instanced Static Mesh Decreasing performance!


I’m trying to create procedurally generated level content. In doing so I have learned that if you plan to repeat a mesh many times, you should not use Static Meshes; instead you should use an Instanced Static Mesh and create instances of that mesh. But when I tried this instead of plain static meshes, I got heavily decreased FPS?!

I thought I did something wrong, so I started a new level and followed the first 9 minutes of this video from Epic:
But I still get very low FPS.

It turned out that there is something that is called “RenderQuery Result” that takes a whole lot of time:

This is the actual blueprint code that I’m using:

Where Floor is an Instanced Static Mesh Component with a suitable mesh.

Obviously there must be something that I am doing wrong, but can somebody help me here?

Well, not too sure. In my experience, if I have over 1500 instances of one mesh, it causes lag when trying to remove or add an instance. Maybe you have too many instances per mesh?

Thanks for your reply, but unfortunately this long execution time for “RenderQuery Result” applies for me even if I only have a few instances (Size X = Size Y = 2).

Where’s Michael Noland when you need him… ;)

That does look insanely high. How many materials are applied to that Static Mesh in particular and how complex are they, or are you purely following the tutorial video?

I am not purely following the tutorial video, since I am using a different mesh and texture. There are two materials, one metallic and one non-metallic. They are both quite glossy, but otherwise they are not complex. The mesh is fairly complex: 840 triangles and 691 vertices. The strange thing is that if I use static meshes instead, I get better performance, no matter how big or small I make the grid…

Would switching to C++ improve the performance? My thought is that it doesn’t matter, since this mesh generation is done before rendering starts. Am I right here?

How are you setting your Floor Variable?

In the original code, it was an array that was created and set by blueprint code via “Set Static Mesh”, but in the example that I posted here, Floor is not a variable but a component which is set directly in the blueprint editor.
It might be that this way is slightly better, but it is still not good.

Do this:

Add static mesh. Put the return value into an instanced static mesh variable, and plug that variable into your add instance.

Click on the add instance node and see if your instance is set to static or movable. Movable is the default, I think; set it to static, see if that helps, and let me know.


Thanks for all good suggestions.

Tried this, and for both cases (movable, static) it was more efficient to create Static Meshes instead of instances of Instanced Static Mesh.

I did more or less what you described in my first try with instanced static meshes (before I fell back to the tutorial).
But I didn’t do it exactly the way you described, and I will try that in a couple of days when I have access to my UE4 environment again.
Meanwhile, is there anybody out there who can reproduce the problem by implementing the small blueprint above and analysing the result of “stat scenerendering”?
I will also test the obvious thing myself: use a standard box instead of the mesh, in case there is something wrong with the mesh I am using.

Although the main question remains: How can it be more effective to use individual static meshes than instanced static meshes under the same circumstances?

I can’t really get a grip on the problem. I think one needs to know what UE4 does under the hood during this “RenderQuery” time to sort this out.

Instanced meshes reduce draw calls, i.e. the number of separate times the CPU has to submit work to the GPU.
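To put rough numbers on the draw call point, here is a minimal sketch. It assumes a simplified model of one draw call per material slot per mesh per pass and ignores any batching the engine may do on its own. The function names are made up for illustration and are not UE4 API; the 4000 meshes and 2 materials are the figures mentioned in this thread.

```python
def draw_calls_separate(mesh_count, material_slots):
    # Simplified assumption: every separate static mesh issues one
    # draw call per material slot.
    return mesh_count * material_slots


def draw_calls_instanced(component_count, material_slots):
    # Simplified assumption: one instanced component issues one draw call
    # per material slot, regardless of how many instances it holds.
    return component_count * material_slots


separate = draw_calls_separate(4000, 2)
instanced = draw_calls_instanced(1, 2)
print(separate, instanced)
```

Under these assumptions the saving is on the CPU side only; the GPU still has to process every triangle of every instance.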

It depends on your PC, but with a mid to high end computer you should be able to handle some tens of thousands of basic instances without issues. I can place 100-200K relatively non-optimized tree instances before getting down to the FPS you are experiencing.

The figures are from a machine with a Core i5 (16 GB RAM) and a GeForce 650 Ti card (1 GB). In the example above I used about 4k instances. With 1k instances I get around 30 FPS, so obviously something is not working well here. As I mentioned earlier, using static meshes gives significantly higher FPS (but it takes the blueprint a very long time to create them).

According to that stats list, you are heavily GPU bound. When you see a lot of time in Present or Query, that means it had to wait on the GPU to finish what it was doing.

Note that using InstancedStaticMeshComponent defeats CPU side culling, so if you have 4K instances but only 5 are visible, then you’re still paying to draw the other 3995 of them. How many triangles is each of your instances?

Michael Noland
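The culling point above can be put into numbers with a small sketch. It assumes the whole instanced component is either drawn or not drawn as a unit, while separate static meshes are culled one by one. The 840 triangles per instance and the 4000 / 5 split come from this thread; the function name is hypothetical.

```python
TRIANGLES_PER_INSTANCE = 840  # figure quoted for the mesh in this thread


def triangles_submitted(total_instances, visible_instances, culled_per_instance):
    # With per-instance culling only the visible meshes are drawn.
    # Without it (one big instanced component that is on screen at all),
    # every instance is drawn, visible or not.
    drawn = visible_instances if culled_per_instance else total_instances
    return drawn * TRIANGLES_PER_INSTANCE


one_component = triangles_submitted(4000, 5, culled_per_instance=False)
separate_meshes = triangles_submitted(4000, 5, culled_per_instance=True)
print(one_component, separate_meshes)
```

In the extreme 5-out-of-4000 case the single component submits 3,360,000 triangles against 4,200 for individually culled meshes, which is why instancing only pays off when most instances are on screen.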

Each instance should be around 840 triangles and 691 vertices.
Thanks for your answer, this was all new to me.
So using instances is then not recommended when a lot of them are not visible?

But… in the example above all (or most) of the meshes are visible. So why is it then more effective to use static meshes?

I made two test cases. One using the editorcube mesh, and another using Sphere (StaticMesh'/Engine/BasicShapes/Sphere.Sphere'). That sphere is 960 tris and 559 verts, very close to your model. Keep in mind my video card is a beast (Titan).

  1. Cube:

Was able to spawn 100,000 instanced cubes and the GPU time was 8ms

  2. Sphere:

With 5000, GPU is still at 8ms.

With 15000, GPU time is ~28ms. That is 28 million triangles coincidentally. My renderquery time was only 20ms.

What video card are you running?

You mentioned that renderquery was still just as long with only 2x2 (aka 4) total instances, right? Is it actually the same number or just also high?

If you can give us the renderquery time with only a few instances, that will be helpful. If there aren’t many instances there should be no reason for that number to be high. If it is still really high, I’d suggest making a new test BP in a new level. It never hurts to rule things out. Maybe old instance data was somehow hanging around.

The card is a Geforce 650Ti (1GB).
The times were much shorter for only a few meshes, but still worse for instances than for ordinary meshes, and I don’t need more than a few hundred meshes before the FPS drops to critical levels.

I will repeat the experiments in a new project and with different meshes on this weekend (on a business trip right now).

@RyanB: Do you get better or worse performance if you replace the add instance with just an ordinary add static mesh? I suppose you can’t try that on 15000 meshes though, so it might be difficult for you to test…

Instanced Static Meshes cost more on the GPU than regular static meshes, but cost less CPU time to process and submit them. However, my understanding is that they only cost a little bit more, and only in vertex time, not pixel time.

If all or most instances are visible at once, then they’re useful but if most are not visible then they’re not a good fit (but can still be used in spatially local regions, e.g., instead of 64x64, do 16x16 regions for example to balance culling with batching).
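The region idea can be sketched as follows. This is only an illustration of the grouping step, assuming a 64x64 grid split into 16x16 regions where each region would get its own instanced component; `group_into_regions` is a hypothetical helper, not an engine function.

```python
from collections import defaultdict


def group_into_regions(size_x, size_y, region_size):
    # Map every grid cell to the region it falls into. Each region key
    # would correspond to one instanced component, so the engine can
    # cull whole regions that are off screen.
    regions = defaultdict(list)
    for x in range(size_x):
        for y in range(size_y):
            key = (x // region_size, y // region_size)
            regions[key].append((x, y))
    return dict(regions)


regions = group_into_regions(64, 64, 16)
print(len(regions), len(regions[(0, 0)]))
```

A 64x64 grid ends up as 16 regions of 256 cells each. Smaller regions cull better but cost more draw calls, so the region size is a trade-off to tune rather than a fixed rule.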

RE: Your actual situation, 4k x 840 is a decent number but not totally insane for a modern GPU (3.36 M tris). However, I’m wondering if they’re somehow accumulating and you are drawing way more than you think you are; maybe throw in a ClearInstances call on the component before the for loop, and also make sure you only have one of these actors in the level.

Michael Noland
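A toy model of the accumulation scenario mentioned above, for anyone who wants to see the effect in numbers. It assumes the generation loop runs more than once on the same component, which is not confirmed for the setup in this thread. `InstancedComponentModel` is a plain Python stand-in, not the UE4 class, and the 64x64 grid and 100-unit spacing are made-up values.

```python
class InstancedComponentModel:
    """Toy stand-in for an instanced mesh component (not the UE4 class)."""

    def __init__(self):
        self.transforms = []

    def add_instance(self, transform):
        self.transforms.append(transform)

    def clear_instances(self):
        self.transforms.clear()


def build_floor(component, size_x, size_y, spacing, clear_first):
    # Optionally wipe old instances before refilling the grid.
    if clear_first:
        component.clear_instances()
    for x in range(size_x):
        for y in range(size_y):
            component.add_instance((x * spacing, y * spacing, 0.0))


leaky = InstancedComponentModel()
clean = InstancedComponentModel()
for _ in range(3):  # assumption: the build loop is triggered three times
    build_floor(leaky, 64, 64, 100.0, clear_first=False)
    build_floor(clean, 64, 64, 100.0, clear_first=True)

print(len(leaky.transforms), len(clean.transforms))
```

Without the clear step the instance count triples to 12,288 stacked instances (about 10.3 M triangles at 840 each) while the level looks exactly the same, so checking the component's instance count is a cheap first test.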

Once again, thanks all for the help and information.
There are a lot of tests suggested in this thread that I will carry out as soon as I am back.