Low Framerate using (Hierarchical) InstancedStaticMeshComponents

Hello everyone,

I am struggling to get a decent frame rate on a crowd scene. My idea is to use InstancedStaticMeshComponents to lower draw calls and save some rendering time and CPU<->GPU bandwidth.
Originally, we were using SkeletalMeshActors. Here are some numbers for both approaches:

  • spawning 1050 SkeletalMeshActors of around 16k vertices each (the Unreal character decimated at 20%), with a custom animation node / instance to animate them, gives 6 fps and 40-45k draw calls
  • the same 1050 actors converted to 17 rigid static meshes animated per instance transform (1050 transforms x 17) gives 107 draw calls, but the fps is not much better (9-10); I was expecting a much higher figure

With the Stat SceneRendering command, for instanced meshes, RenderQueryResult is eating all the render time, and when looking at the GPU profiler, the cost seems to be split evenly between four items:

  • prepass (all opaque)
  • base pass
  • shadow depth
  • light

The time I have spent so far seems wasted for such a small benefit; I was expecting a far bigger improvement from instancing.

How could I improve this further?

Best regards

Sebastien

What does your “stat collision” look like?

We had a few scenes with a ton of objects getting low FPS despite draw calls not taking long. Simplifying collision and turning off collision for many objects increased FPS a lot.

Hi, some things you could try:

(1) Make sure that you’re GPU bound (if you haven’t already done so). Set the screen percentage to something very low, like 1 percent. If you still have the same low framerate, then you’re not GPU bound and can look into the game thread and draw thread.

(2) With 107 draw calls you’re apparently no longer bound by draw calls, and if you only have 1050 opaque actors in an otherwise empty map, I don’t think you’re bound by the pixel shader either. So I would try the vertex count next.

As a test, you can try reducing the vertex count from 16k to something like 160 and see if, and by how much, that increases your frame rate. Then, depending on the result, use more aggressive LODs or a generally lower vertex count.
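One rough way to read the result of the screen-percentage test in (1) is sketched below. The function names, the example frame rates, and the 1.5x threshold are all illustrative assumptions, not measured values or engine settings. Note that screen percentage only scales the rendered resolution, so a small speedup points away from per-pixel cost but leaves per-vertex GPU work, the game thread, and the draw thread as candidates.

```python
def frame_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def pixel_bound_hint(fps_full_res, fps_low_res, min_speedup=1.5):
    """Return True if lowering the screen percentage sped things up noticeably.

    min_speedup is an arbitrary illustrative threshold (assumption).
    """
    return fps_low_res / fps_full_res >= min_speedup


# Hypothetical measurements: barely any change after lowering screen percentage.
print(pixel_bound_hint(10, 11))
# Hypothetical measurements: a large gain, so per-pixel cost was significant.
print(pixel_bound_hint(10, 40))
```

If the hint comes back negative, the vertex-count test in (2) is the natural next step.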

Hello, and thanks to both of you.

Schmoopy: I had already disabled collision, as I am replaying a crowd simulation animation; otherwise that would have been a good candidate indeed.
chrudimer: your second guess was correct. I dropped my character from 16k to 1.6k vertices, and the gap with the SkeletalMeshActor version became much more noticeable! I can reach something like 70 FPS on the 1050, with 1k characters. I still have to find a bit more performance, but that confirms that this limit was mostly asset-bound!

So the instanced mesh limits the draw calls, but cannot work any magic against too many vertices to draw.
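In rough numbers, using the figures from this thread: the draw-call count stays at 107 either way, but the vertex load scales directly with the mesh. The sketch below assumes every instance is submitted once per geometry pass and that three passes rasterize the meshes (prepass, base pass, shadow depth); it ignores culling, LODs, and shadow cascades, so treat it as an order-of-magnitude estimate only.

```python
INSTANCES = 1050  # crowd size from the thread


def vertices_per_frame(vertices_per_character, instances=INSTANCES, geometry_passes=3):
    """Estimate vertices processed per frame.

    geometry_passes=3 is an assumption (prepass, base pass, shadow depth).
    """
    return vertices_per_character * instances * geometry_passes


for verts in (16_000, 1_600):
    print(f"{verts} verts/character -> {vertices_per_frame(verts):,} vertices per frame")
```

Under these assumptions the 16k character costs about 50 million vertices per frame and the 1.6k version about 5 million, with no change in draw calls.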

Thank you

Yeah, you reduce draw calls, but you get less culling and therefore the GPU needs to process even more stuff. Then again, if you had thousands of instances and were doing dynamic occlusion culling on all of them, it would strain performance anyway, so there are always pros and cons.
As for the vertex count, you could use a hierarchical instanced static mesh with aggressive LODs to reduce the vertex count further, if you’re still bound by it (but 1.6k at 1050 actors should be OK, in my opinion).
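To get a feel for how much LODs could save, here is a small sketch. The LOD vertex counts and the split of the 1050 characters across LODs are made-up illustrative numbers; the real distribution depends on the camera and the LOD distances set on the mesh.

```python
def total_vertices(lod_buckets):
    """Sum vertices over (instance_count, vertices_per_instance) pairs."""
    return sum(count * verts for count, verts in lod_buckets)


# Every character at full detail (1.6k vertices, as in the thread).
no_lods = [(1050, 1600)]

# Hypothetical split: 10% near, 30% mid-range, 60% far away.
with_lods = [(105, 1600), (315, 800), (630, 200)]

print(total_vertices(no_lods))
print(total_vertices(with_lods))
```

With this assumed split, the per-pass vertex total drops to roughly a third of the no-LOD figure.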

If you still have performance problems with shadows, then maybe try mesh distance fields for shadow casting: reduce the dynamic shadow distance of your directional light and use distance field shadows for objects that are further away.

Also you could make sure that you’re still GPU bound.