Instanced static mesh slower than Non-Instanced static mesh

I was stress testing the engine to see how much I could gain by using instanced meshes. It turned out I gained nothing: my framerate was halved when I used an instanced static mesh.

First I duplicated a static mesh in the viewport until I had 1000 meshes,
then I made a construction script that adds 1000 instances.
The non-instanced static meshes ran faster than the instanced static meshes I distributed with the method below.
I also added a boolean to add the non-instanced static meshes inside the construction script, but the result did not change.

Basically, I connected a few Blueprint nodes in the construction script to calculate the locations needed to arrange my meshes in a square,

then I added a boolean to test these 2 conditions:

  • Add instances to the instanced static mesh component with given locations
    (1000 instances inside 1 instanced static mesh component in 1 actor)

  • Add static mesh component to the actor itself with given locations
    (1000 static mesh components in 1 actor).
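
The location step described above can be sketched outside the engine. This is only an illustration of laying out N positions in a square; the function name, the spacing value, and the row-by-row ordering are assumptions, not the original Blueprint graph.

```python
import math

def square_grid_locations(count, spacing):
    """Return `count` (x, y, z) locations laid out row by row in a square grid."""
    side = math.ceil(math.sqrt(count))  # number of slots per row
    locations = []
    for index in range(count):
        row, col = divmod(index, side)
        locations.append((col * spacing, row * spacing, 0.0))
    return locations

# Hypothetical spacing of 200 units between neighbours.
locations = square_grid_locations(1000, 200.0)
print(len(locations), locations[0], locations[-1])
```

The same list of locations can then feed either branch of the test: one instance per location, or one static mesh component per location.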

You can see at the bottom right of both images, in the Details panel, that the “instanced?” boolean is enabled in one and disabled in the other, and you can see the difference in fps.

My static mesh contains around 13k verts, and the instanced and non-instanced setups were 30 million triangles each.

So the non-instanced static meshes ran faster… What is the purpose of instanced static meshes if non-instanced is better? Isn’t there a way to render these faster than non-instanced meshes by instancing properly?

My hardware is:
AMD Radeon RX 590 Nitro GPU
AMD Ryzen 5 2600x
16GB RAM

Please ask if you need any further explanation or testing. You can reproduce the same issue by setting up a similar blueprint with any 13k vert mesh.

EDIT: HISMs with no LODs perform the same as ISMs.

More meshes with simpler geometry seem to work better as instanced meshes.

I do not use Nanite because it also impacts fps a lot.

Hi,

Try running your test as “StandAlone Game” and look at the performance there - you get some wild variations from the editor.

Yeah, I have indeed tried that, because I had another issue when playing in the editor: the triangle count was doubled, but in standalone it was correct. This one doesn’t change, though. I could just continue using non-instanced meshes, but I really want the benefits of batch rendering to make my project faster.

It looks like you’re creating a lot of ISMCs - you should only need one for each unique mesh and just be adding the instances to that one. Instances are a lot faster than SMs - once you’ve got it set up correctly here you should see a big jump in performance.
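
The "one component per unique mesh" advice can be sketched as a simple grouping step. This is a plain-Python illustration, not engine code; the mesh names and the `group_instances_by_mesh` helper are hypothetical.

```python
from collections import defaultdict

def group_instances_by_mesh(placements):
    """Map each unique mesh name to the list of locations that use it.

    `placements` is a list of (mesh_name, location) pairs. Each key in the
    result stands for one instanced component; each value is its instance list.
    """
    components = defaultdict(list)
    for mesh_name, location in placements:
        components[mesh_name].append(location)
    return dict(components)

# Hypothetical placements: two rocks and one tree.
placements = [("Rock", (0, 0, 0)), ("Tree", (100, 0, 0)), ("Rock", (200, 0, 0))]
components = group_instances_by_mesh(placements)
print({name: len(instances) for name, instances in components.items()})
```

Three placements end up in two components here, because only two unique meshes are involved.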

No, it actually doesn’t create multiple instanced static mesh components; it is added after the for loop is Completed. I updated the blueprint a little so it is easier to see:

And here are the results captured with this actor:


Don’t mind the static mesh triangle count, it is not showing the real numbers; in standalone they are both 30 million triangles.

I wish it was an error in my code, because I really wanted to make use of batch rendering.
Also, in my post I mentioned that low-poly objects actually work better with the instanced static mesh component, and that was with the previous code, so it doesn’t add up anyway.

I just reproduced your blueprint (just added SetStaticMesh to both the ISMC and the SM) and tested the results - I’m seeing the results I’d expect:

Edit: Have you tried with different meshes? - sometimes I’ll find my whole level slows to a crawl only to find it’s one mesh that’s causing the problems.

Thanks for staying with me on this topic, but as I stated twice before, my static mesh has around 10k vertices, and I also confirmed my scene was faster with meshes of up to 1k vertices when using instances. My mesh also has 2 material slots, if that matters for draw calls with instances.

Edit: total vertex count doesn’t seem to matter.
If I have 60 million vertices in the whole scene with instanced cubes, it works way better than 30 million vertices in the whole scene with instanced custom 10k vert meshes.

I’ve done a lot of testing of ISMs/HISMs and I’ve never had a drop in framerate by using instances (unless there is a problem with a specific mesh).

Your gfx card isn’t the most powerful so it is possible that it’s bottle-necking - do you get similar results if you reduce the number of objects?

Are you using DX12 or DX11? DX11 can help improve performance sometimes.

I can certainly confirm:

  • How many vertices the mesh has definitely matters, and with higher-poly meshes (10k+ vertices) instanced meshes run way slower than non-instanced ones. I would like you to try instancing a 20k vert mesh, or maybe more.

    It doesn’t matter what “specific mesh” it is. I tried lots of meshes of all kinds; it’s the poly count that matters.


  • I was using DX12, but today I tested DX11 and instances are still slower.

  • Instance count doesn’t seem to matter; generally it’s the same fps. Instances are only better when the number of instances exceeds 1000, and only with a simple mesh like a cone or cube. With higher-poly meshes it doesn’t even get close to good.

  • Material count doesn’t seem to matter (I tried objects with 1 and 2 materials and the same vertex count) and the fps gap was the same.

I agree my GPU could be a bottleneck, but I would like you to try to reproduce the same bottleneck with higher and higher poly meshes, since at some point any GPU should hit a bottleneck. Also, I am not using Nanite, so if you are using it, try without it.

Still, a bottleneck doesn’t sound very likely, because my non-instanced meshes run better than the instanced ones. I will try it on a laptop with a 3060 in it, still not the best GPU, but I have to package the project to test it.

This is the test mesh I used, 25K verts, non-nanite - it seems to be a good all-round test mesh for things like this:


I’ll try something with a larger tri count.

Edit: I was able to get one scenario where SMs outperformed ISMs - non-nanite - mesh with 150K verts - 3200 meshes on screen - zoomed out - lumen turned on - changing any of those flips it to ISMs being quite a bit faster.

Very interesting. I packaged my project to test it on another computer and will let you know the results, but the fact that your test broke only after 150k verts shows an inconsistency in the engine, no matter what my specs are.

This means Unreal Engine is not good for scalable platforms: even when you think you are helping your players, you can actually hurt their fps.

There should at least be a warning about it, or a workaround. I’m not sure what to do now…

Since most gamers don’t have extreme setups, this is a worrying discovery. I believe my setup is about average among players.

You might say that if they can’t buy a PC, you can’t sell them high-priced games anyway, and you would be right. It is still sad.

That was very specific circumstances - I wouldn’t use Lumen with non-nanite etc.
When the tri-count gets higher, that’s where Nanite comes in - switching those meshes to Nanite makes the performance rocket.

As far as supporting hardware goes - it’s very much a fast upward trend for RTX cards, older cards are still in the top lists but won’t be for long.

Edit: Also, when you’re using LOD based meshes - use HISMs rather than ISMs - ISMs switch the LOD for all instances to the closest instance - so you’ll find all instances are at LOD0 when you’re close to at least one of them.
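
The LOD behaviour described above can be illustrated with a small comparison. This is a sketch under stated assumptions: the distances, LOD thresholds, and triangle counts are hypothetical, and the engine selects LODs by screen size rather than by raw distance.

```python
def lod_for_distance(distance, lod_distances):
    """Return the LOD index for a distance; lod_distances[i] is where LOD i+1 starts."""
    lod = 0
    for threshold in lod_distances:
        if distance >= threshold:
            lod += 1
    return lod

def triangles_per_instance_lod(distances, lod_distances, lod_triangles):
    """Each instance picks its own LOD from its own distance."""
    return sum(lod_triangles[lod_for_distance(d, lod_distances)] for d in distances)

def triangles_shared_lod(distances, lod_distances, lod_triangles):
    """All instances share the LOD picked for the closest instance."""
    shared = lod_for_distance(min(distances), lod_distances)
    return lod_triangles[shared] * len(distances)

# Hypothetical scene: one instance near the camera, 999 far away.
distances = [100.0] + [5000.0] * 999
lod_distances = [1000.0, 3000.0]     # LOD1 from 1000 units, LOD2 from 3000 units
lod_triangles = [30000, 8000, 2000]  # triangles at LOD0, LOD1, LOD2

per_instance = triangles_per_instance_lod(distances, lod_distances, lod_triangles)
shared = triangles_shared_lod(distances, lod_distances, lod_triangles)
print(per_instance, shared)
```

With these made-up numbers, a single nearby instance keeps every instance at LOD0 in the shared case, while per-instance selection draws far fewer triangles.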

https://store.steampowered.com/hwsurvey/videocard/

Well, I don’t think this would run better with both Lumen and Nanite disabled either, at least in my config, so it is not “very” specific; they are just disabled. I would also like to see results from Unreal Engine 4, but sadly I’m not in a position to try that yet.

And sadly Nanite also only works well on high-end GPUs. With proper game models you won’t ever need Nanite; it is still not really ready for production in games, more for offline renders.

Older cards won’t be on the top lists, but I am afraid their replacements will be the same or only slightly better technology under new names, still not enough for Nanite and instancing. Vendors can optimize older cards, but adding new hardware still costs them, and that is reflected in the prices. Some areas of the world won’t get access to those cards very quickly. For context, I’m not from a first-world country, and neither are my clients. This is not true only for me; there are many people who won’t be able to afford very high-end PCs. My PC is not considered low end, at least in my region.

What GPU did you have for comparison?

I am also low-key thinking that this is an engine bug that could be fixed, because how instancing works is pretty simple: instead of getting the values from the mesh 1000 times, it gets them once and renders 1000 times. How could this bottleneck if getting the values 1000 times does not?

That was on a RTX4070.

I understand that. You may want to test 4.27 - in fact I found 4.25 to be one of the fastest engines - also use tri-counts appropriate for the hardware - just force the meshes to LOD1 or LOD2 for testing.

Instancing still uses bandwidth - you just don’t have the texture and shader setup costs.
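
A toy cost model can make that point concrete. Everything here is an assumption for illustration: the per-draw and per-triangle constants are made up, and real engines and GPUs behave far less linearly. The model cannot explain instancing being slower; it only shows why the relative gain shrinks as the triangle count per mesh rises.

```python
def draw_calls(mesh_count, material_slots, instanced):
    """Simplified count: one draw per material slot per component."""
    components = 1 if instanced else mesh_count
    return components * material_slots

def frame_cost_ms(mesh_count, triangles_per_mesh, material_slots, instanced,
                  ms_per_draw=0.005, ms_per_million_triangles=0.5):
    """Toy estimate: per-draw overhead plus a cost proportional to triangles."""
    draws = draw_calls(mesh_count, material_slots, instanced)
    triangles = mesh_count * triangles_per_mesh
    return draws * ms_per_draw + triangles / 1_000_000 * ms_per_million_triangles

# 1000 meshes with 2 material slots: a 12-triangle cube vs a 30k-triangle mesh.
for name, tris in (("cube", 12), ("high poly", 30000)):
    plain = frame_cost_ms(1000, tris, 2, instanced=False)
    inst = frame_cost_ms(1000, tris, 2, instanced=True)
    print(f"{name}: non-instanced {plain:.3f} ms, instanced {inst:.3f} ms")
```

In this model the triangle term is identical in both cases, so instancing only removes the per-draw term; once geometry dominates the frame, that saving is a small fraction of the total.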

Oh, that’s another thing - make sure to turn off collision on your meshes - that adds all the collision calculations to the instancing.

Did you use HISMs rather than ISMs to avoid the global LOD changes?

This is probably true, because it is not the first time I have heard this. I started using Unreal Engine with 4.26, if I’m not wrong. Still, I can’t decide whether I want to give up all these Unreal Engine 5 features. The game I am making will look generally good; I just want people with lower-end PCs to enjoy it too. That is the only reason I’m taking on this extra work. It will boost sales too, but I’m planning a cheap price since I’m solo.

I tried enabling HISM, but with no LODs. If non-instanced meshes run better than ISMs, they should still run better than HISMs, because a HISM is generally the same thing, just with instances grouped differently.

I will still try HISM with LODs and no collision after my test on a laptop with a 3060.

Update:

GPUs play a big role in instancing:

(I finally tried the 3060 laptop.)

  • A desktop with an AMD RX 590 runs better without instancing.

  • A laptop with a 3060 gets the same fps with instanced and non-instanced meshes.

  • A desktop with a 4070 runs better with instancing.

    This is when using a mesh with 13k triangles. If you use a mesh that has fewer
    vertices, you can get up to 2 times more fps with instanced meshes than with non-instanced ones.


The GPU market doesn’t seem likely to change soon (depends on your area):

The problem doesn’t seem to be over for some game developers. Even though a 3060, especially in a laptop, isn’t much better than my RX 590 setup, it can still run a large number of games on high and even ultra settings, so many people won’t feel the need to upgrade beyond that, at least not soon.

You should still consider that hardware at the development stage.


Instancing existed before high-end GPUs (I believe), so what is the problem now?

I also think instancing was used in past games that many people with lower-end cards definitely played.

My suspicion about an engine bug still persists, but I will try to find a workaround. I have not built the engine from source before, but if this drives me too crazy I might dive into the code to see what’s going on, who knows :smiley:

I tried disabling collision on the instances and it didn’t seem to help.

I was wondering whether my foliage would also run better if I didn’t use the default instanced scattering, but it doesn’t necessarily have that many polys, at least not all of it.


Nanite might not be that bad:

I also enabled Nanite again, and now the threshold at which my PC runs badly went from 13k verts to around 250k verts, with the same instance count.

Interestingly, this time I also get double the fps with Nanite on that same 13k vert mesh, both instanced and non-instanced. That doubled fps drops when I get close to the meshes, but it is still better than without Nanite.

The problem with this test is that there are only 2 Nanite-enabled objects in my whole project and only one is showing at a time. I suspect more Nanite meshes of different kinds would run worse(?)
I remember having a scene that ran poorly with Nanite, and when I profiled my GPU I could see all the resources being used. I don’t know whether it would have run better if I had disabled Nanite and optimized some of the models. Now I am thinking probably not, but the impact of the number of Nanite objects is definitely something I need to test.


When using a 250k vert mesh, non-instanced meshes are still superior on my PC, even with Nanite. It doesn’t seem to matter much until a pretty high vertex count, though.
So this topic is still a problem for lower-end PCs.

What does that mean? Since even a 4070 had a bottleneck at about 150k vertices without Nanite, and Nanite does raise that vertex threshold quite a lot, it seems reasonable to assume that anybody using meshes with a couple of million vertices might get better performance with non-instanced meshes!

  • Since Nanite already uses automatic LODs and instancing, I think HISMs with LODs will also perform worse. I will still try it soon anyway.

I think that once you have your LODs set up well, you’re using HISMs instead of ISMs and you’ve turned off collision you’ll start seeing some better results.

Regarding Nanite, yes it has an expensive set-up cost, but once you’ve lost those millis you can just keep throwing more instances at it and it doesn’t slow down.

I was using instances to great advantage on a GTX980 for a long time.

Here’s an example of a GTX1080 displaying 6.7 million instances:

Yeah, Nanite and Lumen are incredible technologies, just with somewhat expensive setup costs.
I think I will keep going with whatever I find suitable at this point. Thanks for staying with me.

Thank you for this info. When I tested instanced mesh and static mesh performance, I had the same issue.