How are you breaking up the instances? Most likely you ended up rendering far more polygons by making all of the meshes one big instance. That means that if any single mesh is visible, ALL triangles of every instance will be rendered.
Before you did the instancing, there were more, smaller draw calls. As long as the system has the bandwidth to handle those draw calls, instancing may not help much. But if you have too many draw calls (for example, several thousand), you will benefit from some instancing, as long as you don’t cause too many additional triangles to render. It is all about finding a balance point: no single instanced group should be huge.
I recently made a similar BP that lets you change the ‘cell size’ and redo the instancing breakup. I will try to post some info on it later since it may help.
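Since the Blueprint itself wasn’t posted, here is a rough Python sketch of the ‘cell size’ idea. Instances are bucketed into square grid cells, and each bucket would become one instanced component (one draw call, culled as a unit). The function name `bucket_by_cell` and the 2D positions are hypothetical. This is not the poster’s BP or any UE4 API.

```python
import math
from collections import defaultdict

def bucket_by_cell(positions, cell_size):
    """Group (x, y) instance positions into square grid cells of side cell_size.
    Each bucket would become one instanced component, i.e. one draw call."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    buckets = defaultdict(list)
    for index, (x, y) in enumerate(positions):
        # floor() so that negative coordinates also land in the correct cell
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        buckets[key].append(index)
    return dict(buckets)

# Hypothetical layout: 100 instances on a 10 x 10 grid, 100 units apart (0..900).
positions = [(100.0 * i, 100.0 * j) for i in range(10) for j in range(10)]

for cell_size in (250.0, 500.0, 1000.0):
    buckets = bucket_by_cell(positions, cell_size)
    largest = max(len(b) for b in buckets.values())
    print(f"cell_size={cell_size}: {len(buckets)} draw calls, largest bucket={largest}")
```

Smaller cells mean more draw calls but fewer wasted triangles when only part of the area is visible. Larger cells go the other way.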
The biggest problem that I have found with instanced meshes is that the individual instances cannot have a different negative drawscale than the base instance actor. The instance actor itself can have negative drawscale but there is a bug that will cause those to lose their static lighting after a rebuild. If you try to flip an instance’s drawscale, it will cause the normals to flip and the polygons to face backwards.
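The backwards-polygons symptom follows from basic geometry. A scale with an odd number of negative axes mirrors the mesh, which reverses the vertex order (winding) of every triangle. Front faces then become back faces unless the renderer compensates. Here is a minimal Python illustration with a single triangle. It shows only the general mechanism and assumes nothing about how UE4 handles it internally.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def apply_scale(v, s):
    return tuple(x * k for x, k in zip(v, s))

def winding_normal(tri):
    """Normal implied by vertex order (counter-clockwise = front-facing)."""
    a, b, c = tri
    return cross(sub(b, a), sub(c, a))

def flips_winding(s):
    """An odd number of negative axes gives a negative determinant: a mirror."""
    return s[0] * s[1] * s[2] < 0

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
mirrored = tuple(apply_scale(v, (-1.0, 1.0, 1.0)) for v in tri)   # one negative axis
rotated = tuple(apply_scale(v, (-1.0, -1.0, 1.0)) for v in tri)   # two negatives = 180 degree turn

print(winding_normal(tri), winding_normal(mirrored), winding_normal(rotated))
```

Mirroring on one axis turns the implied normal from +Z to -Z, so the triangle now reads as back-facing. Two negative axes amount to a rotation and leave the winding alone.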
So before you get too much further down the instancing path: are you using static lighting? And if so, can you remove all negative drawscale from your meshes? If you can, it will make your life much easier.
Instancing is simply a tool to reduce the number and overhead of draw calls, at the cost of always drawing all triangles of all instances whenever just one of them is needed.
For example, 100 objects of 1000 triangles each, drawn individually, would create 100 draw calls, each passing 1000 triangles to the video card. However, each object would be drawn only when needed and occluded individually.
In an instanced setup, this would be drawn in one draw call, passing the 1000 triangles to the video card only once, plus 100 transforms (location, rotation, scale), one per instance. But the video card will have to draw all of them each time any one of them is visible (or could be visible, due to shadows, etc.).
Overall, the data passed from the game to the video card is roughly 100x smaller, but the number of triangles/vertices that have to be drawn will usually increase.
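The trade-off in the 100-object example can be written as a tiny cost model. This simplified Python sketch counts only draw calls and triangles drawn for a given number of visible objects. It ignores shadows, LODs and per-call driver overhead.

```python
def individual_cost(tris_per_mesh, num_objects, num_visible):
    """Each object is its own draw call and is culled on its own.
    (num_objects is unused here; kept so both functions share a signature.)"""
    return {"draw_calls": num_visible,
            "triangles_drawn": tris_per_mesh * num_visible}

def instanced_cost(tris_per_mesh, num_objects, num_visible):
    """One draw call for the whole group; if any instance is visible,
    every instance's triangles are drawn."""
    if num_visible == 0:
        return {"draw_calls": 0, "triangles_drawn": 0}
    return {"draw_calls": 1,
            "triangles_drawn": tris_per_mesh * num_objects}

for visible in (0, 5, 100):
    print(visible, individual_cost(1000, 100, visible), instanced_cost(1000, 100, visible))
```

With 5 of the 100 objects visible, drawing individually costs 5 draw calls and 5,000 triangles. The instanced version costs 1 draw call and 100,000 triangles. With all 100 visible, both draw 100,000 triangles, but the instanced version does it in 1 call instead of 100.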
It is a good idea to confirm that the number of draw calls is actually a problem before starting to use instancing.
Instancing can handle a dynamic number of instances just fine; I have done this with DX11 and OpenGL. This must be a UE4 renderer design decision, and I would just like to know the rationale behind it. The DX11 version of Battlefield 3 uses instancing for everything. It just batches all instances of the same object (with the same material) and renders them with a single draw call. No polygon wastage and no excessive draw calls. See http://www.dice.se/wp-content/uploads/2014/12/GDC11_DX11inBF3_Public.pdf, page 27.
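The batching described for Battlefield 3 can be sketched in two steps. First, cull every instance individually. Then group the survivors by (mesh, material) and issue one instanced draw per group. The Python below uses hypothetical names (`Instance`, `build_batches`, and an `is_visible` callback standing in for frustum/occlusion culling). It is not DICE’s code or a UE4 API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    mesh: str
    material: str
    position: tuple  # simplified transform: just (x, y, z)

def build_batches(instances, is_visible):
    """Cull each instance on its own, then group survivors by (mesh, material).
    Each group maps to one instanced draw call holding only visible transforms."""
    batches = defaultdict(list)
    for inst in instances:
        if is_visible(inst):
            batches[(inst.mesh, inst.material)].append(inst.position)
    return dict(batches)

# Hypothetical scene: 10 chairs and 4 tables sharing one material.
scene = ([Instance("Chair", "Wood", (float(x), 0.0, 0.0)) for x in range(0, 100, 10)]
         + [Instance("Table", "Wood", (float(x), 5.0, 0.0)) for x in range(0, 100, 25)])

# Stand-in for real culling: only things with x < 50 are visible this frame.
batches = build_batches(scene, lambda inst: inst.position[0] < 50.0)
for key, transforms in batches.items():
    print(key, len(transforms))
```

This yields 2 draw calls (5 chairs, 2 tables) and draws no triangles for the culled instances. The cost is that the per-instance data has to be rebuilt every frame as visibility changes.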
Instancing in UE4 just feels very cumbersome. You have to manually decide what to instance and, at the same time, balance this against polygon counts, chunk sizes, etc.
You may want to look into using hierarchical instances, which behave a bit more like the instances you are expecting. Hierarchical instances are meant for cases where you are handling hundreds of thousands to millions of instances. The regular instance system is meant more for lighter-weight instancing work. You are right that a system that dynamically batches things would probably be better. That is also similar to how instanced stereo for VR works.
Do you know if there are any technical reasons that would prevent UE4 from using dynamic batching? Does the current system use just a single frustum and occlusion culling result for all instances?
It’s not that anything “prevents” it from happening; it’s simply not how the system was designed. Somebody needs to do the programming work to make it happen.
Yes, the regular instance system does frustum and occlusion culling for each cluster as a whole. The old foliage system automatically broke up the instances based on the “max cluster size” and/or “max instances per cluster” foliage settings. As a result, you would get clusters of a certain size rather than one cluster for the entire level. But for truly dense forest scenes there was never a good combination of settings. You either had too many clusters (and draw calls), or too many trees per cluster, which meant too many polys rendering and terrible close-up LOD granularity.
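Here is a sketch of how a “max instances per cluster” limit trades draw calls against wasted triangles. It assumes clusters are culled as a whole. It also assumes the instance list is already sorted so that neighboring IDs are spatially close (the real foliage system clustered spatially). `split_into_clusters` and `frame_cost` are hypothetical helpers.

```python
def split_into_clusters(instance_ids, max_instances_per_cluster):
    """Chop an (assumed spatially sorted) list of instance IDs into clusters."""
    if max_instances_per_cluster < 1:
        raise ValueError("max_instances_per_cluster must be at least 1")
    step = max_instances_per_cluster
    return [instance_ids[i:i + step] for i in range(0, len(instance_ids), step)]

def frame_cost(clusters, visible_ids, tris_per_instance):
    """Clusters are culled as a whole: if any member is visible, the whole
    cluster is drawn. Returns (draw_calls, triangles_drawn)."""
    visible = set(visible_ids)
    drawn = [c for c in clusters if any(i in visible for i in c)]
    return len(drawn), sum(len(c) for c in drawn) * tris_per_instance

trees = list(range(10000))   # hypothetical forest: 10,000 trees
in_view = range(300)         # hypothetical: 300 neighboring trees are visible
small = split_into_clusters(trees, 50)
large = split_into_clusters(trees, 1000)

print(len(small), frame_cost(small, in_view, 2000))
print(len(large), frame_cost(large, in_view, 2000))
```

With 300 visible trees at 2000 triangles each, clusters of 50 cost 6 draw calls and 600,000 triangles. Clusters of 1000 cost 1 draw call but 2,000,000 triangles. The small setting also means 200 potential draw calls when the whole forest is in view.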
The new hierarchical instance system was designed to address that. It also fixes other issues, such as the old instances choosing LODs based on the combined cluster size (which was often way too big) rather than the size of individual instances. I suggest you experiment with hierarchical instances if you are using tons of instances.
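The hierarchical idea can be sketched as a tree of bounding ranges. Whole subtrees are rejected with a single test, leaves are culled per instance, and LOD is picked from each instance’s own size rather than the cluster’s. This Python sketch is 1D (x positions only) and uses radius / distance as a crude stand-in for screen size. It illustrates the concept only and is not UE4’s actual implementation; all names and thresholds are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: float                                      # bounding range of everything below
    hi: float
    children: list = field(default_factory=list)   # empty for leaves
    instances: list = field(default_factory=list)  # x positions, leaves only

def build(xs, leaf_size=4):
    """Build a bounding-range tree over sorted, non-empty x positions."""
    if len(xs) <= leaf_size:
        return Node(xs[0], xs[-1], instances=list(xs))
    mid = len(xs) // 2
    left, right = build(xs[:mid], leaf_size), build(xs[mid:], leaf_size)
    return Node(left.lo, right.hi, children=[left, right])

def cull(node, view_lo, view_hi, out):
    """Append every instance inside [view_lo, view_hi] to out."""
    if node.hi < view_lo or node.lo > view_hi:
        return  # the whole subtree is rejected with one test
    if node.children:
        for child in node.children:
            cull(child, view_lo, view_hi, out)
    else:
        out.extend(x for x in node.instances if view_lo <= x <= view_hi)

def screen_size(radius, distance):
    """Crude screen-size proxy: bigger or closer objects give larger values."""
    return radius / max(distance, 1e-6)

def pick_lod(size, thresholds=(0.5, 0.1)):
    """LOD 0 for a large on-screen size, higher LODs as it shrinks."""
    for lod, threshold in enumerate(thresholds):
        if size >= threshold:
            return lod
    return len(thresholds)

positions = [10.0 * i for i in range(1000)]  # 1000 trees along x, already sorted
root = build(positions)
visible = []
cull(root, 1000.0, 1200.0, visible)

# Same distance, different radius: a 50-unit tree versus a 5000-unit cluster.
tree_lod = pick_lod(screen_size(50.0, 1000.0))
cluster_lod = pick_lod(screen_size(5000.0, 1000.0))
print(len(visible), tree_lod, cluster_lod)
```

At 1000 units, a tree gets LOD 2 from its own radius. Treating the whole 5000-unit cluster as one object would keep it at LOD 0, which is the over-detailed rendering described above.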
I have tried the hierarchical instance system, and it increased the poly count by 25% with only minor CPU savings. It also increased memory usage. I’m not really happy with the result at all.
I too would like to see some stats about this. Unfortunately, I’m a total Blueprint noob at this point and can’t recreate any stress tests. I would like to see comparisons of regular and hierarchical instancing against no instancing at all. I’m particularly interested in results where Draw time is the bottleneck. I personally don’t care too much if my GPU time increases slightly, if that helps with my Draw time, which it should.
Ideally it should work behind the scenes like it does for Battlefield 3, of course, and a technical artist or designer shouldn’t waste their time setting this up. Right now it seems like a hidden and obscure feature and it shouldn’t be!
I think these are the expected results. Keep in mind that in your example with 1959 mesh draw calls, your Draw time is 2.01 ms.
In the instanced example, your Draw time is 0.43 ms. That is a huge improvement!!
Of course, it comes at a slight cost in GPU time: from 12.46 ms to 13.60 ms. (There also seems to be an increase in Game time; I’m not sure why that is happening.)
This amounts to a decrease in overall performance. But you shouldn’t be using this if draw calls are not the bottleneck, and, like I said, in this example you are not Draw time bound.
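Here is a quick way to reason about this. The game thread, the render (Draw) thread and the GPU work largely in parallel, so frame time is roughly set by whichever of them is slowest. The Python sketch below uses the Draw and GPU numbers quoted above. Game times weren’t given, so they are left out. The last case uses made-up numbers purely for illustration.

```python
def bottleneck(timings_ms):
    """Name of the slowest stage; frame time roughly tracks this value."""
    return max(timings_ms, key=timings_ms.get)

def estimated_frame_ms(timings_ms):
    """Crude estimate: the slowest parallel stage sets the frame time."""
    return max(timings_ms.values())

before = {"Draw": 2.01, "GPU": 12.46}   # 1959 individual mesh draw calls
after = {"Draw": 0.43, "GPU": 13.60}    # instanced

# Hypothetical draw-bound scene (invented numbers, e.g. many more chairs):
draw_bound = {"Draw": 20.0, "GPU": 14.0}

for name, t in (("before", before), ("after", after), ("draw_bound", draw_bound)):
    print(name, bottleneck(t), estimated_frame_ms(t))
```

In both measured cases the GPU is the slowest stage. Cutting Draw time from 2.01 ms to 0.43 ms therefore can’t shorten the frame, while the extra GPU time makes it longer. Instancing only has room to help once Draw becomes the largest number.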
You should create a more realistic example where Draw time is the bottleneck rather than GPU or Game time. Increase the number of chairs until Draw time rises above GPU and Game time. In that case you should see an overall improvement in performance when using instances. Keep in mind that with an Instanced Static Mesh Component, all meshes swap LODs at the same time and are culled as a group. This means your GPU time will increase considerably as you increase the number of chairs. A Hierarchical Instanced Static Mesh Component should offset this increase, as it can cull and swap LODs per mesh.
By the way, I just noticed you have “stat scenerendering” turned on. Keep in mind that just having that stats window open can noticeably affect your millisecond timings. A better performance comparison would be with only “stat unit” turned on.
How can I convert the static meshes in my Blueprint to instanced static meshes? I am having the same issue: I have over 5000 instanced objects in my scene, which have 20-30 parents. I have tried everything I can, but no luck.
I have 5000 static meshes in Blueprints, which are instances of 50 pieces. I am trying to convert them to instanced static meshes but can’t figure out how to do so. I want to drop the number of draw calls from 5000+ to 50. If I could swap each static mesh for an instanced static mesh, that would happen, but I don’t know how. If you can help me, it would be very helpful. I have been trying for days now, but no luck.
Thanks.
Ninu
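The core of the conversion question above is just grouping. Collect every static mesh component by the mesh asset it uses. Then create one instanced static mesh component per mesh and add one instance per original transform (in UE4 that is the Add Instance node, `UInstancedStaticMeshComponent::AddInstance`). The Python below only sketches the grouping step on hypothetical data; the mesh names and transforms are made up, and this is not Blueprint or engine code. Note two caveats. Each material section of a mesh is still its own draw call. Also, a single component per mesh across a whole level is culled as one unit, which brings back the triangle trade-off discussed earlier in the thread.

```python
from collections import defaultdict

def group_by_mesh(components):
    """components: iterable of (mesh_name, transform) pairs, one per static mesh
    component. Returns mesh_name -> list of transforms; each key would become
    one instanced static mesh component with one instance per transform."""
    groups = defaultdict(list)
    for mesh_name, transform in components:
        groups[mesh_name].append(transform)
    return dict(groups)

# Hypothetical scene: 5000 components built from 50 distinct pieces.
components = [(f"Piece_{i % 50:02d}", (float(i), 0.0, 0.0)) for i in range(5000)]
groups = group_by_mesh(components)
print(f"{len(components)} components -> {len(groups)} instanced components")
```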