Thanks. I wrote my own engine from scratch before switching to UE4, and I implemented instancing there using my own custom vertex buffers. Their HISM implementation is a good step beyond what I did, but my custom vertex buffer and shader code for instancing was extremely lean. Anyways, I’ll try to weigh in a bit more on a point-by-point basis here.
I created my own tiling static mesh blueprint which has a drop-down menu where I can toggle whether I want it to be a static mesh, an instanced static mesh, or a HISM. The implementation was done in the construction script, and the node graph is very similar across all three variants. Once you get one implementation down, implementing the other two is about a five minute job and requires relatively little thought.
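For reference, here's a rough C++ equivalent of that construction-script toggle. The names (ATilingMeshActor, EMeshMode, the property names) are placeholders I made up for the sketch, and it's only meant to show the shape of the idea, not production code:

    // Rough C++ sketch of the same toggle the construction script does.
    // Actor/enum names here are mine, not from an actual project.
    #include "CoreMinimal.h"
    #include "GameFramework/Actor.h"
    #include "Components/StaticMeshComponent.h"
    #include "Components/InstancedStaticMeshComponent.h"
    #include "Components/HierarchicalInstancedStaticMeshComponent.h"
    #include "TilingMeshActor.generated.h" // assumes the file is TilingMeshActor.h

    UENUM()
    enum class EMeshMode : uint8 { StaticMeshes, Instanced, Hierarchical };

    UCLASS()
    class ATilingMeshActor : public AActor
    {
        GENERATED_BODY()

    public:
        UPROPERTY(EditAnywhere) EMeshMode Mode = EMeshMode::Hierarchical;
        UPROPERTY(EditAnywhere) UStaticMesh* TileMesh = nullptr;
        UPROPERTY(EditAnywhere) int32 TilesX = 10;
        UPROPERTY(EditAnywhere) int32 TilesY = 10;
        UPROPERTY(EditAnywhere) float Spacing = 100.f;

        ATilingMeshActor()
        {
            RootComponent = CreateDefaultSubobject<USceneComponent>(TEXT("Root"));
        }

        virtual void OnConstruction(const FTransform& Transform) override
        {
            Super::OnConstruction(Transform);
            if (!TileMesh) return;
            // (In a real actor you'd also clean up components created by a
            // previous construction pass before spawning new ones.)

            if (Mode != EMeshMode::StaticMeshes)
            {
                // Instanced paths: one component, many instances,
                // batched into one draw call per mesh/material.
                UInstancedStaticMeshComponent* ISM =
                    (Mode == EMeshMode::Hierarchical)
                        ? NewObject<UHierarchicalInstancedStaticMeshComponent>(this)
                        : NewObject<UInstancedStaticMeshComponent>(this);
                ISM->SetStaticMesh(TileMesh);
                ISM->SetupAttachment(RootComponent);
                ISM->RegisterComponent();
                for (int32 X = 0; X < TilesX; ++X)
                    for (int32 Y = 0; Y < TilesY; ++Y)
                        ISM->AddInstance(FTransform(FVector(X * Spacing, Y * Spacing, 0.f)));
            }
            else
            {
                // Plain static meshes: one component (and one draw call) per tile.
                for (int32 X = 0; X < TilesX; ++X)
                    for (int32 Y = 0; Y < TilesY; ++Y)
                    {
                        UStaticMeshComponent* SMC = NewObject<UStaticMeshComponent>(this);
                        SMC->SetStaticMesh(TileMesh);
                        SMC->SetupAttachment(RootComponent);
                        SMC->SetRelativeLocation(FVector(X * Spacing, Y * Spacing, 0.f));
                        SMC->RegisterComponent();
                    }
            }
        }
    };

The HISM path buys you a bit extra on top of plain instancing, since it builds a cluster tree for culling and per-instance LOD, which is why it's usually my default for foliage-style stuff.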
Generally, I tend to implement first, look for performance bottlenecks, and then optimize if I find any. You don’t want to prematurely optimize something, particularly if you can’t measurably say it actually is a bottleneck. In some cases though, it’s a plain no-brainer: 200,000 triangles vs. 21,800? The GPU still has to process every vertex, even if the mesh ends up half a pixel on screen due to distance. The more you can lighten the polygon count in areas where it doesn’t matter, the more budget you have left for the areas that do.
So, you really only start to see huge performance gains with instancing if there are lots of similar meshes to actually instance. I wouldn’t waste my time instancing 10 meshes, but when you start looking at hundreds and thousands, you start getting interested. In the case of rock formations, you’d want to place a lot of them to make instancing worth it. Another area you can look for performance gains is memory pools. Rather than constantly allocating and deallocating memory on the heap, which gradually gets fragmented, you can create a contiguous array of memory which recycles objects (see the sketch below). If you combine memory pools with instancing, you can get some blazing fast performance. Don’t quote me officially on this, but I don’t think merging actors is really the same comparison to make with instancing (you may have multiple different static meshes being merged, which all still need their own draw calls unless the backend can instance-batch them).
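Here's what I mean by a memory pool, in plain C++ (nothing UE4-specific, and the names are mine): one contiguous block allocated up front, with a free list handing slots back out instead of hitting the heap every time.

    // Minimal fixed-size object pool sketch.
    // All objects live in one contiguous allocation and get recycled via a
    // free list, so there's no per-object heap churn or fragmentation.
    #include <cstddef>
    #include <vector>
    #include <cassert>

    template <typename T, std::size_t Capacity>
    class ObjectPool
    {
    public:
        ObjectPool()
        {
            // Pre-fill the free list: every slot starts out available.
            FreeList.reserve(Capacity);
            for (std::size_t i = 0; i < Capacity; ++i)
                FreeList.push_back(&Slots[i]);
        }

        // Hand out a recycled slot instead of calling new.
        T* Acquire()
        {
            if (FreeList.empty())
                return nullptr; // pool exhausted; caller decides what to do
            T* Obj = FreeList.back();
            FreeList.pop_back();
            *Obj = T{}; // reset the slot to a default state
            return Obj;
        }

        // Return a slot to the pool instead of calling delete.
        void Release(T* Obj)
        {
            assert(Obj >= &Slots[0] && Obj < &Slots[0] + Capacity);
            FreeList.push_back(Obj);
        }

    private:
        T Slots[Capacity];        // one contiguous block, allocated once
        std::vector<T*> FreeList; // recycled slots
    };

    // Usage: e.g. recycling per-instance data for a rock formation.
    struct RockInstance { float X = 0, Y = 0, Z = 0; };

    int main()
    {
        ObjectPool<RockInstance, 1024> Pool;
        RockInstance* Rock = Pool.Acquire();
        Rock->X = 100.f;
        Pool.Release(Rock); // back into the pool, no heap traffic
        return 0;
    }

UE4 has its own allocators and pooling patterns under the hood, so treat this as the concept rather than the engine's way of doing it.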
I don’t think there’s a difference. All instancing does is batch draw calls on the GPU side, which increases frame rate by cutting draw call overhead; it doesn’t actually reduce the polygon count, it just submits the same geometry far more cheaply. The physics stuff is generally still going to be on the CPU side, and even if you use the GPU-accelerated NVIDIA PhysX stuff, I don’t think it changes that picture. Someone smarter than me on the PhysX coding side could probably correct me, but I generally haven’t gotten far with the GPU-specific API calls. You can see instancing at work in the GPU particle emitters though: each point sprite is just 2 triangles which get instanced, and the collision handling stays entirely on the GPU. Keep in mind that GPU-side particle collisions are going to be a lot more limited in how you can respond than what you can do on the CPU side though. Once data gets down to the metal on the graphics card, there is no going back up to main memory to poll a variable value, so … those flaming spark particles probably shouldn’t be what ignites that flammable straw on the GPU side.
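To illustrate the CPU side of that: as far as I remember, a regular line trace still works per instance against an ISM/HISM component, with the hit result's Item field giving you the index of the instance that was hit. Rough sketch (ISMComp is assumed to be an already-populated component, and the function name is just for the example):

    // Collision against an instanced static mesh still goes through the normal
    // CPU-side physics query path, and you can tell which instance you hit.
    #include "CoreMinimal.h"
    #include "Engine/World.h"
    #include "Engine/EngineTypes.h"
    #include "Components/InstancedStaticMeshComponent.h"

    void TraceAgainstInstances(UWorld* World, UInstancedStaticMeshComponent* ISMComp,
                               const FVector& Start, const FVector& End)
    {
        FHitResult Hit;
        if (World->LineTraceSingleByChannel(Hit, Start, End, ECC_Visibility))
        {
            if (Hit.GetComponent() == ISMComp)
            {
                // Item should be the index of the specific instance that was hit.
                const int32 InstanceIndex = Hit.Item;
                FTransform InstanceXform;
                ISMComp->GetInstanceTransform(InstanceIndex, InstanceXform, /*bWorldSpace=*/true);
                // ...respond on the CPU side: gameplay logic, remove the instance, etc.
            }
        }
    }

That kind of per-instance response is exactly what you lose once the simulation lives purely on the GPU.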