For the past month I’ve been working as a software engineer on a AAA project that involves a very complex scene with thousands of static and skeletal meshes rendering at once in VR. To optimize this scene, which sadly cannot benefit much from occlusion culling and similar techniques, our designers have been making extensive use of Unreal’s mesh combination tool, as well as the instanced static mesh components (both hierarchical and non-hierarchical varieties). My primary task during that time has been to implement an Instanced Skeletal Mesh Component in the engine so that we can take advantage of instancing for many of the characters in our game (think marching bands, armies, or large crowds of people).
Anyway, I’m happy to report that I finally succeeded at this task last week: the new Instanced Skeletal Meshes (ISKMs) are skinned only once per component, and only a single draw call is issued to render them on screen. Unfortunately, our testing revealed some strange performance results: in OpenGL, the ISKMs yield an impressive performance boost when rendering large crowds of characters. In D3D… not so much. I’ll start with the good details.
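To make the "skin once per component, then instance" idea concrete, here is a minimal CPU-side sketch. It is not my engine code and uses no Unreal classes: Vec3, Affine, Vertex, skinOnce and placeInstances are all hypothetical names invented for illustration, and the two-influence linear blend is an assumption chosen to keep the example short. On the GPU, the second step would typically happen in the vertex shader, reading a per-instance transform inside one instanced draw call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal math types (NOT Unreal Engine classes).
struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 linear part plus a translation column.
struct Affine {
    std::array<float, 12> m;

    Vec3 apply(const Vec3& v) const {
        return {m[0] * v.x + m[1] * v.y + m[2]  * v.z + m[3],
                m[4] * v.x + m[5] * v.y + m[6]  * v.z + m[7],
                m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11]};
    }

    static Affine translation(float tx, float ty, float tz) {
        return {{1.0f, 0.0f, 0.0f, tx,
                 0.0f, 1.0f, 0.0f, ty,
                 0.0f, 0.0f, 1.0f, tz}};
    }
};

// Bind-pose vertex with two bone influences (assumed; weights should sum to 1).
struct Vertex {
    Vec3 position;
    std::array<std::size_t, 2> bone;
    std::array<float, 2> weight;
};

// Step 1: linear-blend skinning, run ONCE per component per frame.
// Every instance shares this pose, so the result is shared as well.
std::vector<Vec3> skinOnce(const std::vector<Vertex>& mesh,
                           const std::vector<Affine>& pose) {
    std::vector<Vec3> skinned;
    skinned.reserve(mesh.size());
    for (const Vertex& v : mesh) {
        Vec3 acc{0.0f, 0.0f, 0.0f};
        for (std::size_t i = 0; i < v.bone.size(); ++i) {
            const Vec3 p = pose[v.bone[i]].apply(v.position);
            acc.x += v.weight[i] * p.x;
            acc.y += v.weight[i] * p.y;
            acc.z += v.weight[i] * p.z;
        }
        skinned.push_back(acc);
    }
    return skinned;
}

// Step 2: per instance, only a cheap transform of the already-skinned vertices.
// Output is instance-major: skinned.size() vertices for each instance in order.
std::vector<Vec3> placeInstances(const std::vector<Vec3>& skinned,
                                 const std::vector<Affine>& instances) {
    std::vector<Vec3> out;
    out.reserve(skinned.size() * instances.size());
    for (const Affine& inst : instances) {
        for (const Vec3& v : skinned) {
            out.push_back(inst.apply(v));
        }
    }
    return out;
}
```

The point of the split is that the expensive per-bone blending cost does not grow with the instance count; only the single transform per vertex does. The obvious trade-off is that every instance in a component shares the same pose.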
When rendering in OpenGL on my work computer (equipped with an NVIDIA GTX 1080), I was able to achieve 120 fps while rendering 1024 high-poly characters on screen at once (each one had over 40,000 triangles… comes out to over 42,000,000 triangles for the whole scene). This was easily three times faster than rendering each skeletal mesh with individual skinning + draw calls, so I’m quite pleased with the results…
When rendering in D3D, however, I had some pretty severe performance problems. Despite confirming once again that I was indeed skinning the mesh only once per frame and issuing a single draw call for all instances, the ISKMs were rendering significantly slower than their non-instanced counterparts in all of my stress tests…
A bit of profiling revealed the major sources of the slowdown. The most critical appears to be in base pass rendering: stat SceneRendering showed most cycles there being eaten up by some kind of “RenderQuery Result” process, while ProfileGPU attributed them to the “Dynamic” process (drawing dynamic elements?). Neither of these facts has been very helpful in leading me to the source of the performance loss, though…
Another weird thing I noticed (though it doesn’t seem to affect performance) is that stat InitViews reports a lot of time being spent determining visibility (again, only in D3D). I say that it doesn’t affect performance because toggling occlusion culling on/off doesn’t raise or lower my frame rate at all; it just changes the graphs around. (The above reports were generated with occlusion culling off because of this.)
Strangely, I also noticed that normal skeletal meshes do not seem to affect the reported occlusion culling times in any significant way, so I wonder why ISKMs do (and again, only in D3D).
Anyway, the main big question is: what is “RenderQuery Result”, and why is it eating all of my cycles when D3D is active?
I suspect that something weird is happening in the D3D pipeline, like materials constantly being loaded and unloaded for every instance (or something strange like that), but I can’t really be sure.
If anyone could give me some ideas about this though, I’d be much obliged.
As this is not my own personal project, I am unfortunately restricted in how much of my code I am able to share right now. So I cannot, for example, just zip up my entire modified engine source and project files for everybody to inspect… BUT, I am interested in receiving help to solve this issue, so my team has authorized me to share as many code snippets and questions/answers as necessary to solve the problem.
Personally, I would love to open-source this work at some point, since I suspect other people could find ISKMs useful in certain scenarios… but we shall see.
If anybody has any questions / comments / suggestions / answers, I would love to hear them. Cheers,