Instancing rigid character crowds

Hello,

We are looking at ways to improve our custom crowd nodes, which read from a skeletal animation cache. So far, we instantiate plain skeletal mesh actors, animated with a custom node. However, this consumes a lot of resources (node count, draw calls, mesh animation and skinning) and is not as fast as we would like for a large number of instances.

From what we have found so far, one good way to improve performance would be to use a World Position Offset material fed with vertex animation written to textures, but since our animations are “per character”, that does not suit our needs so well.

We are investigating another way to keep a decent framerate, by implementing a rigidly skinned character with this pipeline:

  • importing the character model as separate static meshes, one for each rigidly bound mesh
  • at runtime, instancing a main actor with several (hierarchical) instanced static mesh components (one per model mesh asset) bound to a dummy root USceneComponent
  • for each character instance, simply adding an instance of each used mesh to the proper component, and feeding their transforms each frame in a “rigid skinning way”. The actor is responsible for keeping track of the transforms and their link with the animation cache, and for updating them.
  1. Does this pipeline sound OK?
  2. In the Content Browser, how could I present a character mesh group as a single offline asset, as I can do at runtime with an actor plus several MeshComponents? Having only one asset to reference would be more convenient for customers. I have looked at Mesh collections, but they do not seem to be meant for this purpose. So I was planning to live-convert a USkeletalMesh into several static meshes when I instantiate the model character actor, but that’s not as simple as it sounds, and it would add overhead at first instantiation time (acceptable for us).
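To make the “rigid skinning” step concrete, here is a minimal sketch in plain C++ (no engine headers) of the per-frame update described above. It is an illustration under assumptions, not engine code: `Mat4`, `CachedPose` and `BuildInstanceTransforms` are hypothetical names, the matrices use a column-vector convention (which is not necessarily the engine's), and the animation cache is assumed to give one character-local transform per rigid piece. The output is grouped per piece, matching the idea of one instanced component per mesh asset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 4x4 matrix, column-vector convention (p' = M * p).
// Assumption for this sketch only; the engine's own convention may differ.
using Mat4 = std::array<float, 16>;

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

Mat4 Translation(float x, float y, float z) {
    Mat4 m = {1.0f, 0.0f, 0.0f, x,
              0.0f, 1.0f, 0.0f, y,
              0.0f, 0.0f, 1.0f, z,
              0.0f, 0.0f, 0.0f, 1.0f};
    return m;
}

// Hypothetical stand-in for one frame read from the animation cache:
// one transform per rigid piece, expressed in character-local space.
struct CachedPose {
    std::vector<Mat4> pieceLocal;
};

// Returns result[piece][character]: the world transform to feed to the
// instance of `piece` that belongs to `character`. Each inner vector maps
// to one instanced component (one component per mesh asset).
std::vector<std::vector<Mat4>> BuildInstanceTransforms(
    const std::vector<Mat4>& characterRoots,
    const std::vector<CachedPose>& poses,
    std::size_t pieceCount) {
    assert(characterRoots.size() == poses.size());
    std::vector<std::vector<Mat4>> result(
        pieceCount, std::vector<Mat4>(characterRoots.size()));
    for (std::size_t c = 0; c < characterRoots.size(); ++c) {
        assert(poses[c].pieceLocal.size() == pieceCount);
        for (std::size_t p = 0; p < pieceCount; ++p) {
            // Rigid skinning: a piece follows exactly one bone, so its
            // world transform is root * pieceLocal, with no vertex blending.
            result[p][c] = Multiply(characterRoots[c], poses[c].pieceLocal[p]);
        }
    }
    return result;
}
```

In an actual actor, each `result[p]` array would then be pushed to the matching instanced component once per frame; how that is best done in the engine (per instance or in batches) is left open here.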

Thanks in advance for any advice or comment !

Just a different suggestion: have you looked into Alembic imports?
The geometry cache is about as fast as the shader. Not sure if that’s any help, but it may be easier for “fixed” animation or whole groups.

As far as the pipeline you mention, it could work but you would not be able to merge actors and reduce the draw call count of the crowd (without major work to get the shaders to play nice).

Also, the way you describe it, each action change would be an additional draw call? Granted, they happen over time so it probably doesn’t affect the whole, but on a crowd of a few thousand you would still get performance issues stemming from the overall number of draw calls…

Thanks for the answer

Alembic is a fall-back solution, but we would like to provide something more efficient fps-wise to our customers. Alembic cache may reduce the draw calls, but saturates the CPU-GPU link as it sends all vertices each frame.

In the planned pipeline, the relation is one (contentBrowser) asset <-> one actor in the scene, potentially with several InstancedMeshComponent. Instances are managed at the InstancedMeshComponent level and should not add draw calls.

By using “instanced” mesh components instead of standard skeletal meshes, the draw calls should drop: given a scene using, for example, 10 “model” characters of 15 meshes each, instanced to make 10,000 characters, that would be 150 draw calls (× pass count) instead of potentially 150,000. That should also save CPU-GPU bandwidth, as we send the meshes only once per base asset (and probably keep them in GPU memory across frames), plus only one transform per instanced mesh (here 150,000 transforms, which is quite light: around 10 MB as raw 4x4 floats).
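As a quick sanity check of the numbers above, here is a small standard C++ sketch of the arithmetic. It is a best-case estimate under stated assumptions: one draw call per instanced component per pass, one draw call per mesh per character in the non-instanced case, and a raw 4x4 float matrix per instance. `CrowdBudget` and `EstimateCrowdBudget` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Best-case figures for one render pass (assumptions noted in the lead-in).
struct CrowdBudget {
    std::uint64_t instancedDrawCalls;  // one per (model, mesh) component
    std::uint64_t perActorDrawCalls;   // one per (character, mesh)
    std::uint64_t transformCount;      // one transform per instanced mesh
    std::uint64_t transformBytes;      // raw 4x4 float matrices
};

constexpr CrowdBudget EstimateCrowdBudget(std::uint64_t modelCount,
                                          std::uint64_t meshesPerModel,
                                          std::uint64_t characterCount) {
    return CrowdBudget{
        modelCount * meshesPerModel,
        characterCount * meshesPerModel,
        characterCount * meshesPerModel,
        characterCount * meshesPerModel * 16 * sizeof(float)};
}

// The example from the post: 10 models, 15 meshes each, 10,000 characters.
constexpr CrowdBudget kExample = EstimateCrowdBudget(10, 15, 10000);
static_assert(kExample.instancedDrawCalls == 150, "150 draw calls per pass");
static_assert(kExample.perActorDrawCalls == 150000, "150,000 without instancing");
static_assert(kExample.transformBytes == 9600000, "9.6 MB of raw matrices");
```

So the “around 10 MB” figure corresponds to 9.6 MB of matrices per full update. Anything that splits an instanced component into several draws (LODs, multiple material sections, culling clusters) would raise the draw call count above this estimate.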

Did I get something wrong? Especially about how HISM manages draw calls?

Sounds about right.
The issue I have with HISM is that frustum culling doesn’t always work as you would expect…
You should probably run a test with just a static mesh and a random WPO (World Position Offset) change. Make them spin or something, to see what your actual costs come down to.

How’d this work out? I’m looking into something similar. It seems there isn’t a way to attach an instance of a HISM to a character; as expected, you need to update its transform on tick to match the dummy character root. You could possibly multithread this, but then it may be locked up too often.