How to move a large number of meshes at runtime without choking the game thread?

Hi,

Context:
In my physics-based game, I’d like a large modular spaceship to simulate physics.

I’ve made a lot of different actor blueprints, each with quite a few static meshes.
A single rigid body has “simulate physics” ticked, and every actor of my modular spaceship is welded (aka parented) to that body.

Problem:
Obviously, the more actors in my spaceship, the more computationally intensive it is to move around. My problem is that the cost becomes too heavy with far fewer actors than I need.

Troubleshooting so far:
I’ve done a stress test, and Chaos rigid-body simulation can run on my computer at 60 FPS with up to about 5000 convex hulls. This is enough for my game.

However, I have many times more static meshes than collision primitives in my scene: either static meshes with collision set to “No Collision”, or static meshes with no collision shape set up. (stat unitgraph mentions up to 400K primitives, though I’m not exactly sure what that number means.)
So even though I only have 1000 actors (with about a single hull each, and a dozen static mesh components), my game thread takes 30-40ms.

I even disabled physics, and tested with a rotating movement component. Same 30-40ms on the game thread. If I set the rotation speed to 0, the timing drops below 10ms.

The above points me to believe that for my issue, Chaos is innocent.

Current status:
The CPU is just taking too long to update the transforms of each mesh when my spaceship gets moved around.
I’d like to investigate a way to alleviate the load on the game thread without compromising scene complexity.
Ideally, we’d find a solution that is mostly independent of scene complexity. I’ve got the intuition that this may exist, since the entire spaceship (mostly) moves as a rigid object. (If the Nanite preprocessor were fast enough, I could literally merge all actors at runtime into a single temporary mesh.)

Potential workaround so far:

Reduce the entire load somehow:

  • Merge as many static meshes as possible in the actor blueprints. Each merge is one less transform for the CPU to compute. I have the intuition that this may not raise my budget enough, though.

Use the GPU (most mesh transforms aren’t needed on the CPU):

  • World Position Offset: I could leverage World Position Offset. I just don’t really know how.
  • Niagara GPU system with mesh renderer: One would hope that when a GPU Niagara system is moved, the transform update of each mesh particle isn’t computed on the CPU. But it seems that this mesh renderer needs complexity management like a traditional rasterizer, so I can’t just give it all my meshes, because Nanite won’t work its magic.

Any help would be appreciated!


Do you need to use actors for that? Perhaps an extended Static Mesh Component could be enough. The only (obvious) drawback is that a component cannot have a timeline. But a lot can be achieved with a clever use of timers.
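To make the idea concrete, here is a minimal sketch of what such an extended Static Mesh Component could look like in C++ (the class name UShipModuleComponent and its properties are hypothetical; this assumes a standard UE game module build environment):

```cpp
// ShipModuleComponent.h -- hypothetical sketch of an "extended"
// Static Mesh Component, to be subclassed per module type.
#pragma once

#include "CoreMinimal.h"
#include "Components/StaticMeshComponent.h"
#include "ShipModuleComponent.generated.h"

UCLASS(ClassGroup=(Custom), meta=(BlueprintSpawnableComponent))
class UShipModuleComponent : public UStaticMeshComponent
{
    GENERATED_BODY()

public:
    UShipModuleComponent()
    {
        // Modules don't need their own collision;
        // the simulating Core rigid body owns it.
        SetCollisionEnabled(ECollisionEnabled::NoCollision);
        // No per-frame Tick -- since components have no Timelines,
        // state changes are driven by timers instead.
        PrimaryComponentTick.bCanEverTick = false;
    }

    // Example of module-specific state and API, reachable from Blueprint.
    UPROPERTY(EditAnywhere, BlueprintReadWrite, Category="Ship")
    float HullPoints = 100.f;

    UFUNCTION(BlueprintCallable, Category="Ship")
    void ApplyDamage(float Amount) { HullPoints -= Amount; }
};
```

Each child class (engines, turrets, sensors…) would then carry its own variables and functions, which is what makes component-to-component communication simpler than going through a Child Actor Component.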

Here’s a base component and its unique children:

image

You could then build a modular starship actor with those components with only the Core simulating physics:

image

Each component has a graph, variables, functions and so on. Easier to build in a modular fashion, much easier to communicate with, easier to add & remove dynamically, and less fussy than a CAC in general.

Most importantly, it should be much lighter on the resources than a Child Actor Component I am assuming you’re using for this atm. Right?

Do give it a test and see if it makes a difference in the scope you’re working with.


Now, if, let’s say, a Sensor Boom would add 15 identical meshes to the ship, you could use a similar technique with an Instanced Static Mesh component, so a single component can show a bunch of meshes. Or even go further and reach for HISM if you’re brave enough.
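A rough sketch of that in C++, assuming it runs inside the ship actor (ShipActor, CoreBody and SensorBoomMesh are placeholder names for the owning actor, the simulating root component, and the shared mesh asset):

```cpp
// Hypothetical sketch: one Instanced Static Mesh component showing
// 15 identical Sensor Boom meshes, attached to the simulating Core.
UInstancedStaticMeshComponent* Booms =
    NewObject<UInstancedStaticMeshComponent>(ShipActor);
Booms->SetStaticMesh(SensorBoomMesh);              // shared UStaticMesh*
Booms->SetCollisionEnabled(ECollisionEnabled::NoCollision);
Booms->SetupAttachment(CoreBody);                  // physics-simulating root
Booms->RegisterComponent();

for (int32 i = 0; i < 15; ++i)
{
    // Instance transforms are stored relative to the component,
    // so moving the Core moves all 15 instances at once.
    Booms->AddInstance(FTransform(FVector(0.f, 0.f, i * 100.f)));
}
```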

How to optimise it depends on what you require the extra modules to do. The above may not be suitable in certain scenarios.

Hi Everynone, and thanks for your detailed suggestion!

Couple things on that one:

  • You’re assuming wrong :innocent:. I am using parenting between actors. As in, I have that structure in my level outliner.
    image
  • I can try with some extended Static Mesh Components… But why would they be lighter than an actor?

Actually, I just did a test… a big “For Loop” creating hundreds of static mesh components and disabling collision. It doesn’t seem to help. Here’s a test with 10K moving meshes (each of the 10 cubes has physics enabled and 1000 Nanite static mesh components without collision attached). Notice in particular the game thread timings and the unitgraph indicating that the editor was happily at 60 FPS until I hit Alt+S.

image
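For reference, the stress test above can be sketched roughly like this in C++ (AStressTestCube and CubeMesh are placeholder names; the grid spacing is arbitrary):

```cpp
// Rough C++ equivalent of the Blueprint "For Loop" test above:
// 1000 collision-free static mesh components attached to one
// physics-simulating cube actor.
void AStressTestCube::BeginPlay()
{
    Super::BeginPlay();

    for (int32 i = 0; i < 1000; ++i)
    {
        UStaticMeshComponent* Mesh = NewObject<UStaticMeshComponent>(this);
        Mesh->SetStaticMesh(CubeMesh);   // a Nanite-enabled UStaticMesh*
        Mesh->SetCollisionEnabled(ECollisionEnabled::NoCollision);
        Mesh->SetupAttachment(GetRootComponent());
        // Lay the meshes out in a 10x10x10 grid around the root.
        Mesh->SetRelativeLocation(
            FVector(i % 10, (i / 10) % 10, i / 100) * 120.f);
        Mesh->RegisterComponent();
    }
}
```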

I’ve also had a brief look at the instancing idea… but the bottleneck really seems, in my mind, to be the transform computation on the CPU. I could be wrong, but I’ve always thought of instancing as a great tool to reduce draw calls. In my case, I’m not over budget on draw calls; otherwise, I wouldn’t be at 60+ FPS when nothing moves.
(Plus, I think Unreal has auto-instancing, and I only use Nanite, which is rendered in a single big call.)

So, unfortunately, despite the great detail in your answer, I don’t think yet that switching to components or instancing will really improve performance. I do appreciate your suggestion though, and feel free to tell me if I misunderstood it.

EDIT: I really thought about the Niagara idea… so I did a new test.

This is 120K meshes (!) falling towards the ground at 60+FPS. You can see the GPU time reaching the 16.7ms budget however.
Here it looks like the CPU knows nothing of the GPU mesh particles. It seems that since it’s all GPU-computed, rasterization capacity is the only limitation, since the CPU never has to update the mesh transforms when the parent gets moved. In that regard, it seems even better than instanced static mesh components.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.