Can I use Nanite to render many meshes within a single actor?

I’m trying to create a Space Engineers-style game and have built a pretty good system for procedurally generating the meshes of block-based structures. Currently, my method uses a 3D array representing the individual blocks in the ship, which is then used to generate the “chunks” that make up the entire ship. The 3D array is not a fixed size and could easily reach sizes close to 256^3 (about 17 million blocks). Right now it runs pretty well on my device, but obviously this won’t be sustainable once I begin to introduce blocks with more complex geometry.
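For readers following along, here is a minimal sketch of how a dense block grid like the one described can be indexed and split into chunks. The chunk edge length `kChunkSize` and all type and function names are assumptions for illustration; the post does not give the actual layout.

```cpp
#include <cstddef>

// Hypothetical chunk edge length; the post does not state one.
constexpr int kChunkSize = 16;

struct BlockCoord { int x, y, z; };

// Chunk that contains a given block (non-negative coordinates assumed).
constexpr BlockCoord ChunkOf(BlockCoord b) {
    return {b.x / kChunkSize, b.y / kChunkSize, b.z / kChunkSize};
}

// Flat index of a block inside a dense grid of dim * dim * dim cells.
constexpr std::size_t FlatIndex(BlockCoord b, int dim) {
    return (static_cast<std::size_t>(b.z) * dim + b.y) * dim + b.x;
}

// Total number of cells in a cubic grid.
constexpr std::size_t BlockCount(int dim) {
    return static_cast<std::size_t>(dim) * dim * dim;
}
```

`BlockCount(256)` is 16,777,216, which matches the "about 17 million blocks" figure above.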

I am now looking to add some sort of LOD system to my ships and want to figure out whether I can use Nanite to do so for individual blocks, whether I will be forced to create my own LOD system, or whether there is some other method that I’m just not aware of.

My hope would be to continue generating colliders as chunks and then somehow have every individual block rendered separately so I can still use Nanite. Is this possible, or am I SOL and just have to come up with a custom LOD system?

I have a similar case. I need to visualize thousands of static mesh components in a single actor.

I found out that using too many components in a single actor is very expensive. Try using multiple actors with tags, or try instanced static meshes.

You seem to be mostly right, my approach of using components has been unsustainable. But I looked into instances, and I found something called a Hierarchical Instanced Static Mesh component and it’s specifically meant for a bunch of instances within a single actor. It may be the answer to my prayers, but I’ll need to do some research on it before I make any definite claims.

I won’t go into details, but Nanite is for rendering highly detailed, high-poly assets. If we are talking about thousands of components, you will have two problems:

1- Draw calls.
2- Game thread cost when you put all the components in the same actor, because when you try to access it (especially with Blueprints) the engine tries to read everything.

If you have the same mesh and materials but different transforms, instancing will help you, but Nanite won’t.

By the way, look into fake shadows, and disable Cast Shadow if you don’t need real-time shadows.

I won’t say specifically to use hierarchical instances or standard instances. Look into both and pick one based on your needs.

Basically you need to implement chunking.

Only the active block you are in is made of individual draw calls; stuff outside that immediate range is x2, and progressively bigger powers of 2 the further out you get, probably.
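One way to read the power-of-2 idea is as a merge scale chosen per chunk from its distance to the chunk the player is in. The sketch below is only that reading: the ring distance metric, the cap `maxShift`, and every name are assumptions, not something stated in the thread.

```cpp
#include <algorithm>
#include <cstdlib>

struct ChunkCoord { int x, y, z; };

// Chebyshev distance in chunks: 0 means "the chunk the player is in".
int RingDistance(ChunkCoord a, ChunkCoord b) {
    return std::max({std::abs(a.x - b.x), std::abs(a.y - b.y), std::abs(a.z - b.z)});
}

// Ring 0 keeps individual blocks (scale 1); each further ring doubles the
// merge scale (2, 4, 8, ...) up to a hypothetical cap of 2^maxShift.
int MergeScale(int ring, int maxShift = 4) {
    return 1 << std::min(std::max(ring, 0), maxShift);
}
```

With these assumptions, a chunk three rings away would be built at 8x scale, and anything beyond ring 4 stays at 16x.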

When you build the meshes, you know where the tris are supposed to be if you use cubes (Minecraft style?), so the rendering cost is minuscule, as all you need to store is a boolean value of “block existence”.
You then have something like a 9x9x9 set of blocks, and all you need to build the mesh for the chunk is the 0 or 1 value of each block…
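As a rough illustration of building a chunk from nothing but 0/1 occupancy, the sketch below counts the quads a cube-only chunk would need by emitting a face only where a solid block touches empty space. It assumes the 9x9x9 chunk from the example and treats everything outside the chunk as empty; the names are made up for the example.

```cpp
#include <array>
#include <cstddef>

constexpr int N = 9;  // chunk edge length, taken from the 9x9x9 example

using Chunk = std::array<bool, N * N * N>;  // true = block exists

inline std::size_t Idx(int x, int y, int z) {
    return static_cast<std::size_t>((z * N + y) * N + x);
}

// Cells outside the chunk are treated as empty (an assumption; a real
// system would probably consult the neighbouring chunk instead).
inline bool Solid(const Chunk& c, int x, int y, int z) {
    if (x < 0 || y < 0 || z < 0 || x >= N || y >= N || z >= N) return false;
    return c[Idx(x, y, z)];
}

// Number of quads to emit: one per solid-block face that borders empty space.
int CountVisibleFaces(const Chunk& c) {
    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    int faces = 0;
    for (int z = 0; z < N; ++z)
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x) {
                if (!Solid(c, x, y, z)) continue;
                for (const auto& d : dirs)
                    if (!Solid(c, x + d[0], y + d[1], z + d[2])) ++faces;
            }
    return faces;
}
```

A completely full chunk only needs its outer shell: 6 sides of 81 faces each, instead of 6 faces for each of the 729 blocks.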

Still, you can have a billion instances before stuff grinds to a halt using HISMC…

One last thing.
@eray_ozr is wrong.
The instances are directly accessible by the engine in Blueprint regardless of what they are, how many there are, etc., without any issue or slowdown.
The engine will grind if you populate an HISMC and then try to edit the instances in the editor. That is because the editor wants to read all the instance information at once.

Otherwise, all it takes is a line trace to get the index.
DynoFoliage: UE4 Interactive Foliage works for foliage approximately that way (but accessing foliage is a bit more complex).

@MostHost_LA you are wrong.
Please read carefully. I didn’t say instances will slow you down. I said that too many static mesh components in the same actor will slow you down, and that you should use instances or multiple actors to eliminate the performance problems. (Look at my first comment.)

For example, try to import a big FBX assembly with Import Into Level. Your performance will be worse than instances.

You just said the same thing. Both of us suggested using instancing :)

A purely cube-based system would certainly be easiest but, unfortunately, I plan on having non-cubic blocks such as stairs and wedges and weapons and only a couple of blocks will actually be perfect cubes. Do you know if there’s any difference between HISMCs and regular ISMCs other than adding LODs? So far, there seems to be no performance difference between HISMC and regular ISMC. I can get some excellent performance with instancing, but it really suffers once the structure moves at all. Do you have any recommendations for speeding up the transforms of all the instances? It’s definitely faster than individual components but it’s still too slow for what I need.

Instancing significantly improved my performance, but since I’d like to have many different blocks with unique shapes and sizes, it seems like it may end up not being a great solution for me. As far as I’m aware, each instanced component only supports one kind of mesh, but I may have hundreds of different meshes, which would mean hundreds of components, one for each mesh. Am I correct in my understanding of this?
If so, then my main bottleneck seems to be the fact that every instance and component will have its own transform, which means I lose too much performance when the grid moves. Is there any way around this, or will I just have to go back to my original plan of chunking blocks together in order to reduce the total number of transforms?
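To make the "one component per mesh" bookkeeping concrete, here is a small engine-independent sketch that groups placed blocks into one bucket per distinct mesh. In Unreal, each bucket would presumably feed one instanced static mesh component. `PlacedBlock`, `meshId`, and `GroupByMesh` are hypothetical names, and positions are assumed to be stored relative to the ship rather than in world space.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical record for one placed block: which mesh it uses and where it
// sits relative to the ship's origin (grid-local, not world space).
struct PlacedBlock {
    std::uint16_t meshId;
    Vec3 localPos;
};

// One bucket of instance positions per distinct mesh.
using InstanceBuckets = std::unordered_map<std::uint16_t, std::vector<Vec3>>;

InstanceBuckets GroupByMesh(const std::vector<PlacedBlock>& blocks) {
    InstanceBuckets buckets;
    for (const PlacedBlock& b : blocks) {
        buckets[b.meshId].push_back(b.localPos);
    }
    return buckets;
}
```

The number of buckets scales with the number of distinct meshes, not the number of blocks. Whether the engine then avoids per-instance work when the parent actor moves is something to profile rather than assume.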

You need to mesh merge.

This is probably no different than a modular character, just at a much larger / possibly unsustainable scale.

You need to review the UE4 docs on it, and implement the thing via C++.

I’m assuming you have some sort of creation screen where the assembly happens instead of a base building system.

Once merged, pieces aren’t individualized for moving/changing, so you need to keep versions of the model separate.
One that is editable and contains the components.
One that gets merged and used in game (without any of that extra data).

As a basic in engine example, you can throw a billion pieces into PIE, get 2fps and assemble the object.
If you then select all of them and merge the actor you get back to max fps / 1 drawcall or a lot less, one per material.

The difference is that you want this to be something that can happen at runtime, and the system will end up hanging the game thread when processing. So you really need to read up on how to make threads that don’t block the game, move the computation bit there, and present the user with a loading screen, or even allow extra gameplay while the computing happens…

Dang, I was hoping to avoid multithreading the models. I tried multithreading the mesh generation a while back, but the main issue was that all the models were on the main thread, and I had to duplicate the mesh data for all the blocks I needed onto the new thread. It was the classic memory vs performance dilemma. It’s great for collision generation because detailed collision only has to be close to the player and I can sort of fudge it with lower poly abstraction at greater distances. Either way, thanks for your help, you answered my original question, it looks like it’s time for me to go back to the drawing board!

Technically, the objects exist as sheer numbers.

What do I mean?
Well, all you need is the list of Mesh Name + Transform.

If your model compiles that list out and passes it onto the other thread, then the non blocking thread can re-build the merged mesh one object at a time.
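A minimal sketch of that hand-off in standard C++, with no engine types: the game thread builds a plain-data snapshot of "mesh name + transform" records and moves it into a background task, which walks the list one piece at a time. The merge here is only a placeholder that counts pieces and tracks an X extent, since the real merge step is engine-specific. All names are hypothetical, and in Unreal you would likely use the engine's own task system rather than `std::async`.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Plain-data snapshot of one placed piece: the "Mesh Name + Transform" list.
struct PieceRecord {
    std::string meshName;
    float loc[3];
    float rot[3];
    float scale[3];
};

// Stand-in for the merged mesh; a real merge would produce vertex and index data.
struct MergeResult {
    std::size_t pieceCount = 0;
    float minX = 0.0f;
    float maxX = 0.0f;
};

// Placeholder "merge" that processes the snapshot one piece at a time.
MergeResult MergePieces(const std::vector<PieceRecord>& pieces) {
    MergeResult r;
    for (const PieceRecord& p : pieces) {
        if (r.pieceCount == 0) {
            r.minX = r.maxX = p.loc[0];
        } else {
            r.minX = std::min(r.minX, p.loc[0]);
            r.maxX = std::max(r.maxX, p.loc[0]);
        }
        ++r.pieceCount;
    }
    return r;
}

// The snapshot is taken by value and moved into the task, so the worker owns
// its own copy and never touches game-thread data.
std::future<MergeResult> StartMerge(std::vector<PieceRecord> snapshot) {
    return std::async(std::launch::async,
                      [pieces = std::move(snapshot)]() { return MergePieces(pieces); });
}
```

The caller can keep running and poll the returned future (for example with `wait_for`) instead of blocking on `get()`.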

You will run into the same age-old optimization dilemma…
So do you talk to the other thread once every component? Every 3? Wait for the whole thing?
(Every component defeats the purpose, so that’s just a silly example really.)

The other issue is the data from the transform and how that is stored.
It’s a single object, but it contains location, rotation, and scale as V3s.
Say you use regular floats.
That’s 9 floats per component.
9 floats times a billion - and it takes longer to pass data to the other thread than writing the math out…

This situation is not a known object of the same general coordinates and shape, so simplifying the transform is probably not all that possible.
Maybe you can get scale down to a single float if it is constrained, but loc and rot are still going to have to be float3s…
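The arithmetic behind this is easy to check. Assuming 4-byte floats, 9 floats is 36 bytes per component, so a billion components come to about 36 GB of transform data; collapsing scale to a single float gives 7 floats, or about 28 GB. A small sketch, with struct names invented for the example:

```cpp
#include <cstdint>

// Location, rotation, and scale as three floats each: 9 floats.
struct FullTransform {
    float loc[3];
    float rot[3];
    float scale[3];
};

// Same, but with scale constrained to a single uniform value: 7 floats.
struct UniformScaleTransform {
    float loc[3];
    float rot[3];
    float scale;
};

static_assert(sizeof(FullTransform) == 9 * sizeof(float), "expected no padding");
static_assert(sizeof(UniformScaleTransform) == 7 * sizeof(float), "expected no padding");

// Raw bytes needed to hold `count` transforms of type T.
template <typename T>
constexpr std::uint64_t BytesFor(std::uint64_t count) {
    return count * sizeof(T);
}
```

For the 256^3 grid from the original question, `BytesFor<FullTransform>(16777216)` is 603,979,776 bytes (576 MiB), which is far more manageable than the billion-instance case.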

I would suggest figuring out a batch number that works based on power of 2 values.

And making the system smart enough to time itself in order to x2 the batch size if the computing power of the system in use allows for it.
Faster CPU, better build times…
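A rough sketch of a self-timing batch size: double the batch when the last one finished comfortably inside a time budget, and stay on powers of 2 as long as the starting size is one. The budget, the size limits, and the halving on overrun are assumptions added for the example; the post only describes the doubling.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Picks the next batch size from how long the previous batch took.
//  - finished in at most half the budget: double the batch (capped at maxSize)
//  - overran the budget: halve it (floored at minSize); this part is an assumption
//  - otherwise: keep the current size
std::size_t NextBatchSize(std::size_t current,
                          std::chrono::microseconds elapsed,
                          std::chrono::microseconds budget,
                          std::size_t minSize = 64,
                          std::size_t maxSize = 65536) {
    if (elapsed * 2 <= budget) return std::min(current * 2, maxSize);
    if (elapsed > budget) return std::max(current / 2, minSize);
    return current;
}
```

The worker would time each batch with `std::chrono::steady_clock` and feed the measured duration back in before starting the next one.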

What you could do, is to maintain the build as a text file stored locally (at a minimum base64 encoded to prevent people editing).

Then you can just memory stream the document into the non-blocking thread instead of having to deal with batching and sorting ducks in a row…

Really, not sure that helps though, as like you said the meshes are in the game thread and have to be written out from there…
My sixth sense tells me it would probably end up being about the same as merging actors off the main thread directly.

To be clear, the problem isn’t really the merge or the write; it’s the loop over a billion possible instances that will cause a hang.

So then, back to batching, but off the main thread and directly into the mesh merge…

Unless you find an easy way to maintain both…
Like say, instead of writing to file all at once, you write to file with each piece the player places.
In that case, the non-game thread will eventually just have all the data it needs and can get to work… I think that may be the best approach here…

P.S. To be clear, write the data to the file directly. It’s faster than converting to readable text and then encoding.
So instead of text, go with a .bin or similar format…
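A minimal sketch of the "append one binary record per placed piece" idea using standard file streams. The record layout, the numeric `meshId` in place of a mesh name, and the function names are all assumptions for illustration. Raw struct dumps like this are also not portable across compilers or platforms, so a shipping format would want explicit serialization and a version header.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Fixed-size record so the file is just an array of these, back to back.
struct PlacedPieceRecord {
    std::uint32_t meshId;  // hypothetical numeric ID standing in for the mesh name
    float loc[3];
    float rot[3];
    float scale[3];
};

// Called each time the player places a piece: appends one record to the file.
bool AppendPiece(const std::string& path, const PlacedPieceRecord& rec) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
    out.flush();
    return static_cast<bool>(out);
}

// Called from the worker thread: streams every complete record back in.
std::vector<PlacedPieceRecord> LoadPieces(const std::string& path) {
    std::vector<PlacedPieceRecord> pieces;
    std::ifstream in(path, std::ios::binary);
    PlacedPieceRecord rec;
    while (in.read(reinterpret_cast<char*>(&rec), sizeof(rec))) {
        pieces.push_back(rec);
    }
    return pieces;
}
```

Because each placement is written as it happens, the worker never needs a bulk copy from the game thread; it only needs the file path.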
