Environment art pipeline for large-scale 3D levels: are many static meshes composed of instances better than fewer but larger static meshes?

Suppose we have several large buildings that are constructed from a modular kit.

Walls, columns, trims, etc.:

I can import a building into Unreal such that Maya’s instances are brought over. Then we have a building that can be saved as a Blueprint, and it is almost entirely composed of instanced static meshes. Of course, those instances have to be associated with static meshes in the project.

This is a pretty complicated workflow, though, compared to the alternative: in Maya I can just export an entire building as a single static mesh. Unreal then has no idea that buildingA and buildingB are composed of the same smaller static meshes. Unreal only knows about shared materials.

But it is reasonable to expect that if one part of a building is loaded, probably the whole thing is, right? So is loading a building that might be measured in megabytes much worse than loading buildings composed of many instances?

It’s all testable, it will just take a bit of setup, so I’m curious in the meantime what others’ experience has been.

Yes, keep it in smaller instanced meshes. I noticed in one of the latest Epic videos that the authors of the Hillside project said that, for time reasons, they imported some larger meshes (e.g. a whole wall and floor), and it didn’t seem to slow their frame rate down at all, so it seems you can get away with some larger meshes as well…

This video?

Yes, that’s the one :)

Thanks, I searched through the transcript but couldn’t find that specific point… There are still a ton of nuggets in there, and I’ll download the project as well to check it out further.

I think it is probably worthwhile to test, simply because the instances workflow is considerably slower and more complicated, and I may never hit performance issues with what I am doing…

So for a test, I think I should make two levels:

level 1 = several blueprint actors of buildings that are composed of ISMs - duplicate these to fill the entire level.

level 2 = each building is its own unique static mesh, also duplicated to fill entire level.

For this test we should probably not involve anything with materials, right? We only want to measure the performance of loading a few large static meshes compared to a few small static meshes + instances?

But perhaps we ought to assign many materials, as the big benefit of instances is draw call batching?
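To make the draw-call question concrete, here is a rough counting sketch in C++. It is a simplified, non-Nanite model and an assumption for illustration, not how Unreal actually builds its draw lists: without batching, every placed mesh issues one draw per material section; with idealized instancing, all placements that share the same mesh, section and material collapse into one draw. It ignores Nanite (which renders quite differently) and Unreal’s own automatic instancing of identical static mesh draws, either of which can make the two layouts look much closer than this model suggests. `Placement` and the asset names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical scene description: one entry per placed mesh.
struct Placement {
    std::string mesh;                    // static mesh asset name
    std::vector<std::string> materials;  // material per section, in section order
};

// Unbatched model: every placement issues one draw per material section.
std::size_t DrawsWithoutBatching(const std::vector<Placement>& scene) {
    std::size_t draws = 0;
    for (const Placement& p : scene) {
        draws += p.materials.size();
    }
    return draws;
}

// Idealized instancing: placements sharing mesh, section index and
// material are merged into a single instanced draw.
std::size_t DrawsWithInstancing(const std::vector<Placement>& scene) {
    std::set<std::tuple<std::string, std::size_t, std::string>> batches;
    for (const Placement& p : scene) {
        for (std::size_t section = 0; section < p.materials.size(); ++section) {
            batches.emplace(p.mesh, section, p.materials[section]);
        }
    }
    return batches.size();
}
```

Under this model, a unique merged building only batches with exact copies of itself, while a modular kit shared by many buildings collapses into a handful of batches.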

It must have been in the live stream one then.

That’s a good idea, that would make an excellent test. A good selection of game-like materials would probably be best, so it’s pushing the VRAM like you say. Maybe test both with and without them for comparison?

Okay, so I did a test like this:

Level 1 has several hundred static mesh actors (Nanite enabled). Each static mesh actor is composed of a handful of high-poly columns. The total number of columns is ~3,500.

Stats read as follows (I’ll round, since there are small fluctuations each frame):
120 fps
total frame time 8.3 ms
GPU: 4.3 ms
Draws 240
Prims 320k

Level 2 has same number of columns, same materials, but there are several hundred blueprint actors, each containing 12 ISM of that same column.


The stat readout is pretty much identical.

I am not sure if there are any other relevant stats to look at. I also wonder if anything important is missing, since this only uses one static mesh. If we have a good number of unique static meshes, that’s where a difference would most likely show up, compared to buildings composed from a handful of static meshes that only need to be loaded once.

But the instancing workflow is so much slower and more complicated, I really hope to avoid it if possible.

As @RecourseDesign suggested, I ought to test with a lot more materials in play as well, but further tests will take more time to set up.

I think if you then take all those BP actors and “assimilate” them into one BP actor - then you’ll see the true speed increases - I added an “assimilation” system to my rdInst and rdBPtools tools for exactly that reason (and to reduce the pain in creating the “Prefabs”).

In 5.2, there’s also the “auto instancing” - I haven’t managed to get it to work - but in theory it takes all simple “StaticMesh Actors” in your level and creates instances for identical ones - if that works that’s going to be a great way to do this kind of thing - especially if it behaves well with world partition.

Perhaps a mesh that’s a bit wider so it obstructs more objects behind it - just thinking of my game, I try and have a lot of things like hills, buildings, billboard signs, small forests etc around to help obstruct objects - and that in itself is one of the better optimizations…

Does that assimilation process work in tandem with occlusion culling?

And this is something different from the Merge Actor tools here, right?:

If I use that Batch tool, for some reason it nukes all the instances and leaves just one per actor (or maybe it just resets the transforms so they are stacked on top of each other; hard to tell).

It’s a good point about obscuring - this is really a stress test. It is probably more geometry than my scenes would typically have, and definitely longer sight lines.

I thought I’d better go through and test the occlusion culling again - I hadn’t tested it in 5.2 plus I’ve noticed a few people on the forum have been having issues with ISMs and HISMs with culling in general (I’ll test distance culling next)

I try and avoid the Merge Actor tools, I wrote my own that goes through each StaticMeshActor and converts to an instance, then on top of that a routine that looks at all child meshes and adds them to their appropriate HISM components…
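The assimilation idea described above boils down to grouping loose static mesh actors by the mesh they display, so that each unique mesh becomes one (H)ISM component holding all of its transforms. Below is an engine-agnostic C++ sketch of just that grouping step. `MeshActor`, `Transform` and `Assimilate` are hypothetical stand-ins, not rdInst’s code or Unreal’s API, and creating the actual components in the engine is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, engine-agnostic stand-ins for Unreal types.
struct Transform {
    double x = 0.0, y = 0.0, z = 0.0;  // translation
    double yawDegrees = 0.0;           // rotation (yaw only, for brevity)
    double scale = 1.0;                // uniform scale
};

struct MeshActor {
    std::string mesh;     // which static mesh asset the actor displays
    Transform transform;  // world-space transform
};

// One entry per unique mesh: the future HISM component and its instances.
using InstanceGroups = std::map<std::string, std::vector<Transform>>;

// Folds loose mesh actors into per-mesh instance lists, preserving the
// order actors were visited in. Assumes the combined actor sits at the
// world origin, so world transforms can be used as instance transforms.
InstanceGroups Assimilate(const std::vector<MeshActor>& actors) {
    InstanceGroups groups;
    for (const MeshActor& actor : actors) {
        groups[actor.mesh].push_back(actor.transform);
    }
    return groups;
}
```

In a real tool, each map entry would then become one HISM component on the combined actor, with its transforms added as instances and the original actors deleted.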

AFAIK Auto-Instancing has been around for drawcall optimization since 4.20-ish days. But this is mostly a rendering optimization. A factor that needs to be considered is the streaming actor / component initialization overhead, which can be considerable at scale.

1 actor with 1 ISM component that has 500 instances is cheaper than 500 actors. Instancing still has its place and value.

Matrix City Sample is all ISMs for this reason.

@Reclaim7198

Are the entire buildings each an ISM, or are they composed of modular pieces (like walls, doors, etc.)?

I can download it, of course, but it takes hours. Still waiting on the Hillside project, lol.

But a more complete test would be tons of unique static mesh actors, compared to the same thing composed of a modular kit that is instanced?

And runtime performance isn’t as important to measure as loading times?

In Matrix City Sample each building is made of 2-3 actors, each with dozens of ISM components. Each of those components can have hundreds of instances - though usually only dozens.

It’s difficult to compare perf on this since it’s streaming, and streaming has a ton more factors that affect its performance. It’s a bit of an apples-to-oranges comparison to try to compare instance streaming vs. tons of raw actors. Streaming has feedback loops, since it uses a frame-budget approach to stream data.
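The frame-budget point can be illustrated with a generic C++ sketch. This is an assumption for illustration, not Unreal’s streaming code: pending initialization work is drained in order each frame until a fixed budget is spent, and the rest waits for the next frame. The costs are abstract units, not measurements, and `FramesToDrain` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Counts how many frames it takes to process all pending work items when
// each frame may only spend `budgetPerFrame` units. Costs are abstract.
std::size_t FramesToDrain(std::deque<double> pendingCosts, double budgetPerFrame) {
    assert(budgetPerFrame > 0.0);
    std::size_t frames = 0;
    while (!pendingCosts.empty()) {
        ++frames;
        double spent = 0.0;
        // Always process at least one item per frame, even if it alone
        // exceeds the budget, so a single expensive item cannot stall.
        do {
            spent += pendingCosts.front();
            pendingCosts.pop_front();
        } while (!pendingCosts.empty() &&
                 spent + pendingCosts.front() <= budgetPerFrame);
    }
    return frames;
}
```

With made-up costs of 10 units per actor and a 1,000-unit budget, 500 separately initialized actors take 5 frames to finish, while a single actor whose one ISM component costs, say, 200 units is done in one frame.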


You need to consider it for your use case. If you are doing ArchViz, you probably aren’t doing a lot of streaming, so a higher actor/component initialization cost isn’t an issue. In an open-world game, though, you are practically always streaming content, so it is important to keep those costs in mind.

There is a misconception that load times only matter when transitioning between levels. That’s not the case. Modern games are always streaming/loading data, so it’s important to keep an eye on streaming perf.

Gotcha, thanks!

In my case, maps are usually smaller than 1 km, and the playable area doesn’t make full use of that; much of it is simply background. And I am not streaming any levels, or at least I don’t think it would be necessary. It’s more like an old-school linear shooter where you go from level to level with a load screen in between.

Will continue to test further though.

Shame HISMs don’t have per-instance occlusion culling.
The FreezeRendering command pretty much debunked that theory.

Yeah I tried the FreezeRendering command when I saw that, and sure enough the instances are rendered there, but that doesn’t explain this:

Is it a frozen world partition segment that is unloaded?

That level was world partition, but nothing was spatially loaded. To remove that from the equation though, here is a test in a non-world partition level:

I tested distance culling as well in 5.2, and that appears to be working correctly too. Out of interest, I ran FreezeRendering and noticed that the culling still updated as I moved, which convinces me that instance culling keeps being updated during FreezeRendering (a bug in FreezeRendering) and is in fact working.
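Per-instance distance culling itself is simple to picture. The C++ sketch below is a brute-force illustration with hypothetical names (`Vec3`, `VisibleInstances`); Unreal’s HISM culls hierarchically over clusters of instances rather than one instance at a time, so treat this only as a model of the behaviour being tested: recomputed every frame, the visible set changes as the camera moves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x, y, z;
};

// Returns the indices of instances no farther than cullDistance from the
// camera. Compares squared distances to avoid a sqrt per instance.
std::vector<std::size_t> VisibleInstances(const std::vector<Vec3>& instances,
                                          const Vec3& camera,
                                          double cullDistance) {
    assert(cullDistance >= 0.0);
    const double maxDistSq = cullDistance * cullDistance;
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < instances.size(); ++i) {
        const double dx = instances[i].x - camera.x;
        const double dy = instances[i].y - camera.y;
        const double dz = instances[i].z - camera.z;
        if (dx * dx + dy * dy + dz * dz <= maxDistSq) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

Calling it again after moving the camera is the equivalent of the culling updating as you fly around the level.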

Here’s a little more data, comparing file sizes of a building:

Entire building is a single static mesh = 5.71 MB (not sure if this includes referenced things, like materials and textures)

same building but using instances as much as possible = 384 KB

building using instances, but also all dependencies and references = 3.39 MB

So I think it is pretty clear that instances have a much lower memory footprint, which makes sense. Rather than storing where 600k unique vertices are, the building just has to store around 100 transforms for the same static meshes being instanced.
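A back-of-the-envelope C++ version of that comparison, with loudly assumed sizes: 32 bytes per vertex and 64 bytes per transform (a 4x4 float matrix). Neither number is Unreal’s actual storage format, and the sketch ignores index buffers, LODs, Nanite data and compression, so the absolute figures won’t match the on-disk sizes above. It also leaves out the instanced version’s one-time cost of the kit meshes themselves.

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes, for illustration only (not Unreal's formats).
constexpr std::size_t kBytesPerVertex = 32;
constexpr std::size_t kBytesPerTransform = 64;  // 4x4 float matrix

// Geometry that exists only in one merged building mesh.
constexpr std::size_t UniqueVertexBytes(std::size_t vertexCount) {
    return vertexCount * kBytesPerVertex;
}

// Per-building data when the geometry comes from shared, instanced meshes.
constexpr std::size_t InstanceTransformBytes(std::size_t instanceCount) {
    return instanceCount * kBytesPerTransform;
}

// With the figures from the post: 600k vertices vs ~100 transforms.
static_assert(UniqueVertexBytes(600000) == 19200000, "about 19.2 MB of vertices");
static_assert(InstanceTransformBytes(100) == 6400, "about 6.4 KB of transforms");
```

Even with generous error bars on the assumed sizes, the per-building difference is several orders of magnitude.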

I think the benefit would compound when the same static meshes are instanced across many different building types.
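The compounding effect can be sketched as a simple cost model in C++. The sizes below are round placeholders loosely inspired by the numbers above, not measurements, and the model assumes every building type is about the same size. It counts distinct building types only: placing the same building (merged or instanced) many times shares one set of mesh assets either way. `AssetSizes` and both functions are hypothetical.

```cpp
#include <cassert>

// Placeholder sizes in MB; round numbers for illustration, not measurements.
struct AssetSizes {
    double mergedBuildingMB;     // one building exported as its own unique mesh
    double sharedKitMB;          // modular kit meshes, loaded once for all buildings
    double layoutPerBuildingMB;  // per-building instance data (transforms, components)
};

// Every distinct building type carries its own geometry.
double MergedTotalMB(const AssetSizes& s, int buildingTypes) {
    return s.mergedBuildingMB * buildingTypes;
}

// The kit is paid for once; each building type only adds its layout.
double InstancedTotalMB(const AssetSizes& s, int buildingTypes) {
    return s.sharedKitMB + s.layoutPerBuildingMB * buildingTypes;
}
```

With placeholders of 6 MB per merged building, a 3 MB kit and 0.5 MB of layout per building, one building type gives 6 MB vs 3.5 MB and ten types give 60 MB vs 8 MB. With real numbers, the crossover depends on how big the kit is relative to each merged building.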
