Nanite - CPU cost in a large open world with many materials, even when nothing is visible

In our large open world, Nanite spends considerable CPU time, on the order of 1.5 to 2 ms across all cores, preparing dispatches for every loaded material instance.

This happens regardless of what’s on screen; we can be looking into the sky, with the GPU doing very little and no actors visible, yet we still pay this fixed cost on the CPU.

[Image Removed]

This is NaniteBasePass ShadeGBufferCS work, running in parallel on all threads for what seems to be every material instance in the scene.

I can see why this happens: the CPU doesn’t have access to the visibility buffer, so a dispatch may need to be sent to the GPU for each material that may be visible.
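To make that reasoning concrete, here is a toy model in Python. It is not engine code: `ShadingBin`, `record_dispatches`, `gpu_work_items`, and `record_with_broadphase` are hypothetical names, and the bin count is made up. It only illustrates why the CPU cost scales with loaded bins while the GPU cost scales with visible pixels, and what a conservative broadphase could skip.

```python
from dataclasses import dataclass


@dataclass
class ShadingBin:
    material_id: int
    visible_pixels: int  # in the real renderer this is only known on the GPU


def record_dispatches(bins):
    """CPU side: one indirect dispatch per loaded bin, never reading visible_pixels."""
    return [b.material_id for b in bins]


def gpu_work_items(bins):
    """GPU side: only bins with visible pixels produce any shading work."""
    return sum(b.visible_pixels for b in bins)


def record_with_broadphase(bins, maybe_visible_ids):
    """Hypothetical broadphase: skip bins a conservative CPU test rules out."""
    return [b.material_id for b in bins if b.material_id in maybe_visible_ids]


# Looking into the sky: 1000 loaded bins (illustrative number), none visible.
bins = [ShadingBin(material_id=i, visible_pixels=0) for i in range(1000)]
cpu_commands = record_dispatches(bins)
culled_commands = record_with_broadphase(bins, maybe_visible_ids=set())
```

In this sketch the CPU records 1000 commands for zero GPU work, while the hypothetical broadphase records none. Whether Nanite has, or could have, such a CPU-side test is exactly the open question here.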

Nanite handles this fine on the GPU side; when I’m looking into the sky, GPU time is a non-issue. But we need these CPU cycles back for other task work.

I’m wondering whether some very high-level broadphase culling could help, and, if Nanite already does this, whether we’ve done something wrong to break it.

In this capture I have disabled Lumen and VSM, so the only ‘View’ should be the main player view.

RHI submission time is also creeping up, and I wonder if this is ballooning with all the empty dispatches.

[Image Removed]

We have a very large number of material instances, and they all stem from landscape. Every Landscape component has one, or possibly many, “LandscapeMaterialInstanceConstant_###” on it.

For our further-out Landscape, where we use HLODs (with a few different grids for different object categories), we have one MI for each of those.

Q: How many MIs does Epic tend to see or expect in a typical shipping scene?

[Image Removed]

Hello there,

As I understand it, each Nanite shading bin is submitted to this function in the ShadingCommands.Commands array. Would you mind if I ask how many unique shading bins you have listed in nanitestats?

Best regards,

Chris

Hello, this thread caught my interest because it seems to describe the issue we’re also facing. We’re running version 5.4 and are bottlenecked on the RHI thread with around 4300 shading bins, of which 4200 are empty. This becomes especially problematic on lower-end specs when users select low scalability and low resolution.

I’m curious, are the detailed per-draw call scopes something you added locally, or are they available out of the box in Unreal? I’ve tried some approaches using draw events, but nothing seems to output them to Insights by default.

[Image Removed] Nanitestats when looking into the sky, in a large open world.

On an R5-4600 6c/12t CPU, paired with a fast GPU, we spend nearly 10 ms across all cores doing the RHI_Translate work above.

It appears that the work is coming from the shading bins.

What HLOD settings are used on the landscape? There appears to be a substantial number of unique materials in use.

Also, in general terms, how much of the scene is being passed into the fixed-function Nanite path versus the programmable path?

Best regards,

Chris

The more detailed draw events should be available with r.RHISetGPUCaptureOptions 1. This will set a few cvars that emit a lot more data.

Alternatively, r.EmitMeshDrawEvents 1 will provide a breakdown. In 5.4, you will also need to set r.Nanite.ShowMeshDrawEvents 1 to get Nanite draw events.

Insights will need additional channels to see them too. I believe they reside in the RDG channel.
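For reference, the cvars mentioned above would be entered at the in-game console roughly as follows. Use either the first line on its own or the second (plus the third if you are on 5.4); this is only a summary of the settings already described, not a verified recipe.

```
r.RHISetGPUCaptureOptions 1
r.EmitMeshDrawEvents 1
r.Nanite.ShowMeshDrawEvents 1
```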

I hope that helps.

Best regards,

Chris