Nanite - Trying to utilize the '1 draw call per material' for optimization

I’m experimenting with using baked vertex color on Nanite meshes instead of textures. The main issue is vertex density: even at very high vertex counts, Nanite collapses the triangles with distance too quickly, resulting in a blurry appearance.

I’ve discovered that flattening the vertex normals (“Shade Flat” in Blender) helps greatly, as you can see in the included comparison.
The mesh then has a higher memory footprint, but I’ve tested it in a scene with around 300 of them covering most of the screen, and I see only a minimal difference in frame time (ms) compared to the non-flat-normals version.

Is there a way to achieve this denser vertex distribution in Nanite without flattening the normals? And is there any danger in using meshes with flattened normals like this, apart from the larger compressed file size?

Update: I’ve tried the “Position Precision” setting, but even the lowest value doesn’t change anything in this case.

Take a look at the documentation section about “Hard and Faceted Normals”.
The technique you’re using essentially neuters Nanite’s ability to do any meaningful mesh simplification: with flat shading, every face needs its own unique vertices, so almost nothing can be merged between neighboring triangles. It dramatically increases the vertex count and, as a result, the vertex shader cost.
You would probably be far better off sampling a texture for anything too detailed to rely on baked vertex data.

Thank you, I see now. My idea was to use one material for every Nanite mesh to drastically reduce draw calls (right now every unique mesh needs its own material instance for the color texture), but maybe it’s not worth it.

Sure thing. As it seems you already know, Nanite can draw multiple meshes with the same material in a single call. This makes it ripe for cool material optimizations. I would probably try a texture atlas or array to avoid needing to switch textures via material instances.

Hmm, maybe a texture array masked by custom primitive data would be a good idea for Nanite meshes :thinking:
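
For what it’s worth, the per-mesh value could be pushed from the editor side with a few lines of Python. This is a minimal sketch, assuming the Editor Scripting Utilities plugin is enabled; data slot 0 is an assumption and must match the index the material’s CustomPrimitiveData node reads (which would then drive the texture array’s slice input):

```python
import unreal

SLICE_SLOT = 0     # assumed custom primitive data index; must match the material
SLICE_INDEX = 3.0  # hypothetical slice these meshes should sample

# Tag every selected static mesh actor with the texture-array slice to use.
for actor in unreal.EditorLevelLibrary.get_selected_level_actors():
    component = actor.get_component_by_class(unreal.StaticMeshComponent)
    if component:
        component.set_custom_primitive_data_float(SLICE_SLOT, SLICE_INDEX)
```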

Exactly. As long as you are clever about which textures share an atlas or array, so you minimize the memory wasted on loading unneeded textures.
For that reason, I suspect an atlas using streaming virtual textures may be more efficient than an array, especially if you want to include lots of textures that may not all be on screen at the same time.

To summarize, the best setup (performance-wise) for Nanite meshes for now:

  • Group the meshes into e.g. ‘biome’ groups: for example, ~20 meshes that will likely be used together in a forest-biome scene. Create a texture atlas from all their textures, or use the UDIM workflow to let Unreal create the atlas for you.
  • Use one master material per biome, with a Virtual Texture sampler inside. The UV offset (to pick the right texture) can be set via custom primitive data on each mesh, or by shifting the UVs manually in each mesh; see the sketch after this list.
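
To make the UV-offset idea concrete, here is a minimal sketch of the UDIM tile math in editor Python. The two data slots are assumptions; the material would read them back via a CustomPrimitiveData node and add them to the mesh UVs before the virtual texture sample (V may need flipping depending on how the UDIM was authored). Components can be fetched and tagged the same way as in the texture-array sketch above.

```python
import unreal

def udim_tile_to_uv_offset(tile):
    """UDIM convention: tile = 1001 + u + 10 * v, with u in 0..9.
    So 1001 -> (0, 0), 1002 -> (1, 0), 1011 -> (0, 1), and so on."""
    index = tile - 1001
    return float(index % 10), float(index // 10)

# Assumed layout: custom primitive data floats 0 and 1 hold the (U, V)
# offset that the material adds to the mesh UVs.
U_SLOT, V_SLOT = 0, 1

def assign_tile(component, tile):
    u, v = udim_tile_to_uv_offset(tile)
    component.set_custom_primitive_data_float(U_SLOT, u)
    component.set_custom_primitive_data_float(V_SLOT, v)
```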

If anyone has better ideas, please share :slight_smile:

It would be ideal to have an automated way to optimize this. For example, a script that:

  • Gets all the textures used in a scene
  • Generates one big atlas from them, as a Virtual Texture
  • Assigns one material to all the meshes using those textures, and selects the correct atlas region via custom primitive data or by shifting the UVs on the meshes (a rough sketch of the gathering step follows this list).
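
A minimal sketch of that first gathering step in editor Python, assuming the Editor Scripting Utilities plugin. Note the limits: this only catches textures overridden on material instances (via their TextureParameterValues property); textures set only in parent materials, the atlas build itself, and the UV remap are left out:

```python
import unreal

def gather_scene_textures():
    """Collect the texture overrides of every material instance used by
    static mesh components in the current level."""
    textures = {}
    for actor in unreal.EditorLevelLibrary.get_all_level_actors():
        for component in actor.get_components_by_class(unreal.StaticMeshComponent):
            for slot in range(component.get_num_materials()):
                material = component.get_material(slot)
                if not isinstance(material, unreal.MaterialInstance):
                    continue
                for param in material.get_editor_property("texture_parameter_values"):
                    texture = param.get_editor_property("parameter_value")
                    if texture is not None:
                        # Key by path name to de-duplicate shared textures.
                        textures[texture.get_path_name()] = texture
    return list(textures.values())

for tex in gather_scene_textures():
    unreal.log(tex.get_path_name())
```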

I wonder if the Virtual Textures already help with that under the hood, in some way…?

Anecdotal experience: I had set up a stochastic triplanar landscape material with temporal sampling, using texture arrays to reduce instruction count and sampler count. As a point of reference, it ran at ~60-70 fps at 1440p with the stock temporal denoiser and no upsampling. Epic’s open-world samples get similar performance with stock materials on the same machine, but without multiple painted layers, wetness, puddles, raindrops, and all the other stuff one crams into a landscape material. So it’s at least comparable or better feature-wise; I’d expect better overall, but c’est la vie.

ANYWAY, not that it’s good or bad, but when I migrated the texture arrays over to UDIMs I got an immediate ~25-35 fps jump with somewhat less memory used. That was comparing 2k/1k textures in the array against 4k virtual textures; 4k in the array was ~30 fps slower overall, yuck. So UDIM seems to be the way to go, especially for larger textures.

Even with a ~15+% increase in instruction count from using UDIMs and adding a few virtual-texture stacks, performance is noticeably better with UDIMs than with texture arrays, with the added option of using a distinct resolution per texture.

The only drawback was that, because of sRGB, I had to divvy things up. BaseColor/albedo, at least, has to go in its own container, since you cannot enable sRGB for a distinct region of a texture array, and converting in the shader costs you; so it seems worth breaking it out for its own sake, and the same goes for any other channel-packed textures you set up. Using an sRGB BaseColor, Normal (2-channel) + AO, and Gloss/Roughness/Displacement with the (3) related samplers, I still ended up net better on performance versus one mega-texture/sampler and doing extra UV math. Making 3 distinct sampler paths was worth the effort in the reduction of virtual-texture stacks as well as overall instruction count.

For just the landscape, with this setup as a guide, I’m still getting ~95-100 fps at stock settings at 1440p, and ~80-85 at 4k with 75% screen resolution.
