I did some Nanite vs non-Nanite testing myself a few months back. What I was testing was mass-“converting” Nanite meshes into LOD meshes automatically: disable Nanite, make LOD0 have the same triangle count as the fallback mesh (usually 2% of the original mesh), then add LOD levels at a 50% reduction ratio until an LOD is <=100 triangles. The first test was the Old West learning project ( https://www.unrealengine.com/marketplace/en-US/product/old-west-learning-project ). After about 24h spent compiling the new meshes, the GPU frame time was nearly identical, about 30ms on my RX 6800 XT.
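For clarity, here's a small sketch of the LOD budget rule I used. This is my own helper, not an Unreal API: LOD0 matches the fallback (~2% of the source triangles), then each LOD halves the count until one lands at or below 100 triangles.

```python
def lod_triangle_counts(source_tris, fallback_ratio=0.02, step=0.5, floor=100):
    # LOD0 = Nanite fallback mesh budget (~2% of the source mesh).
    counts = [max(int(source_tris * fallback_ratio), 1)]
    # Keep halving until the last LOD is <= 100 triangles.
    while counts[-1] > floor:
        counts.append(max(int(counts[-1] * step), 1))
    return counts

# e.g. a 1,000,000-triangle Nanite mesh:
# lod_triangle_counts(1_000_000)
# -> [20000, 10000, 5000, 2500, 1250, 625, 312, 156, 78]
```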
The second test was CitySample. There the GPU time roughly doubled, around 60ms vs 30ms with Nanite, but what was really bad was that shadow culling on the CPU took 130ms. I think that's because with Nanite all instances get culled on the GPU and drawn through the indirect drawing APIs. Sure, LOD meshes could do that as well, but Epic never implemented indirect drawing for them. I think a proper Nanite vs non-Nanite comparison absolutely needs a huge scene with ~1M mesh instances (like in CitySample), and that's where Nanite shines.
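To illustrate what I mean by GPU culling feeding indirect draws, here's a CPU-side mock of the idea (purely conceptual, names and the sphere-vs-frustum test are mine): a compute pass tests every instance's bounds against the view and writes one draw entry per survivor into an indirect-args buffer, so the CPU never walks the ~1M instances.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    center: tuple      # world-space bounding-sphere center (x, y, z)
    radius: float      # bounding-sphere radius
    index_count: int   # index count of the mesh this instance uses

def cull_and_build_indirect_args(instances, planes):
    """Keep instances whose bounding sphere is inside every frustum plane
    (plane = (nx, ny, nz, d), normal pointing inward) and emit one
    indirect-draw entry (index_count, instance_id) per survivor.
    On a real GPU this loop is a compute shader appending to a buffer
    consumed by an indirect draw call."""
    args = []
    for i, inst in enumerate(instances):
        cx, cy, cz = inst.center
        visible = all(nx*cx + ny*cy + nz*cz + d >= -inst.radius
                      for nx, ny, nz, d in planes)
        if visible:
            args.append((inst.index_count, i))
    return args
```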
PS: Nanite uses compute (software) rasterization for small triangles, but for big triangles it falls back to hardware rasterization. You can even change the threshold used to decide which path is taken.