Nanite Performance is Not Better than Overdraw Focused LODs [TEST RESULTS]. Epic's Documentation is Endangering Optimization.

I would like to remind folks that poly-count (tris) isn’t such a bottleneck for Nanite. Cutting a mesh down to, say, 10% density instead of keeping MORE triangles can certainly seem like a savings, but… since Nanite clusters triangles and culls at the cluster level, reducing the triangle count can make clusters less granular: the triangles end up larger since there are fewer of them, and so a cluster cannot follow curvature as finely as it otherwise might.

This makes culling less workable as you cannot use a given cluster to occlude smaller and smaller details. With too-few triangles the shapes can be blocky and be less efficient at occluding. Use a full-level model and let the nanite-system leverage the increased triangle-count for culling.

Also, the number of tris in a model isn’t always what Nanite will draw on-screen. Since things are almost certainly not right in front of you, Nanite is creatively decimating the mesh to reduce poly-count and generally arrive at roughly 1 tri per pixel.
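
To put a rough number on the "1 tri per pixel" idea, here is a minimal back-of-the-envelope sketch. It is not how Nanite actually selects clusters; it only assumes that the drawn triangle count is capped by the pixels the mesh covers, and it approximates the mesh by a bounding sphere. The function names, the 60 degree field of view, and the 1440 pixel screen height are all illustrative assumptions.

```python
import math


def projected_pixel_area(radius_m, distance_m, vertical_fov_deg, screen_height_px):
    """Approximate on-screen area, in pixels, of a bounding sphere."""
    # Pixels per metre at this distance for a simple perspective camera.
    view_height_m = 2.0 * distance_m * math.tan(math.radians(vertical_fov_deg) / 2.0)
    pixels_per_metre = screen_height_px / view_height_m
    radius_px = radius_m * pixels_per_metre
    return math.pi * radius_px ** 2


def estimated_drawn_triangles(source_triangles, radius_m, distance_m,
                              vertical_fov_deg=60.0, screen_height_px=1440):
    """Assumption: about one triangle per covered pixel, never more than the source mesh."""
    covered = projected_pixel_area(radius_m, distance_m, vertical_fov_deg, screen_height_px)
    return min(source_triangles, int(covered))


# A 1,000,000-triangle mesh with a 1 m bounding radius:
far = estimated_drawn_triangles(1_000_000, radius_m=1.0, distance_m=100.0)
near = estimated_drawn_triangles(1_000_000, radius_m=1.0, distance_m=1.0)
print(far, near)
```

Under these assumptions the same mesh costs only a few hundred triangles at 100 m, while up close the estimate is capped at the full source count.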

3 Likes

Maybe with a 32GB GPU and a PCIe 6.0 SSD that would be the case. For now I don’t see how to manage that type of asset at runtime in the engine with levels of 4k size and up.

1 Like

Perhaps an alternative approach may help.

In rdInst I’ve added a streaming/spawning actor that works in a similar way to WP Tiles (except can be reused and anywhere in the level and doesn’t require WP (and works in 4.27)). It doesn’t touch the landscape - but from what I’ve been reading switching the landscape to Nanite is more optimal than building landscape tiles in HLOD anyway (I use built meshes for my landscape which just avoids the issue and appears to still be considerably faster).

These SpawnActors can be loaded/unloaded at anytime, and can be built from existing actors/instances in the level. This way it’s easy just to load the ones you’re wanting to edit/view in the editor while keeping your memory low.

At runtime they do fast streaming/spawning - can split the spawns over multiple frames - have “Stream/Spawn” distances assigned for each object (e.g. have larger objects spawn in further) and have a data system similar to ECS which makes scanning/processing them very quick (Async).
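
As a rough illustration of the idea described above (per-object spawn distances plus spreading spawns over several frames), here is a minimal sketch. It is not rdInst's actual API or data layout; `SpawnEntry`, `TimeSlicedSpawner`, and every parameter are hypothetical names, and unloading is left out to keep it short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SpawnEntry:
    name: str
    position: tuple        # (x, y) world position in metres
    spawn_distance: float  # larger objects can be given a larger distance


class TimeSlicedSpawner:
    """Queues entries that come into range and spawns a few per frame."""

    def __init__(self, entries, max_spawns_per_frame):
        self.entries = list(entries)
        self.max_spawns_per_frame = max_spawns_per_frame
        self.pending = deque()
        self.queued = set()
        self.spawned = set()

    def tick(self, viewer_pos):
        # Queue every entry that is in range and not already handled.
        for entry in self.entries:
            if entry.name in self.spawned or entry.name in self.queued:
                continue
            dx = entry.position[0] - viewer_pos[0]
            dy = entry.position[1] - viewer_pos[1]
            if dx * dx + dy * dy <= entry.spawn_distance ** 2:
                self.pending.append(entry)
                self.queued.add(entry.name)
        # Spawn at most max_spawns_per_frame entries this frame.
        spawned_now = []
        while self.pending and len(spawned_now) < self.max_spawns_per_frame:
            entry = self.pending.popleft()
            self.queued.discard(entry.name)
            self.spawned.add(entry.name)
            spawned_now.append(entry.name)
        return spawned_now


spawner = TimeSlicedSpawner(
    [
        SpawnEntry("rock", (10.0, 0.0), 50.0),
        SpawnEntry("tree", (20.0, 0.0), 100.0),
        SpawnEntry("mountain", (500.0, 0.0), 1000.0),
        SpawnEntry("bush", (300.0, 0.0), 50.0),
    ],
    max_spawns_per_frame=2,
)
frame_1 = spawner.tick((0.0, 0.0))  # rock and tree
frame_2 = spawner.tick((0.0, 0.0))  # mountain (in range, but deferred a frame)
frame_3 = spawner.tick((0.0, 0.0))  # nothing left; the bush is still out of range
```

The per-frame cap is what smooths out hitches: a burst of objects coming into range is paid for over several frames instead of one.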

They’re also stored in tight, reusable DataAssets making their memory usage and loading tiny.

My game isn’t as large as yours (12km x 12km) but I designed it specifically for large open world games - from my tests so far (and I still have a lot of tests to do yet) it’s proving to be faster and smoother than even the latest incarnations of WP streaming.

1 Like

At a ridiculous-scale, yes, but even models with 100k tris are doable on old machines. I find 30-40k to be overkill on most rock, tree, etc things. If you are using instances or even better, assemblies, then you can leverage nanite for that lack of instancing-overhead (read the paper a few links above) which does much to claw-back overhead.

Anything 4k should be set to streaming/virtual-textures (duh). For my part, extensive testing only confirms that w/nanite, VTs are net-gain overall.

Do we even need a 4k texture? I like to try and reuse 1k/2k textures as I can tile them and mix them up, hence ‘increasing’ resolution (texels vs pixels). Obviously a net-gain with smaller textures being moved around.
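
The texel-density point can be checked with simple arithmetic. The sketch below is only illustrative: it assumes uncompressed 4-bytes-per-texel textures with no mip chain and a 4 m wide surface, and the function names are made up for this example.

```python
def texel_density(texture_resolution, tiles_across, surface_size_m):
    """Texels per metre along one axis when the texture repeats tiles_across times."""
    return texture_resolution * tiles_across / surface_size_m


def texture_bytes(texture_resolution, bytes_per_texel=4):
    """Uncompressed size of a square texture, ignoring mips (an assumption)."""
    return texture_resolution * texture_resolution * bytes_per_texel


# A 1k texture tiled 4 times across a 4 m surface vs a 4k texture stretched once.
tiled_1k = texel_density(1024, tiles_across=4, surface_size_m=4.0)
single_4k = texel_density(4096, tiles_across=1, surface_size_m=4.0)
size_ratio = texture_bytes(4096) // texture_bytes(1024)
print(tiled_1k, single_4k, size_ratio)  # 1024.0 1024.0 16
```

Both options give the same texels per metre on the surface, but the 4k texture is 16 times the data to store and move around. The trade-off is visible repetition, which is why mixing several tiled textures helps.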

2 Likes

Yeah I’m really looking forward to Nanite Assemblies and the Nanite Foliage Voxels, the potential there is wild.

1 Like

When creating HLODs with the foliage - you’ll probably get a better result if you use Planar LODs rather than Billboards - they’ll work better as they’re static and you can add a top plane for better views from above. It also helps create more depth and should handle lighting better.

1 Like

Depends - as a model you sell? Yes. You should have 8K textures.
As a model you then include in a game? No. You need to be smarter than a toddler and include some reasonably sized texture.

Since arguably this engine cannot render native 2K without burning the average user’s computer to a crisp, chances are that 1K textures are already “too big” for your project.

On the other hand, when you use a better engine that’s capable of accurately rendering things in a nicer way - and if 4k rendering is ever viable - then your 1k texture can actually come to look “soft” or just blurry.

VTs can be a gain - they can also be a problem.

Particularly since Epic never really bothered finishing the RVT pathway and including foliage/grass instances into it afaik, I find them to just be a hindrance.

I do still have a project that is 144km^2 running on this engine occasionally - for the most part it’s been moved to CryEngine where rendering 4k and billions of forest instances is about as easy as it is for Unreal to crash while you work.

Wouldn’t even dare to try and upgrade to Nanite because I know already that Epic is a waste of time - but to each their own…

My benchmarks on the subject are also detailed in separate forum posts - but it boils down to “this engine is not viable for large projects requiring any kind of accuracy”.

So far, nanite seems to be the “end” for epic rather than the saviour they are trying to make it into.

Okay, I’ve managed to increase the size of the trees and then reduce the amount of instances. It loads quickly now, but that should reduce frames a bit :wink: The build took me around 10 minutes without ray tracing. 50k trees, but the area is probably filled at the same density :wink:



I probably need to do something about the seams, as I mentioned before.

And the video below shows how the project runs with TSR applied (no RT and no DLSS):

3440x1440 - Ultra Wide, OBS 300MB memory allocation
RTX 4060 scenario: 35% power limit on my GPU

and later the full potential of the RTX 4070 Ti Super.

And the streaming benchmark I forgot to show ^

So I think I have exhausted the topic and shown where it ends when you make more triangles than are needed for your open world game.

Conclusions and Pipeline for Current Gen (RTX 30-50):

  1. For tree foliage assets, use ones that have the best quality and the lowest triangle count, with one or two material slots.

  2. Use Nanite everywhere you can and turn it on in Merged HLOD with Preserve Area.
    a) Try to find or make full geometry with an opaque material (cutout).
    b) Nanite removes popping from assets.
    c) It is most visible at distance with Preserve Area.
    d) It removes the problem that Shadow Maps cause: flickering shadows with direct lighting.
    e) Use Min LOD 1 with a Billboard/Impostor to reduce the triangle amount if you need to see assets in Merged HLOD or at runtime distance.

  3. Try to avoid using Landscape: it has millions of triangles, and you can make at most an 8k size map because the sculpt tools won’t work with a larger one.
    a) It’s demanding.
    b) Use reduced-triangle Mesh Tiles instead and disable them from World Partition and HLODs. The level will take a little longer to load, but quality and memory usage will be much better overall, for example at 16K size. A larger level size may require switching to WP and HLOD.

  4. Implement Upscalers:
    a) They remove most screen issues.
    b) They reduce memory usage.
    c) They raise the apparent screen percentage: if you have 33% set, it looks like 50%, so the game looks the same but performs much faster, and players can, for example, increase detail or turn on ray tracing.

  5. To improve workflow, extract low-poly assets (Billboard LOD1) and use them in the Foliage tools, then bake, and later switch to the proper assets that will be managed in the runtime partition.

Why is that?
More detailed assets will end in HLOD build-time hell.
More things in runtime will turn loading the map into a “menu experience” and make testing or playing frustrating.
More detailed assets and more things in runtime will end in a “menu experience” and 12+ GB allocation.
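
To make the upscaler point (item 4c) concrete, here is a small sketch of how many pixels are actually rendered at a given screen percentage. It assumes the percentage applies to each axis, which is how screen percentage is normally defined, and it uses the 3440x1440 resolution from the benchmark above; the function name is illustrative.

```python
def rendered_pixels(width, height, screen_percentage):
    """Internal render resolution in pixels when each axis is scaled by the percentage."""
    scaled_width = width * screen_percentage // 100
    scaled_height = height * screen_percentage // 100
    return scaled_width * scaled_height


native = rendered_pixels(3440, 1440, 100)
at_50 = rendered_pixels(3440, 1440, 50)
at_33 = rendered_pixels(3440, 1440, 33)
print(native, at_50, at_33)  # 4953600 1238400 539125
print(round(at_50 / at_33, 2))  # 2.3
```

So if 33% with an upscaler really looks like 50% without one, the GPU is shading less than half the pixels for a similar image, which is where the headroom for extra detail or ray tracing comes from.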

1 Like

I have noticed you are using MegaLights. You shouldn’t: it’s experimental and doesn’t even work with light coming from the sun.

1 Like

I’ve added it as an option switch in the test project just for science. I am a minimalist, and if it’s broken it will not be implemented later ;). I also had to increase the triangles for the Tiles to avoid seams, which produces a 1 hour HLOD build instead of 10 minutes.

Those distance trees are looking very nice!

1 Like

Thank you. I fixed the seams a bit, but the HLOD build time for that hurts, x6 :smiley: Going from 60 triangles to 300 per tile.


And that problem is left. I am looking into what causes it; I think it’s the instanced HLOD overwriting the Merged one, but I need to separate the tiles and try different setups. It will take some time.

1 Like

And the next discovery for Tiles: the level loads in 10 seconds instead of 5, with the tiles disabled from World Partition and HLODs. Around 300MB less RAM usage, better visuals, fixed issues, and less packaged data on the drive too.


infinite quality

1 Like

You know the landscape gets baked to a partitioned mesh at cook time right? That’s why you can’t deform it through code and why Epic is working on a next gen terrain system.

Also prove to me how 4k has not been hard to run? You claimed that I am dumb for saying that, explain to me why?

Are you Kevin’s alt account, you have the same language and mannerisms as him.

1 Like

Show us your millions/billions of trees at 4k in CryEngine. Given CryEngine is known for being hard to run, I’d love to see you show us how wrong we are; even Crytek can’t do that.

1 Like

Updates and improvements:
200k Pine trees 30k triangles each (distant4)

and Ray Tracing with DLSS 4.0

1 Like

Warhorse Studios was founded 14 years ago and has only made the 2 KCD games.
Arrowhead Game Studios’ last 2 games are Helldivers 1 and 2; Helldivers 1 is from 2015.

Kingdom Come Deliverance 2 and Helldivers 2 don’t perform well because they use a “better optimised engine than Unreal”; it’s because these 2 games were developed and optimised for nearly a decade.

CryEngine’s last update was May 19, 2022.
Stingray was discontinued in 2018.

Furthermore, KCD’s graphics are not that impressive; it is the artistic direction that carries the game’s graphics. Visually it’s equivalent to the medium settings of top AAA games, not more.
But both games make perfect use of older technologies.

To finish, Valorant has been updated to UE 5.3 (the game was on UE 4) and guess what, it’s running slightly faster at 900+ fps at 1080p, not slower. Moreover the game now takes 24GB instead of 58GB.

And don’t get me started on the tools and the documentation for building a game in CryEngine: it would literally multiply the time necessary to make the same game compared with UE, with no better performance.

5 Likes

Also added Skylight, Exponential Fog (Volumetric), Postprocess (maxed Lumen setup), and Volumetric Clouds (base setup; pushing it further just ruins frames more :D). It’s breathtaking but also frame-taking (40 frames gone from the FrameGen scenario on a 4070 Ti Super :smiley:)

They changed Volumetric Clouds in 5.4, and they are more expensive now.

And optimized Postprocess

Testing Nanite Foliage (Voxel) - will let you know soon where we are at :wink:
Nanite settings are duplicated in the mesh, probably as a backup?

Nanite Foliage On: (switched from Preserve Area to Voxelization)
As you can see, it changes the colors of the asset in the distance

Num Rays:

That setting probably changes the triangles (needle density with distance) and changes the ray tracing fallback setup.

Voxels / RayBackup:
Those settings probably affect density too.


much denser

RayBackup:
Probably helps with tweaking both, and somehow changes shadow casting at distance (close, far).




Separable:
dense voxels in the mesh - makes it less visible and increases disk size.



Voxel NDF
Optimize Disk size
Voxel Opacity
Optimize streaming size

Most optimized but worst looking setup - (lowest in streaming memory)

Best Looking “most Dense”

  • Setting Voxel Level to 0 reduces the distant view because of the “Num Rays” setting.

Balance Setup?

6% triangles, 1% fallback, 5 bits, 1cm - close and far distance
(balanced voxel - probably can’t be used in that setup

  • it’s much more demanding in streaming,
    Voxel Level = 1 applies that and increases Nanite vertices)


A setup that looks similar, but Preserve Area still takes less memory. Are there even voxels on the left?

distance view with same setup

And the voxel one with fixed close range, 25% triangles (minimal acceptable setup for 5.7 Nanite Foliage without the Assemblies feature).
It takes less memory in streaming and it’s more visible in the distance - that is cool ;)
The HLOD will take longer to build; it can take 8h or more for this setup (based on a calculation of 2 hours per 6% of this mesh, so roughly x4 more triangles).
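
The 8h figure can be reproduced with a simple linear extrapolation. This is only a sketch under the assumption that HLOD build time scales linearly with triangle count, which the thread does not confirm; the function name is illustrative. As a sanity check it applies the same rule to the earlier tile numbers (10 minutes at 60 triangles per tile, roughly 1 hour reported at 300).

```python
def linear_build_time(known_time, known_amount, target_amount):
    """Scale a measured build time linearly with the amount of geometry (an assumption)."""
    return known_time * target_amount / known_amount


# Trees: 2 hours measured at 6% triangles, extrapolated to 25%.
tree_hours = linear_build_time(2.0, 6.0, 25.0)
print(round(tree_hours, 2))  # 8.33

# Tiles: 10 minutes measured at 60 triangles per tile, extrapolated to 300.
tile_minutes = linear_build_time(10.0, 60.0, 300.0)
print(tile_minutes)  # 50.0
```

For the tiles the linear estimate (50 minutes) comes in a little under the roughly 1 hour that was reported, so treating the tree estimate as "8h or more" seems like the safer reading.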


and distant view:


Voxelization and the overdraw they fixed are going to change lives

Preserve (Old)

5.7 masked billboard


5.6 masked BB distance

max frames for Nanite Old (25% Triangles - Preserve Area)

max frames for Nanite Foliage (25% Triangles - Voxels)

HLOD

  1. Had to set 1% Fallback for the tree
  2. 0% in HLOD Merged Voxelized(Triangles,Fallback)


Close image (a higher percentage in the HLOD means more overdraw at close distance, but who will use that at close distance? :sweat_smile:)

Far distance

I still wouldn’t recommend building HLODs based on Nanite meshes, due to the time it will take and the increased resources it will use.
200k trees with BB I’ve packaged in the 5.7 preview in 20 mins (but it could have that uncommon “LAG”)


The voxel HLOD causes some errors in those spots.

Packaging the project took 78 GB of RAM allocation at peak (+ 128 GB of paging support)


So if you are thinking about creating such big worlds, buy more memory :slight_smile:

1 Like

Comparison Video
Voxels vs Preserve Area in the default World Partition setup (Packed Billboards, 25% Triangles Voxel (5.7), 6% Triangles Preserve (5.6))

Also, the engine or GPU couldn’t handle the overhead, and this is with only 1/5 of the terrain filled with assets… I hope it’s only 5.7 instability issues.

1 Like