Using Lumen/Nanite/VSM/PLA with low polygon meshes...

I’m looking to see if anyone else is in this situation, and what they did:

I’m working on an overhead project (an ARPG). Most of my meshes are fairly low-poly and never need to be super high detail. Many modular wall pieces are very low-poly, and many of them are used with level instances and packed level actors. I’ve found that level instances seem to be by far the easiest method to create/modify prefab buildings and structures.

However, I’m finding this vicious cycle on advice for performance…

What I am using (and require):

  • Lumen (required)
  • VSM (preferred)
  • Packed Level Actors (preferred for certain objects)
  • Low polygon meshes that never change their poly count, so that every triangle lines up exactly all the time (required)

Here’s where it gets confusing…

Some info that I’ve learned or found out myself:

  • “Virtual Shadow Maps work much faster with nanite meshes”
  • “Low polygon meshes are slower with nanite”
  • “Packed level actors require nanite in order to generate correct lumen surface caches”
  • “Lumen works better with smaller (or modular) objects”
  • “Pre-merging meshes (for buildings) into 1 mesh reduces draw calls, but works terrible with lumen accuracy and performance”

When you read all those facts, and my requirements… it feels like a dog chasing its own tail!

“Lower performance to increase performance to lower performance to increase the performance of the lowered performance that increases draw calls, so that the performance of the lowered performance object can perform faster.”

:woozy_face:

I can’t even imagine how many steps I’d need to do to profile all of this correctly.

So I’m wondering if anyone in a similar situation has settled on a clear choice?

TLDR: Is enabling nanite on low-polygon meshes worth it, if it increases performance in other ways (and increases lumen accuracy)?

1 Like

i really don’t see the cycle. could you expand please?

“slower”. hmm. i guess it depends on how much slower…

“slower” is a comparative, so i assume you’re comparing it with regular LOD’ed mesh.

i don’t have the answer, but i’m interested in this post.

i think, humbly, that the problem you’re facing is: “i’m in a grey area, and there are no clear guidelines for the obvious choice, because, in this position, there is no obvious choice. plus, this area is also at the limit of a trade-off, so there are conflicting conditions.” in which case i think it comes down to you and what you value. and don’t be afraid to choose what you prefer. there will always be people who don’t agree, but it’s your game.

i think it’s fairly easy to test the scene with and without nanite and see if the overall performance increases or decreases.

then, if you think the visual quality is worth it against your (hopefully measured) performance degradation, go ahead, you have my permission. in my humble opinion that will always be a subjective metric, since it depends on what you value (your values and preferences are personal).

things in software are a trade-off. and things like performance and visual quality or fidelity (not the same), are a gamut/continuum.

i personally love packed level actors. i think that already lowers the draw calls a lot, so maybe nanite instancing won’t help much.

on the other hand, baking multiple meshes into one: i personally don’t really like that optimization, and i avoid it. it’s hard to maintain, and it breaks other optimizations, like bounds, culling, etc.

though maybe…. maaaaaaaaybe, if you merge a few things, and make them nanite, you can reduce them to 1 draw call. but, is it worth it? depends on your game i think. i personally would not do it.

you have to keep in mind how hard it is to maintain and author the assets. that’s part of the cost too.

also think about the future. you could optimize heavily with packed level actors, and maybe in the future you hit a case where you can’t use them.

that’s the issue with instancing and the other draw-call-reduction methods i just mentioned.

good luck. and hopefully someone can add more info.

Lumen, Nanite, and VSM really do not make much sense in a top down game honestly.

You don’t get most of the streaming or culling benefits of Nanite because there’s minimal occlusion and everything is a relatively fixed distance from the camera. Similarly unless your camera is really far away you’re unlikely to have enough instances or draw calls for that to matter.

You also don’t get much benefit from VSM because you have a very limited view distance and again everything is a fixed distance from the camera so it will be effectively like using a single cascade of CSM. I guess you get per-page shadow depth caching so there is that.

And Lumen is going to spend a huge amount of performance and GPU memory maintaining the surface cache/radiance cache and a raytracing representation of the scene, most of which is unneeded because nearly everything is just going to get picked up by screen traces anyway…

Just my opinion :man_shrugging:

3 Likes

Regarding lumen/vsm, this statement doesn’t make any sense to me at all. If lumen looks and performs well, why would I switch to a completely different (and older) rendering system from the one I’ve been using for 11 months now?

My project uses lumen, and it looks good, and it’s not changing. I’m looking for performance (and accuracy) tips for lumen, and possibly nanite (even tho I wanted to avoid nanite.)

Again, same thing. I don’t understand this line of thought. VSM matches the art style that looks best in my project. I just couldn’t get CSMs to look better. Unless you have some specific numbers to tweak CSMs to look identical? I tried tweaking CSM parameters, lighting parameters, and lots of other settings, and nothing matched what I got from VSM.

It’s much more difficult than you think. There are so many possible permutations of mixing hundreds of meshes with/without nanite enabled to find what works better and what works worse. Someone out there may know specifics for certain combinations.

And again, like I said… packed level actors DON’T work correctly with lumen without nanite meshes. An Epic developer even stated this. They do NOT generate surface caches without nanite meshes (try it yourself.)

And lumen doesn’t work well with larger meshes because it loses a lot of accuracy. And if you increase the cards to 32 for large meshes, it’s supposedly very poor on performance and sometimes barely generates enough cards anyway. It seemed like packed level actors were immune to this, although if I’m wrong about that, then I need to find another solution.

I’m just trying to poll any developers who have done similar things. If done correctly, I’m relatively certain the GPU won’t be a bottleneck for a top down game. However…. I don’t want to make it the bottleneck by doing something bad, like using nanite incorrectly.

My original point was that I would have liked to avoid nanite completely. It doesn’t make sense in a top down game, because I don’t want any LOD. Yet there are things with lumen/etc that seem to just “work better” with nanite. Hence, the dog chasing its own tail.

Which is why I’m asking if anyone has experience with this.

Thanks

1 Like

well… nanite does clustered occlusion culling. this is good for shadow map or shadow tile rendering performance. it requires a good amount of vertices to fill the clusters, tho, or it will render large low-poly clusters and the benefit goes away.

i can see vsm working very well for a top down game. you can limit how many tiles are updated. only those with moving objects should be updated, instead of rendering everything into the whole old-school cascade shadow map. depends on the light angles tho, and how far the light propagates. that changes how many tiles need to be updated.

what i would balance is the amount of vertices you have in your low-poly meshes, so you gain the benefit of the cluster occlusion. in top down this is like a layer cake: cake layers, occluding from the top down.

when it comes to lumen, i reckon disabling the surface cache is the way to go. you don’t get the “lightmapped gi” on your meshes, but you could use the irradiance probes to approximately light your scene. this seems logical in a top down rpg, at least from what i can imagine of this genre.

1 Like

The line of thought goes like this: You’re wasting performance on things you do not need.

It’s that simple. You get all of the costs of these techniques while losing most of the benefits.

There’s a reason why Path of Exile 2 does virtually everything in screenspace.

It’s your project however, you can do whatever you want.

Nanite is probably going to be the least of your problems, throw all your meshes into it and profile the performance for your use case. There’s no use blindly guessing what you’re supposed to do, you test and profile, that’s the correct way to optimize.
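For what it’s worth, that with/without-Nanite comparison can be done live from the console. These are stock UE5 commands as far as I know, but double-check the cvar names against your engine version (the parenthetical notes are mine, not console syntax):

```
stat unit            (frame, game-thread, draw-thread and GPU times)
stat scenerendering  (draw call counts, among other scene stats)
ProfileGPU           (one-frame GPU pass breakdown)
r.Nanite 0           (disable Nanite rendering; r.Nanite 1 restores it)
```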

1 Like

Thanks! I’ll look into this.

Again, how do I get CSMs to look exactly like VSM then? I couldn’t, but I would love to hear how that’s achieved. VSM looked drastically better no matter what I did, given my art style, which is closer to photo-realistic (not cartoon or stylized).

Minor performance gains are irrelevant if it doesn’t visually look how I want it. I like how VSMs look.

Like I said in the OP, VSM is preferred. Lumen is required. I’m currently not having any performance problems at all with the GPU. However, I just don’t know nanite very well, and I don’t want to embed systems that rely on it, if there’s a better solution for low polys.

Maybe I should explain this better from the start:

1> I create all my buildings, each in their own level instance. It’s the easiest workflow I’ve found so far, because it’s very easy to add/change/modify at any time. Base floor is 1 instance, roof level is a 2nd instance.

2> My original system was merging the meshes of the roof layers, down to 1 single static mesh for the roof. Then I easily could fade out the entire roof in 1 step using a material parameter. Simple and effective, but low (or poor) accuracy/performance for lumen. However, the draw calls are super efficient since it knocks it down to 1 material per call, no matter what each wall piece is.

3> I then moved to using packed level actors instead, except I have to loop through all the instanced meshes to fade them out. Possibly not as efficient, but it does work. PLA should technically be more efficient and performant for lumen, but requires nanite. This removes steps from the workflow, since the PLA can regenerate automatically, which is nice. A major downside is that I can’t use my special actor blueprints inside them, because you can only use static meshes. These blueprints help me select/align various combinations of windows and roofs very quickly and easily. It makes me sad they can’t be used in the PLA.

4> This is where I am now. I’m almost thinking of just dumping the PLAs, and giving up and accepting that the single merged roof will have less accurate/performant lumen. Unless there’s a better workflow for using the level instance of the roof differently.

The lazy route would be to just use regular level instance (no PLA) for everything, including the roof, except there’s no way to smoothly fade it out.

1 Like

Actually, wait a sec… the final meshes in the blueprints do get merged into the PLA… even tho it throws errors and says it failed. But it looks fine in the PLA itself. I guess you could just ignore the errors if you wanted a blueprint tool to stamp itself inside the PLA?

that’d certainly help, thanks.

inside the PLA? i’m not sure. i have blueprints and complex, dynamic actors inside a PLA, and it works fine. it just gets converted to a HISM once i save it. inside the pla i can throw in almost whatever, and it picks up the static mesh components. i can even run special commands that add/remove/change static meshes inside those blueprints.

what errors? i don’t get any errors. i think you could get those fixed.

i’m not sure what you’re saying, but there is a way. i’ve posted here before when someone wanted to loop through the actors in a level instance, and it’s possible.

also, i was going to recommend material parameter collections, but i assume you know those already. and if you load multiple levels, you might want to fade the one that’s loading but not the already loaded ones.

I think the red errors were from an editor-only marker, which I don’t really need. But the regular window utility actors are just throwing yellow warnings. I suppose I can just ignore those. Either way, both cases the PLA did generate, and looks fine. So that’s one more bonus for PLAs if it does work with blueprints :+1:

To be clear, I mean fading the roof from 1.0 to 0.0 alpha using the material temporal dither. Many games use dithering to fade the roof, including large titles like Diablo 4.
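For anyone curious, the dither fade boils down to comparing the fade value against a per-pixel threshold pattern. Here’s a minimal CPU-side sketch of the ordered (screen-door) variant using a 4x4 Bayer matrix; the function name is mine, and in an actual material you’d feed the engine’s DitherTemporalAA node into the opacity mask instead:

```cpp
// 4x4 Bayer matrix: per-pixel thresholds for ordered dithering.
constexpr int kBayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

// True if the pixel at (x, y) survives the opacity mask for a fade
// value in [0, 1]. fade == 1 keeps every pixel, fade == 0 discards
// them all; in between, the kept pixels form a screen-door pattern.
bool DitherKeepPixel(int x, int y, float fade) {
    float threshold = (kBayer4[y & 3][x & 3] + 0.5f) / 16.0f;
    return fade > threshold;
}
```

The temporal version works the same way, except the threshold pattern is jittered each frame so TAA blurs it into a smooth translucency.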

I’ve been trying to research how to get level instances to change materials, and haven’t come up with anything that works perfectly yet. And in realtime, I believe this changes all of the instances? (which wouldn’t be good.) This is technically another topic, though I would gladly accept any advice on it too.

I figure worst case, I can just create duplicate buildings with different materials for variants. My main focus was what to do permanently for PLAs and nanite with lumen, and how it affects my workflow. The non-nanite roof parts clearly show no surface cache being generated… so I’m forced to at least use nanite meshes for roof pieces.

If I’m going to use lumen, I’d like to leverage it as best as possible, without any major performance downsides. And part of this, like I said, is for workflow too.

1 Like

I know I didn’t want to go semi-off-topic, but I did want to mention the roof thing and MPC in a little more detail, because this is partly why I was moving towards PLA/nanite:

MPCs can work for fading off the roof of anything (even in a level instance), but the main problem I found was that I wanted only the roof of the building you were under to fade out, not other nearby buildings (which have separate triggers). If you toggle an MPC, it affects everything at once.

However, I was thinking… I wonder if 2 ID numbers set in the parameters could work…

1> Loop through roof pieces on load, assign a parameter that assigned an ID number based on their tag/instance. This is set ahead of time, per building.

2> When you need to fade out, you send an MPC for the fade % and a 2nd ID. If it matches the preset ID, then process the opacity mask.

At first I thought this would require a branch (which wasn’t working). However, I think you can use math to determine if the numbers match:

1.0 - abs(sign(a - b))

I’ll have to try this

It works. No branches, and the shader complexity didn’t move. I’m not sure the clamp is really needed, but anything that doesn’t equal the 2 ID numbers gets forced to 1.0 opacity.
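In case it helps anyone else, here’s the branchless ID test from above as plain code. The function names are mine; in the material graph this is just Subtract → Sign → Abs → OneMinus, with the result used to lerp the opacity:

```cpp
#include <cmath>

// Branchless "do these IDs match?" from the formula 1.0 - abs(sign(a - b)):
// sign(a - b) is 0 only when a == b, so the result is 1.0 on a match
// and 0.0 otherwise.
float IdMatch(float a, float b) {
    float d = a - b;
    float s = static_cast<float>((d > 0.0f) - (d < 0.0f));  // sign(d)
    return 1.0f - std::fabs(s);
}

// Gate the fade with the match result, still with no branch:
// non-matching roofs stay at full opacity, the matching roof takes
// the dither fade alpha. Equivalent to lerp(1.0, fadeAlpha, m).
float RoofOpacity(float roofId, float targetId, float fadeAlpha) {
    float m = IdMatch(roofId, targetId);
    return (1.0f - m) + m * fadeAlpha;
}
```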

This at least gives me the option of possibly dumping nanite and PLA, and just using plain level instances. The draw calls (probably?) increase, but no nanite meshes needed. I’m not sure if it cancels out or not, but it would make workflow cleaner with plain level instances.

1 Like

This isn’t exactly super accurate results, but I tried some rough tests.

220 PLA roofs

vs

220 level instances (same exact roof pieces, probably 70 meshes, but inside a plain level instance instead)

  • Both drop down to similar framerate range.
  • PLA has significantly less draw calls (obviously)
  • The PLA shows almost 40% more GPU time, the LI uses maybe 20% more CPU
  • GPU usage on the card spikes to 95% with the PLAs, and 70% with LIs
  • CPU usage is 37% with the PLA, 50% with LIs

I’m confused why the GPU is doing worse with PLA vs plain LI. It’s the same exact meshes per object, and the PLA has them batched better. The framerates are close enough that both comparisons are running the same number of game loops. Unless lumen is performing better with the smaller pieces inside the LI (which require more draw calls).

This test is partially flawed tho, because there would never need to be 200 instanced batches of the same meshes on the screen in an overhead game.

In a real game setting, with only 1-3 buildings on the screen at a time, is probably so insignificant in difference, that I should go with the better workflow option. I just don’t want to pump out 100 buildings over months of work, and later on find out I should have done it differently.

i would assume the PLA has to sort out the materials and shading on the gpu, and has more polygons to sort the depth for the pre-pass. that is raster bound and all happening on the gpu. the LI has to compute a larger amount of instance transforms and presort the depth on the cpu, but renders faster with the presorted data: fewer depth comparisons and no material sorting on the gpu. technical difference.

When I disable nanite on the roof piece meshes, and regenerate the PLA, the draw calls are now 3x higher than the LI, and the LI doesn’t change at all. The performance is slightly worse on PLAs now.

It’s like it’s not even batching anything without nanite meshes for the PLA. I had thought it said PLAs “can be used with or without nanite”, but this feels like PLA is pointless without nanite, especially when non-nanite PLA generates no surface cache.

I’m thinking the simple route of just using plain level instances is not only slightly better on average, but easier to work with.

i wonder if there’s a hidden factor in that test. as far as i understand, the PLA compiles the geometry into one HISM, so there shouldn’t be any overhead.

also testing the same mesh with multiple instances is kind of weird. i would test multiple different meshes, as that’s the challenging thing for instancing.

tangential. there is one case in which more gpu usage is better, which is better gpu utilization. though if you’re not having framedrops, then maybe that’s not the case. (e.g. if your cpu is bottlenecking, the gpu might drop usage, or if the usage is inefficient there might be many gaps in execution)

also remember that ue has autoinstancing, it automatically batches meshes in certain cases which i don’t recall specifically.

Since 5.5, it’s supposed to try to batch what it thinks might help automatically, and I always assumed that meant all the individual static meshes that say “(Instance)”.

The regular level instances are definitely batching automatically, because those objects would have been 15,000 draw calls, but it was doing it in 1,000.

And yeah, like I said earlier, it wasn’t really a good test, because in a real game, I’d never have 200 of the same level instance. However, it was interesting to see. It gives me a rough idea of how well the regular LI can even handle things.

1 Like