While investigating some performance problems, we noticed that the foliage system is not switching LODs properly.
While a standard static mesh is rendered once (1 draw call), or twice for a short time during the dithered LOD transition, we noticed that most of the time all the LODs are permanently rendered when the same mesh is used through the foliage system. You don’t notice the problem visually because the dithered LOD transition makes the pixels transparent, so only one LOD is visible; the others are still rendered but transparent, and therefore not visible.
This seems to cause significant performance problems, as most of the time all 4 LODs of every object are rendered (making the on-screen shader complexity quite high for all vegetation using masked opacity, unnecessarily multiplying the number of draw calls by 4, and greatly increasing the number of polygons on screen).
I say “most of the time” because if I move very far away from an object, the first LODs do drop out beyond a certain distance.
To avoid having all 4 LODs rendered at the same time, we have to set a huge gap between the “screen size” values of the LODs. The default values (1 for LOD 0, 0.25 for LOD 1, 0.125 for LOD 2, 0.08 for LOD 3) show the problem. If I set a much bigger gap (1 for LOD 0, 0.1 for LOD 1, 0.01 for LOD 2, 0.001 for LOD 3), it reduces the number of LODs rendered at the same time, but this is not a proper fix, as it forces us to push the low-poly LODs way too far from the camera. It looks like the system renders all the LODs whose screen size falls within some range, for example between 1 and 0.01 (i.e. if the LODs of an object use screen size values in this range, they will all be rendered simultaneously).
Is there an option we missed to fix this issue, or is it a real bug?
As mentioned before, the system behaves correctly with a standard static mesh; the problem occurs only with the foliage system. But since the foliage system was supposed to provide a performance boost by minimizing the number of draw calls, we used it not only for bushes and trees but also for rocks and any small objects that are placed frequently in our scenes.
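To make the “screen size” values above concrete, here is a rough sketch of how a single mesh would pick one LOD from them. It is a simplified model, not engine code: the function name `select_lod` and the rule "use the last LOD whose threshold is at or above the current screen size" are assumptions made for illustration.

```python
# Simplified model of screen-size LOD selection (illustrative, not engine code).
# thresholds[i] is the "screen size" value entered for LOD i, largest first.

def select_lod(screen_size, thresholds):
    """Return the index of the last LOD whose threshold is >= screen_size."""
    lod = 0
    for index, threshold in enumerate(thresholds):
        if screen_size <= threshold:
            lod = index
    return lod

DEFAULT_THRESHOLDS = [1.0, 0.25, 0.125, 0.08]   # values quoted in the post
WIDE_THRESHOLDS = [1.0, 0.1, 0.01, 0.001]       # the workaround values

# An object covering 10% of the screen uses LOD 2 with the defaults,
# but only LOD 1 with the widened gaps.
lod_default = select_lod(0.1, DEFAULT_THRESHOLDS)
lod_wide = select_lod(0.1, WIDE_THRESHOLDS)
```

Under this model a regular static mesh is only ever in one LOD at a time, which is the behaviour expected for each foliage instance as well.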
There was a change made recently with the release of 4.8 that can explain this behavior. Foliage no longer uses instances clustered into multiple components, and all instances of a particular static mesh are rendered in a single draw call from a single HierarchicalInstancedStaticMeshComponent.
When you paint instances with the foliage tool, the tool finds the Z location matching the surface of the heightfield and adds an appropriate transform to move the instance to that location. In the vertex shader, an ObjectPositionWS material node should evaluate to that same world position for each instance.
Edit: I have been running some tests in my own project and was able to confirm what you are reporting. Before I write the bug report for this issue, are there any other notes or points of interest you can add to make the report more robust?
Thanks for the fast reply.
I don’t have much more to add to the issue report. Perhaps one thing that helps to reproduce the problem: another way to show it visually is to deactivate the dithered LOD transition in the material. Then you see clearly that, for one foliage instance, all the LODs of the instance are displayed simultaneously instead of only one. The correct behaviour in this case would be a hard switch between LODs when moving the camera, but it should never display more than one LOD at a time for a single instance.
Thanks, I know; the problem is not visual, the problem is about performance. The dithered LOD transition hides the fact that all LODs are rendered, but this affects performance quite a lot because it draws all the LODs, which adds draw calls and a lot of polygons (the opposite of what a LOD system is meant to do). There is also a lot of overdraw, as the dithered LOD transition just makes the pixels transparent; they are still rendered.
With the dithered LOD enabled, I believe the behavior and performance are exactly as they were in 4.8. It’s just that we added the ability to use it for regular static meshes as well, which is why the flag was added.
The dithered transition will save pixel shader time and overdraw because the shader can early out immediately and before any textures are fetched.
Yes, the problem was already there in 4.8 (but it was not visible, as the dithered LOD was applied automatically).
OK for the overdraw, that’s true: the dithered LOD transition removes the overdraw problem.
I’ll enter a new feature request for 4.10 to allow you to disable LOD dithered transitions for foliage completely. I think this would have been the better fix for the 4.9 issue with both LODs rendering simultaneously, but unfortunately it’s too late now.
I added ticket UE-20650 with this request for 4.10
Just to be sure we are on the same page: currently we don’t have only 2 LODs displayed at the same time, but sometimes even 4 (LOD 0, LOD 1, LOD 2, LOD 3). On a standard static mesh, the dithered LOD transition displays 2 LODs during the transition (which is normal), but only during the transition. With the foliage system it displays them permanently, and not only 2 LODs but up to 4 LODs simultaneously.
Is that what you understood from my original question?
For HierarchicalInstancedStaticMeshComponents we allocate groups of nearby instances into BSP tree nodes. A typical node might have a few hundred instances in it. The node is the smallest unit we can perform foliage culling on. The fact that we don’t consider each instance individually is what makes instanced static meshes quick to render.
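As a rough illustration of the grouping described above, the sketch below splits a set of instance positions into leaf nodes of at most a few hundred instances each. This is not the engine's actual tree-building code: the median split on the longer axis, the 2D positions, the function name `build_nodes`, and the limit of 256 are all assumptions chosen to keep the example small.

```python
# Illustrative only: group nearby instances into leaf nodes by recursively
# splitting at the median of the longer axis. Not the engine's algorithm.

def build_nodes(points, max_per_node=256):
    """Return a list of leaf nodes; each leaf is a list of (x, y) positions."""
    if len(points) <= max_per_node:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along whichever axis has the larger extent.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_nodes(ordered[:mid], max_per_node)
            + build_nodes(ordered[mid:], max_per_node))

# 1000 instances laid out on a 40 x 25 grid end up in 4 leaves of 250 each.
instances = [(i % 40, i // 40) for i in range(1000)]
leaves = build_nodes(instances)
```

Culling and LOD decisions are then made once per leaf rather than once per instance, which is where the speed comes from and also why a whole leaf can end up being drawn for several LODs.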
Anyway, when it comes to LOD transitions, for each node we look at its bounding box and calculate which LOD regions and transition planes the node intersects. We then render all of that particular node’s instances multiple times, once for each LOD affected.
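The per-node step can be sketched as follows. This is a simplified model under stated assumptions, not the engine implementation: a node is reduced to the smallest and largest screen size that its instances can have (far and near side of its bounding box), transition bands are ignored, and the names `select_lod` and `lods_for_node` are made up for the example.

```python
# Simplified model: which LOD passes does one node need? (illustrative only)

def select_lod(screen_size, thresholds):
    """Last LOD whose screen-size threshold is >= screen_size."""
    lod = 0
    for index, threshold in enumerate(thresholds):
        if screen_size <= threshold:
            lod = index
    return lod

def lods_for_node(min_size, max_size, thresholds):
    """All LODs touched by a node whose instances span min_size..max_size."""
    first = select_lod(max_size, thresholds)   # nearest part of the node
    last = select_lod(min_size, thresholds)    # farthest part of the node
    return list(range(first, last + 1))

DEFAULT_THRESHOLDS = [1.0, 0.25, 0.125, 0.08]
WIDE_THRESHOLDS = [1.0, 0.1, 0.01, 0.001]

# A node whose instances range from 5% to 50% of the screen:
passes_default = lods_for_node(0.05, 0.5, DEFAULT_THRESHOLDS)  # [0, 1, 2, 3]
passes_wide = lods_for_node(0.05, 0.5, WIDE_THRESHOLDS)        # [0, 1]
```

In this model the same node is drawn four times with the default thresholds but only twice with the widened ones, which is consistent with what was observed earlier in the thread when the screen size gaps were increased.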
Now, for each instance, the vertex shader calculates whether the instance is in one LOD or the other, or in transition. If it’s in transition we do the dithering effect, which saves fill rate for alternate pixels. If it’s completely in another LOD, we set all the vertex positions to (0,0,0), behind the near clipping plane, which causes all of the instances’ triangles to be culled before they’re rasterized. In 4.9.0 the shader flags were incorrectly set up by default, so none of the culling behavior was triggered, and it seemed as if all the work was always being done for multiple LODs, with the dithering just hiding it.
If your LOD transitions are set up to occur at relatively close distances (smaller than the typical node size) and you don’t have many instances, it’s possible you would see it rendering all 4 of your LODs simultaneously. If you set them further out, your nodes will intersect fewer LOD regions and you shouldn’t see more than 2 LODs rendering at once when the dithering flag is disabled.
I just closed that feature request ticket - after fully researching the way we do LOD transitions on a per-node basis, I realize it’s not possible to do anything more efficient unless we have entire nodes changing LOD at the same time. We had that long ago and it was so ugly it made LODs useless.
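The per-instance decision can be modelled roughly like this. It is written in Python rather than shader code and rests on assumptions not stated in the thread: LODs are described by the distance at which each one starts (`lod_starts`), the transition is a fixed-width band (`band`) just past each boundary, and the function name `instance_state` is hypothetical.

```python
# Illustrative model of what one instance does in one LOD pass of its node.
import math

def instance_state(distance, pass_lod, lod_starts, band):
    """Return 'draw', 'dither' or 'cull' for an instance in a given LOD pass.

    lod_starts[i] is the distance at which LOD i begins (lod_starts[0] == 0).
    Within `band` units past a boundary, both neighbouring LODs are dithered.
    """
    start = lod_starts[pass_lod]
    if pass_lod + 1 < len(lod_starts):
        end = lod_starts[pass_lod + 1]
    else:
        end = math.inf
    if distance < start or distance >= end + band:
        # Completely in another LOD: the shader would collapse the vertices
        # so the triangles are rejected before rasterization.
        return "cull"
    fading_in = pass_lod > 0 and distance < start + band
    fading_out = distance >= end
    if fading_in or fading_out:
        return "dither"
    return "draw"

lod_starts = [0.0, 100.0, 200.0, 400.0]
# An instance 110 units away, evaluated in each of the 4 LOD passes:
states = [instance_state(110.0, lod, lod_starts, band=20.0) for lod in range(4)]
```

Here `states` is `['dither', 'dither', 'cull', 'cull']`: the instance is still submitted in every pass its node is drawn for, but only the two LODs involved in the transition produce pixels.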
How does current performance with dithering enabled compare to the old system of LODs popping out per node?
I have scenes with many thousands of instances and their LOD transition distances compare to those found in the Kite Demo - is that not a typical use case? This issue of LOD overlapping makes these levels completely unplayable with billboards rendering so close to the camera. One can hardly see 10 meters in a populated forest. It’s just a wall of billboards intersecting 3D LODs.
I guess I want to know if you are judging performance under the assumption that this method is viable without dithering?
We never had per-node pop transitions. We did have them prior to 4.7, which had fixed clusters instead of a BSP tree and had much worse performance than the hierarchical system used in 4.7 and onwards. We developed the current system with dithered LOD transitions to support millions of instances in the Kite Demo.
The behavior and performance in 4.9 with the checkbox enabled is identical to 4.7, 4.8 and the Kite Demo.