Foliage overdraw testing!

I was running some tests while creating dense custom foliage to investigate the impact that overdraw has on performance. The initial plan was to make all foliage using simple tessellated rectangular cards and not worry about matching the shape to the outline of each leaf/blade (for speed). However, after some testing I found definite improvements from spending the time to match the outline as closely as possible for certain LODs.

With a simple test, the cards with less overdraw had significantly better performance while having the same number of tris per card (obvious, but it might not be apparent how important it is to trim the clear space as much as possible- you could gain some valuable performance).
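To put a rough number on the "clear space" of a card, here is a minimal Python sketch. It assumes the blade's alpha is available as a small 0/1 mask and approximates a card trimmed to the outline by splitting it into horizontal strips, each narrowed to its own opaque extent. The function name `wasted_fraction` and the toy triangular blade are made up for illustration; this is not how the overdraw view mode measures anything.

```python
def wasted_fraction(mask, strips=1):
    """Fraction of rasterized texels that are fully transparent.

    mask   -- list of rows, each a list of 0/1 alpha values (1 = opaque)
    strips -- number of horizontal bands the card is split into; each band
              is trimmed to the left/right extent of its opaque texels.
    Assumes the row count is divisible by `strips` and that the mask has
    at least one opaque texel.
    """
    rows = len(mask)
    band = rows // strips
    opaque = sum(sum(row) for row in mask)
    drawn = 0
    for s in range(strips):
        band_rows = mask[s * band:(s + 1) * band]
        cols = [x for row in band_rows for x, a in enumerate(row) if a]
        if cols:  # bands with no opaque texels are dropped entirely
            drawn += (max(cols) - min(cols) + 1) * band
    return 1.0 - opaque / drawn


# Toy blade: a right triangle of opaque texels inside an 8x8 card.
blade = [[1 if x <= y else 0 for x in range(8)] for y in range(8)]

print(wasted_fraction(blade, strips=1))  # untrimmed card: 0.4375
print(wasted_fraction(blade, strips=4))  # four trimmed strips: roughly 0.1
```

In this toy case the trimmed card draws far fewer transparent texels from the same number of strips, which is the same direction as the test result above, though the real saving depends on the blade shape and the shader.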

(Note: archviz grass LOD0- hence the high polycount to allow for proper bending in GrowFX):

I then started wondering how much this would matter for billboards and distant LODs- would adding geometry to reduce overdraw, instead of using flat single planes, be worth the cost of more tris? The short answer- yes up close, but no far away. As soon as the instances reached a certain density, it was clear that the cost of the extra geometry outweighed the savings because A: the meshes were too far away for the reduced overdraw to give any real benefit and B: the triangle count was much higher (usually double), which clearly hurts performance in large numbers. Billboards with attention to overdraw looked almost the same as simple crossplanes at distance in the overdraw visualizer, and had double the polycount with definite performance drops.

Here is one of the tests with a slightly tessellated LOD vs a single plane per blade LOD:

The results show that although there is less overdraw happening, the extra tris to reduce overdraw aren’t worth it at distance (single plane per blade wins):

The takeaway from this is to reduce overdraw wherever possible- mainly on close, more detailed geometry, but go as simple as you can on distant geometry LODs as a few extra triangles per mesh will start to add up when you have millions of them in the scene.

If anyone else has findings/information to add here, they would be more than welcome- it’s a very broad topic that doesn’t have a single solution.

Hey Eoin,

Lush results! And thanks for the breakdown! On my work for DTG Fishing we did many experiments regarding this. Of course, we couldn’t afford to throw millions of individual blades of grass in (the idea was floated), but we found that in fairly large, dynamically lit scenes, fewer vertices were generally more performant on PC. I understand SpeedTree packs a lot of extra vertices into its meshes (uses something like 8 UV channels), and I’m not sure if this was altering our results.

Also (please correct me if I’m mistaken!) - it looks like you’re using the quad overdraw debugger, which, if I understand correctly, identifies where redundant pixels are being processed as the result of thin triangles?

Thanks for doing this. I’ve been wondering about the tradeoff in this usage myself.

I wanted to add to your conclusion: I don’t think tri count is the main concern or issue. It’s when you have millions of tris that are smaller than the 2x2 pixel quads the GPU shades that performance drops.
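The 2x2 point can be illustrated with a small Python sketch. It assumes a simplified rasterizer that tests pixel centers with edge functions (edges counted as inside, no top-left fill rule) and charges four pixel-shader invocations for every 2x2 quad a triangle touches. `quad_stats` is a hypothetical helper name, not an engine API.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def quad_stats(tri, width, height):
    """Return (covered_pixels, shaded_invocations) for one triangle.

    Pixels are sampled at their centers. Every 2x2 quad that contains at
    least one covered pixel is assumed to cost four invocations.
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    area = edge(ax, ay, bx, by, cx, cy)
    covered = 0
    quads = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(bx, by, cx, cy, px, py)
            w1 = edge(cx, cy, ax, ay, px, py)
            w2 = edge(ax, ay, bx, by, px, py)
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered += 1
                quads.add((x // 2, y // 2))
    return covered, 4 * len(quads)


big = quad_stats(((0, 0), (16, 0), (0, 16)), 16, 16)
tiny = quad_stats(((0, 0), (1.2, 0), (0, 1.2)), 16, 16)
print(big)   # (136, 144): about 94% of invocations are useful
print(tiny)  # (1, 4): only 25% of invocations are useful
```

Under these assumptions the large triangle wastes very little shading work, while the pixel-sized one pays for four invocations to light a single pixel, which is why millions of tiny triangles hurt more than the raw tri count suggests.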

The 2nd image is pretty nice, great dense grass… but how many instances do you have to make it look good like that?

Thanks for taking the time to check into this. I’ve recently been wondering a lot about this and hadn’t had time to test it yet.

EA used something close to individual blades of grass for Rory McIlroy, but the grass fades out into a texture at a distance. What might be best is to take a shot from above, turn that into a tiling texture, give it a bump map and a normal map, and then tile that into the distance. When the grass fades out, fade in the texture. Use parallax occlusion and pixel depth offset to give the grass real depth, and use contact shadows. That way you keep the same complex grass look in the texture, but not on the course. It is also entirely possible to use LODs in the grass blades themselves to give the grass more polygons up close and fewer further away, but that method is rather complex and will incur additional draw calls. It should be said it would be much more performant to have grass cards and sheets and then combine them into small meshes, so a few polygons can handle 10-12 blades of grass rather than making each blade of grass its own card. But individual blades of grass will provide proper shadowing and ambient occlusion, which is what makes 3D grass look so incredible.
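The "fade the grass out, fade the texture in" step could look something like the following minimal Python sketch. It assumes a simple linear crossfade driven by camera distance; the function name `fade_weights` and the 50/80 distances are placeholders, not values from any shipped game or engine default.

```python
def fade_weights(distance, fade_start, fade_end):
    """Return (grass_opacity, ground_texture_weight) for a camera distance.

    Before fade_start the grass geometry is fully visible; past fade_end
    only the tiled grass texture remains. In between, the two are blended
    linearly so their weights always sum to 1.
    """
    t = (distance - fade_start) / (fade_end - fade_start)
    t = max(0.0, min(1.0, t))  # clamp to the 0..1 range
    return 1.0 - t, t


for d in (0.0, 65.0, 100.0):
    print(d, fade_weights(d, fade_start=50.0, fade_end=80.0))
# 0.0   -> (1.0, 0.0)  all geometry
# 65.0  -> (0.5, 0.5)  halfway through the crossfade
# 100.0 -> (0.0, 1.0)  texture only
```

In a material this would typically drive an opacity mask or dither on the grass and a blend weight on the ground layer, but the exact wiring depends on the setup.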

For Archviz/realism it’s important to try and go much more dense than a typical game world, so I make a bunch of varied clusters in GrowFX and scatter them- with LODs and billboards/distance cull, the dense and expensive grass is only in a small area around the camera.

The goal here is to have fully detailed blades up close which quickly switch to billboards. I make a highpoly clump in GrowFX, make realtime LODs and then bake billboards of the clump which contain a big bunch of blades at once. The aim is to get the best of both worlds!
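As a rough sketch of that switch from detailed clumps to billboards, the Python below picks an LOD per instance by distance and adds up the triangles actually submitted. The distance bands and the middle LOD's triangle count are hypothetical; only the ~250-tri clump and the 4-tri crossplane figures come from this thread.

```python
# (max distance, label, triangles per instance); checked in order.
LOD_BANDS = [
    (10.0, "LOD0 clump", 250),   # ~250 tris, figure quoted in the thread
    (30.0, "LOD1 clump", 60),    # hypothetical intermediate LOD
    (150.0, "billboard", 4),     # simple 4-tri crossplane
]


def pick_lod(distance, bands=LOD_BANDS):
    """Return (label, tris) for the first band containing the distance,
    or None when the instance is past the last band (distance culled)."""
    for max_distance, label, tris in bands:
        if distance < max_distance:
            return label, tris
    return None


def total_tris(distances, bands=LOD_BANDS):
    """Sum the triangles of every instance that is not culled."""
    total = 0
    for d in distances:
        lod = pick_lod(d, bands)
        if lod is not None:
            total += lod[1]
    return total


print(pick_lod(5.0))                           # ('LOD0 clump', 250)
print(pick_lod(100.0))                         # ('billboard', 4)
print(total_tris([5.0, 20.0, 100.0, 500.0]))   # 250 + 60 + 4 + 0 = 314
```

With bands like these, the expensive clumps only exist in a small radius around the camera, and everything further out costs a handful of triangles or nothing at all.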

The texture trick is definitely useful for more optimal solutions- some great examples are Battlefield 1 and Hitman- they both have very sparse lawns/very little grass geo but they match the color and blend into the ground textures so well that at a glance you don’t even notice!

I’ll do some more testing with some more complex scenes in the future (trees, ferns, bushes etc!)

Billboards are more than twice as complex as just plain grass. You can double the instance count and just use static grass instead of billboard grass. If there is a way to make lower LOD models look as good as billboards, that method would have to be preferred if realtime optimization is important for your project.

Billboards (simple 4 triangle crossplane) are more complex than a static mesh with loads of individual blades (~250 tris)? Do you mean more complex in terms of overdraw or overall?

I did some testing and billboard LODs at distance beat meshes with attention to overdraw every time- at a certain point the overdraw becomes indistinguishable between the billboards and full geo meshes, so it seems much cheaper to use billboards as a distant LOD. The billboards don’t hold up at close range at all though, since looking down on them would of course lose any sense of depth/parallax the billboards gave.

There are other options such as rendering impostors from every angle, but trying to keep it as simple as possible for now.

Here’s a video I did a while back showing the transition from full geo clumps like above, to simple crossplane billboard meshes- you can definitely spot the transition if you look closely, but it certainly looks close to the original mesh I think: [video][/video]

The material on billboards costs a lot of vertex instructions. And because the instructions are in the vertex shader, not the pixel shader, the cost does not scale with screen space. Having tons of billboards in the distance will be at least twice as difficult to render as multiple instances of the same vertex count. A 4-tri billboard will perform similarly to a 9-tri mesh. Don’t get me wrong, it looks great, but for realtime performance it’s really not ideal.

Very interesting stuff, this is something I haven’t really dived too deep into before. Right now I’m using essentially the same exact master shader for both the geo grass and the billboard, so the instruction count should be the same- right? I’m very curious about this, so any knowledge is greatly appreciated!

From what I know, the material that performs billboard vertex transforms requires inverse tangents, and atan expressions are very expensive, doubly so because access to it requires custom code that doesn’t compress well in the material. These operations are performed for each vertex, and vertices drawn in the distance cost the same as vertices drawn up front. Also, vertex operations can occur offscreen on objects that are not effectively culled from the view while typical pixel shading is limited to the screenspace. I know UE4 has really good culling with the procedural grass tools, but the material complexity would be insane regardless. You can double the vertices onscreen, spawn more grass, and keep the same shader complexity.
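For illustration only, here is a CPU-side Python sketch of the kind of per-vertex work a camera-facing billboard implies: an inverse tangent to find the yaw towards the camera, then a rotation applied to every vertex regardless of how far away or how small on screen it is. It assumes a Z-up world and a card whose local normal points along +X; `face_camera` is a made-up name and this is not the engine's material code.

```python
import math


def face_camera(local_vertices, pivot, camera):
    """Rotate a card around the vertical (Z) axis so it faces the camera.

    local_vertices -- (x, y, z) tuples relative to the pivot, with the
                      card's normal along local +X
    pivot, camera  -- world-space (x, y, z) positions
    """
    # One atan2 per instance here; in a vertex shader the equivalent math
    # would run for every vertex of every billboard.
    yaw = math.atan2(camera[1] - pivot[1], camera[0] - pivot[0])
    c, s = math.cos(yaw), math.sin(yaw)
    world = []
    for x, y, z in local_vertices:
        world.append((
            pivot[0] + x * c - y * s,
            pivot[1] + x * s + y * c,
            pivot[2] + z,
        ))
    return world


# A point one unit along the card's normal ends up pointing at the camera.
print(face_camera([(1.0, 0.0, 0.0)], (0.0, 0.0, 0.0), (0.0, 10.0, 0.0)))
# approximately [(0.0, 1.0, 0.0)]
```

A static crossplane skips all of this, since its vertices are just transformed like any other mesh.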

One of the obvious issues of not using billboards is you can see the mesh is just a plane. But if you spawn enough meshes, the sheer density should be enough to mask any awkward angles.

Ah right- I don’t use any animated camera facing techniques/custom code for billboards- just the highpoly projected onto a crossplane works fine! Since it’s just a regular material setup on a simple 4 tri mesh, I imagine the performance would be better than lots of single blades, even with there being a bit more overdraw, which is less important at distance. From the tests I did, the billboards always had more than 25% better performance than other LODs- I can’t speak for billboards with custom functions, but like you said that would add a bunch of material complexity!