Material Optimization

anonymous_user_9308b33e · March 23, 2016, 4:22pm

I looked at a bunch of sources last night and I’m hoping someone can confirm or clarify my understanding of this.

If a material has dynamic branches, such as an If node controlled by a parameter, all branches will execute at runtime every frame. Is this correct?

If a material uses things like static switches, the upstream code will be optimized out at compile time; however, the compiler will compile a permutation of the shader for every combination of switches, increasing the compilation time significantly. For example, a material containing 4 static switches will actually compile 16 shaders whether they are used or not. Is this correct?
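As a rough illustration of where the 16 comes from, here is a small Python sketch that enumerates every on/off combination of four static switches. The switch names are made up for the example, and the sketch only counts the possible combinations; it says nothing about which of them the engine actually compiles.

```python
from itertools import product


def switch_permutations(switch_names):
    """Return every on/off combination for the given static switches."""
    return [
        dict(zip(switch_names, values))
        for values in product((False, True), repeat=len(switch_names))
    ]


# Hypothetical switch names, used only for illustration.
switches = ["UseParallax", "UseDetailNormal", "UseVertexBlend", "UseEmissive"]
perms = switch_permutations(switches)
print(len(perms))  # 2 ** 4 = 16 possible combinations
```

Each extra switch doubles the count, so the number of possible permutations grows as 2 to the power of the number of switches.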

I assume this applies to things like Post Process materials as well?

Any other useful links or information about shader optimization would be appreciated. I’m seeing a lot of information from various sources and it’s not clear what information is accurate. I want to make a pack for the community and I don’t want it to be unusable garbage. Currently trying to decide whether it’s better to make uber materials or many variations on materials with increasingly more features.

RyanB · March 23, 2016, 4:34pm

If dynamic branching is controlled JUST by a parameter, then it should be fairly fast these days. It is when one side of the comparison is a value that varies per pixel that the GPU may end up just running both sides for each pixel. In that case, the amount of ‘savings’ from the branch is almost zero.
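A toy model of why this happens, assuming pixels are shaded in lockstep groups that must run every side of a branch that at least one pixel in the group takes. The group size and instruction costs below are invented numbers, and real hardware differs in the details.

```python
def group_branch_cost(conditions, cost_true, cost_false):
    """Toy cost of one lockstep pixel group evaluating an if/else.

    conditions holds the branch condition for each pixel in the group.
    The group pays for a side if any of its pixels takes that side.
    """
    cost = 0
    if any(conditions):
        cost += cost_true
    if not all(conditions):
        cost += cost_false
    return cost


# Condition driven by a parameter: every pixel agrees, one side runs.
uniform_cost = group_branch_cost([True] * 8, cost_true=10, cost_false=3)

# Condition varies per pixel: the group ends up paying for both sides.
divergent_cost = group_branch_cost([True, False] * 4, cost_true=10, cost_false=3)

print(uniform_cost, divergent_cost)
```

In this model the divergent group costs as much as running both sides unconditionally, which matches the "almost zero savings" point above.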

You have the right idea about shader permutations.

I look forward to seeing your compiled guide of performance tips.

Here’s one that isn’t always possible but is easy to forget: do your comparison using the CustomUVs if possible (i.e. when you don’t need per-pixel variation).

anonymous_user_9308b33e · March 23, 2016, 4:46pm

Thanks a lot. Really appreciate hearing from the source on this.

Sotalo · March 23, 2016, 6:48pm

When I made the “Perfect Tile System,” I had a TON of switches for choosing between more expensive and less expensive features. The system is scalable from a fun, cheap mobile shading tool to an impressive Parallax Occlusion master brickwork shader. There are definitely going to be a lot of possible permutations of this shader (several thousand, I think), but the shader doesn’t compile all of those several thousand permutations if they are not used. I also have mechanisms to prevent certain options from conflicting with others. For instance, I have a simple feature to remove grout lines at glancing angles for a somewhat cheap cost. It works great with basic parallax but is totally unnecessary if you have parallax occlusion, so the option is only available when parallax occlusion is turned off.
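To make the "only used permutations" point concrete, here is a minimal sketch that collects the distinct switch combinations referenced by a set of material instances. The instance and switch names are hypothetical, and this is not a description of how the engine tracks permutations internally; it only shows that the number of combinations in use can be far smaller than the number possible.

```python
def used_permutations(instances):
    """Collect the distinct switch combinations referenced by material instances."""
    return {tuple(sorted(settings.items())) for settings in instances.values()}


# Hypothetical material instances and their static switch settings.
instances = {
    "MI_Tile_Mobile": {"UseParallax": False, "UseOcclusion": False},
    "MI_Tile_Floor": {"UseParallax": True, "UseOcclusion": False},
    "MI_Tile_Wall": {"UseParallax": True, "UseOcclusion": False},
}

possible = 2 ** 2  # two switches, so four possible combinations
used = used_permutations(instances)
print(len(used), "of", possible, "combinations are in use")
```

Here three instances share two combinations, so only two variants would be needed under this model.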

There are a lot of different ways to optimize in the material editor. For one, the editor does not like doing the same math more than once. So if you have a more complex material, maybe something that requires Phong shading, and you need to plug the result into both an opacity and an emissive output, it is better to perform the bulk of the math for the opacity and then multiply that result for the emissive. Running two different values for each input would require performing the same Phong calculation twice.

Some very interesting things are almost free: multiplying something by itself requires only one instruction, while raising it to the power of two costs four instructions. Because the deferred renderer is automatically required to calculate things like per-pixel world normals and pixel/scene depth, you can perform depth calculations and world position calculations very easily. Addition, subtraction, multiplication, and division are concatenated, but linear interpolation functions are not optimized at all! If you want a cheap halfway blend between two textures, it might be a better idea to simply add them together and divide by two than to use a lerp function.

And because concatenating is very good at organizing functions, I find that the more complex your material becomes, the cheaper it becomes to make those effects work. For instance, if you use advanced vertex painting (a combination of vertex colors and masks) to blend between two textures, that same vertex painting code can be repeated for blending roughness, metallic, and subsurface properties as well, so you’re not really saving much by choosing to blend only one property.

And of course, if you can use unlit materials, or fully rough materials with 0 plugged into specularity, you can save some instructions required for rendering reflections and GGX specular. Unfortunately, I do believe the reflection environment runs on fully rough materials anyway, and the only way to remove it is to use unlit materials or not have any reflections in the scene at all (no SSR, no reflection captures, no stationary skylight).
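The halfway-blend tip relies on a lerp with an alpha of 0.5 giving the same value as a plain average. A minimal Python sketch of that equivalence, using made-up color values, looks like this. It only checks that the results match; it does not measure instruction counts, which depend on the shader compiler.

```python
def lerp(a, b, alpha):
    """Standard linear interpolation, applied per channel."""
    return tuple(x + alpha * (y - x) for x, y in zip(a, b))


def average(a, b):
    """Halfway blend written as add-then-divide, applied per channel."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


# Made-up RGB samples standing in for two texture lookups.
texel_a = (0.25, 0.5, 1.0)
texel_b = (0.75, 0.0, 0.5)

print(lerp(texel_a, texel_b, 0.5))
print(average(texel_a, texel_b))
```

The shortcut only holds for a fixed 50/50 blend; any other blend weight still needs the full interpolation.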

I also noticed something interesting: the UDK method of normal blending is not actually the cheapest. In the old days, the suggested way to blend two normal maps on top of each other was to multiply the overlaid normal by (1, 1, 0) and then add that result to the bottom normal. However, because 3-vectors require more instructions than 2-vectors, it is better to mask the RG values of both normals, add them together, and then append the blue channel from the base normal for the final result. The former method required 5 instructions, but my method only requires 4 and the results are exactly the same; you just don’t have to perform any math for the blue channel. I try to use as few channels as I possibly can. I never use the RGB value to manipulate grayscale textures.
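The two blending methods can be written out side by side to check that they agree. This is a plain Python sketch with made-up normal values, not material editor code, and it follows the description above literally, so neither version renormalizes the result.

```python
def blend_udk(base, overlay):
    """Multiply the overlay by (1, 1, 0), then add it to the base normal."""
    masked = (overlay[0] * 1.0, overlay[1] * 1.0, overlay[2] * 0.0)
    return (base[0] + masked[0], base[1] + masked[1], base[2] + masked[2])


def blend_rg_append(base, overlay):
    """Add only the RG channels, then append the base normal's blue channel."""
    rg = (base[0] + overlay[0], base[1] + overlay[1])
    return (rg[0], rg[1], base[2])


# Made-up tangent-space normals used only for the comparison.
base_normal = (0.1, -0.2, 0.97)
overlay_normal = (0.3, 0.05, 0.95)

print(blend_udk(base_normal, overlay_normal))
print(blend_rg_append(base_normal, overlay_normal))
```

Both functions return the same vector; the second one simply never touches the overlay's blue channel.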

anonymous_user_9308b33e · March 24, 2016, 1:53am

Wow, that is a ton of useful information. Thanks a lot for sharing. I recently implemented some standard lighting models like Phong using unlit materials as an exercise and was curious how much using functions like pow with large exponents might cost.

Sotalo · March 24, 2016, 2:53am

I’m pretty sure the cost for power is the same no matter what value you plug in for the exponent. But if you use fixed integer exponents (power of 2, power of 3, power of 4), it’s much cheaper to just multiply the same value by itself than to use a power node. Of course, this means you can’t easily plug in a value for the exponent, but if you’re going for optimization, that’s what you need to do. There is a limit to the integer size; I think all the graphics calculations are done in 16-bit. There’s an 8-digit limit to any floating values you want to use, so you can define any number to the nearest millionth, or if you use only positive integers, the nearest hundred million. The material editor starts doing goofy things after that, rounding numbers to fit the requirements.
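Two small Python helpers illustrate these points. The first replaces a power with repeated multiplication for a small fixed integer exponent. The second shows the kind of rounding that appears once a number has more digits than the storage format can hold; it assumes values are stored as 32-bit floats, which is an assumption of this sketch and not something stated in the post.

```python
import struct


def pow_by_multiply(x, n):
    """Raise x to a small positive integer power using only multiplications."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = x
    for _ in range(n - 1):
        result *= x
    return result


def to_float32(value):
    """Round a Python float to the nearest 32-bit float and back.

    Assumption: 32-bit storage, chosen here only to demonstrate rounding.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(pow_by_multiply(0.7, 4), 0.7 ** 4)
print(to_float32(16777217.0))  # an 8-digit integer that does not survive the round trip
```

The trade-off is the one described above: the exponent is baked into the graph, so it can no longer be driven by a parameter.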

Also, it should be said that Unreal works in linear space while computer monitors and textures use gamma space. To convert a gamma-space value to linear space, you raise it to the power of 2.2 (a good approximation of the sRGB curve); going from linear back to gamma space uses the power of 1/2.2. However, if you just multiply the gradient by itself (power of 2), you can save a few instructions and get a result very similar to the 2.2 curve. My personal Phong code multiplies the Lambert diffuse by itself for a more accurate shading model.
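To get a feel for how close squaring is to the 2.2 power curve, this short Python sketch compares the two over the 0 to 1 range. The sample count is arbitrary, and the comparison is purely numerical; it does not say anything about instruction cost.

```python
def power_2_2(x):
    """The 2.2 power curve used as an approximation of sRGB."""
    return x ** 2.2


def squared(x):
    """The cheaper stand-in: the value multiplied by itself."""
    return x * x


# Evenly spaced samples across the 0..1 range.
samples = [i / 100 for i in range(101)]
max_error = max(abs(power_2_2(x) - squared(x)) for x in samples)

print(f"largest difference over 0..1: {max_error:.4f}")
```

The two curves agree exactly at 0 and 1 and differ by a few hundredths in the midtones, which is why the result looks very similar in practice.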

If you find yourself saving only a few instructions for a huge headache, the optimization might not be worth the trouble. But some people tend to go to insane extremes with materials containing 500+ instructions and then wonder why the system keeps crashing. As long as you follow good practices, meaning you don’t go overboard on just one material, you’ll be fine. If you have a character, separate the skin from any metallic jewelry: the skin can be subsurface, and the jewelry can be metallic. Try not to blend too frequently: layered materials can easily get out of hand. If you can repeat code, do, because repeated functions compress very efficiently. Try not to go overboard with the textures, and use shared samplers any chance you get: landscape textures, small particle effect textures, the skybox, character textures, stuff you know will always be rendering or doesn’t cost much. This will limit draw calls at the expense of a larger memory overhead, but memory is really not a problem nowadays.

The material below is my Perfect Tile shader. It’s the most complex thing I’ve ever made in my life. Remember, only a fraction of this can possibly run at the same time. The most complex thing you can get out of this does not even compare to the complexity of some water shaders, and that’s not even considering the cost of rendering behind translucency! The material editor is just that awesome.