Optimizing the shader: when to replace If statements by a Lerp?

Hello,

I am currently working on a shader that has become quite slow to compile (between 5 and 15 minutes). At first I thought it was due to an overuse of functions (I have functions using functions using functions… which is pretty useful for maintaining the code and reusing some parts).

After some comparison with an older version of the shader, it appears it is due to an overuse of If statements. I replaced some functions with simpler code and it already compiles a lot faster. Besides, the small Compiling window no longer appears.

I still have some If statements here and there though. They are pretty useful because they are easy to read. I could replace them, but I am not sure it is always needed. After some reading on the web, it looks like in many cases the compiler is able to optimize code containing them well, and in some more complex cases it cannot.

Hence my question: **When is it worth replacing the If nodes with a lerp (or a max, or something else more analytic)?**

Besides, how can I check the results? Is there a profiler or a tool that would tell me that a branch has been created in the shader?

Thank you

In terms of rendering performance, it depends on whether the inputs to A and B are static or dynamic. That is, if either one uses a Scalar parameter, it will be dynamic and every pixel will calculate all the input branches. In that case, the If will perform very similarly to using a clamped Ceil mask with a lerp. If all the inputs are static, then in theory the shader can be smart enough to run only the necessary part of the branch. In practice, though, it seems to pretty much always run both branches.
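To make the "clamped Ceil mask with a lerp" concrete, here is a minimal Python sketch of the arithmetic only. The helper names (`saturate`, `lerp`, `select_branch`, `select_lerp`) are illustrative assumptions that mimic the usual shader intrinsics; this is not the code the material editor generates.

```python
import math

def saturate(x):
    """Clamp x to [0, 1], like the shader intrinsic of the same name."""
    return min(max(x, 0.0), 1.0)

def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return a + (b - a) * t

def select_branch(a, b, when_greater, otherwise):
    """Readable version: an explicit comparison."""
    if a > b:
        return when_greater
    return otherwise

def select_lerp(a, b, when_greater, otherwise):
    """Branch-free version: both inputs are already computed, a mask picks one."""
    mask = saturate(math.ceil(a - b))  # 1 when a > b, otherwise 0
    return lerp(otherwise, when_greater, mask)

for a, b in [(0.3, 0.0), (0.0, 0.0), (-0.5, 0.0), (2.5, 0.0), (-2.5, 0.0)]:
    print(a, b, select_branch(a, b, 20.0, 10.0), select_lerp(a, b, 20.0, 10.0))
```

Both versions return the same value for each pair. The difference is only in how the choice is expressed, which is why the two are expected to cost about the same when both inputs have to be computed anyway.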

In terms of shader compiles, I don’t have a good answer. I am not sure how the shader compiler evaluates branches and what the cost there is.

I can suggest though, that if possible you use Static Switches for anything that can be turned off completely. That will definitely help your compile times by creating more permutations and only computing the used ones.

Hi Ryan,

Thanks for the answer.

I still have some trouble understanding this notion of static vs. dynamic.
For example, as shown in the picture below, my input is a float used as a boolean: if it is strictly less than 1.0 I do one thing, otherwise (so if it is true) I do another.
Should I use a static boolean instead?

If you are changing the value of that float at runtime, then it needs to stay as it is (or be an If)… but if you are pre-setting the value in a material instance and then never changing it at runtime, it is a perfect candidate to replace with a Static Switch. In that case you can either use a “Static Switch Parameter”, where that node IS the bool that shows up in material instances and you just hook up true/false, or you can use a “Static Switch” together with a “staticboolparameter”. They both do the same thing, but the latter allows the switches to be inside material functions and the bools to be parameters in your material.

Branch performance is a bit more involved than that. The GPU only needs to calculate both sides if the current wavefront (a group of pixels, 8x8) has any divergence. If all pixels choose the same path, then only that path needs to be calculated. Random input is therefore bad, but input that has some sort of locality is good. For example, “is this position below sea level?” is a pretty performant branch, because neighboring pixels are usually also close in world space.
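The divergence argument can be illustrated with a toy cost model in Python. Everything here is an assumption for illustration: the 8x8 tile standing in for a wavefront, the per-path costs of 100 and 10, and the function names. Real hardware scheduling is more complicated than this.

```python
def wavefront_cost(conditions, cost_true, cost_false):
    """Cost of one wavefront: each path is paid once if any pixel takes it."""
    takes_true = any(conditions)
    takes_false = not all(conditions)
    return cost_true * takes_true + cost_false * takes_false

def image_cost(mask, tile=8, cost_true=100, cost_false=10):
    """Sum the cost of every tile x tile wavefront covering a 2D boolean mask."""
    height, width = len(mask), len(mask[0])
    total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            conditions = [
                mask[y][x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            ]
            total += wavefront_cost(conditions, cost_true, cost_false)
    return total

SIZE = 32
# Coherent condition: the bottom half of the image is "below sea level".
below_sea_level = [[y >= SIZE // 2 for x in range(SIZE)] for y in range(SIZE)]
# Incoherent condition: neighboring pixels always disagree.
checkerboard = [[(x + y) % 2 == 0 for x in range(SIZE)] for y in range(SIZE)]

print(image_cost(below_sea_level))  # 880: every tile runs exactly one path
print(image_cost(checkerboard))     # 1760: every tile pays for both paths
```

In this model, the coherent mask costs half as much as the checkerboard even though both send exactly half the pixels down the expensive path.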

What I have noticed is that using more shader permutations is bad for compile times. The number of possible shaders is x * 2^amountOfStaticSwitches, where x is the number of different passes (shadows, depth only, gbuffer, etc.). So adding switches makes the numbers skyrocket quickly, like in this Unity blog post where they suddenly ended up with 1.9 million shader variants. But you get that number of variants just by using 20 tick boxes.
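As a quick check of the formula above, this small Python sketch computes the upper bound on possible variants. The function name and the pass counts are hypothetical; it counts possible combinations, not what an engine actually compiles.

```python
def variant_count(passes, static_switches):
    """Upper bound on shader variants: passes * 2^static_switches."""
    return passes * 2 ** static_switches

for switches in (5, 10, 20):
    print(switches, variant_count(1, switches), variant_count(2, switches))
```

With 20 switches the bound is 1,048,576 for a single pass and 2,097,152 for two passes, which is the same order of magnitude as the 1.9 million figure mentioned above.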

While that is technically true, in testing it seems difficult to achieve. Any time you use a scalar parameter, it seems to prevent the shader from gaining any performance benefit from branching. You can verify that by making a huge, complex branch that should always be bypassed, using a Scalar at 0 for instance. The cost of the extra instructions will most likely still show up.

And it should only have to compile permutations that are actually used. It does not compile every permutation of static switches.

Scalar parameter branching should be uniform branching, which is super fast on all current-gen hardware. But it has one big problem: the shader still has to reserve registers for the worst-case situation. Register pressure reduces wavefront occupancy, which is one of the bottlenecks on current-gen consoles.

Is there any good information on how UE4 actually handles shader compilation and permutations?

Ok, well, based on the feedback from both of you, I’ll try to use the If, the Lerp and the Static Bool wisely (at least as much as I can… :D)

The worst case I’ve been through with that shader was with If nodes everywhere. It looked like all the possible permutations were generated, and this created such a huge file to compile that, in fact, I wasn’t able to compile at all. I got the following message:

Error [SM5] warning: Line number “32768” got beyond range
MaterialFloat Local31559 = floor(Local31558);
from …/…/…/Engine/Shaders/ConvertToUniformMesh.usf: 8: #include “Material.usf”

It is far better now.

Hmmm, I was under the impression that an If node in materials is not actually a branching if statement and instead uses a trick similar to the ones on here?

If not, these are some good methods for optimising.

I just looked at the generated HLSL. It seems to always emit ternary branches, e.g. result = conditional ? optionA : optionB.

If you use the middle option too, then it adds one more ternary.

#Boolean_Math_Operators

It seems that the HLSL spec requires the ternary to always evaluate both sides. What a bummer. I think I need to code an actual if statement myself then.
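The difference between "evaluate both sides, then pick" and a real branch can be shown with a small Python analogy. This is only a model of the evaluation order described above, not HLSL, and all names (`expensive_a`, `cheap_b`, `select`, `eager`, `lazy`) are made up for the example.

```python
calls = []  # records which sides were actually evaluated

def expensive_a():
    calls.append("A")
    return 1.0

def cheap_b():
    calls.append("B")
    return 0.0

def select(condition, if_true, if_false):
    """Models a select: both arguments were evaluated before the call."""
    return if_true if condition else if_false

def eager(condition):
    # Both sides run, whatever the condition is.
    return select(condition, expensive_a(), cheap_b())

def lazy(condition):
    # A real branch: only the chosen side runs.
    if condition:
        return expensive_a()
    return cheap_b()

calls.clear()
eager(False)
print(calls)  # ['A', 'B']

calls.clear()
lazy(False)
print(calls)  # ['B']
```

Both functions return the same value. Only the second one avoids the work of the side that was not chosen.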

Ouch! Okay, ****. I just checked HLSLMaterialTranslator.h, and yeah, it is a ternary operator… :(

I guess it’s time to create some material functions!

It seems that you cannot create a flexible, truly dynamic branch using just a material function. Maybe there should be a tick box inside the If node that would emit a true branch instead of a ternary. On the other hand, the driver compiler might just ignore the HLSL spec if there are no side effects from not evaluating both sides.

Ryan, I would be interested to know how the If node is implemented. Do you think you can get some information on that?
I know the code is freely accessible, but I am afraid it would be a bit out of my reach.

Besides, I have another question for you. What do you think about having some loop nodes, like a For and a While? It would be so useful.

Thx