Shader optimizations (skipping AND operands etc.)

Hello. Does UE4 apply some shader optimizations I could utilize in cases like this?

[Screenshot: material graph combining Condition A and Condition B]
Edit: I realized I haven’t made it clear: Condition B in this screenshot is just gibberish (except for the end where I compress it into a 0/1 value) so just imagine any expensive operations that calculate an actual mask (Condition B) within the allowed range (Condition A).

Let’s say I need to calculate a mask based on two conditions. Condition A is some basic range check. Condition B is more expensive to calculate.

This multiply on the right is sort of like an “if x AND y”, and I think in programming it’s often optimized so that if x is false then y is not even calculated. Do similar rules apply to shaders in UE4?
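For comparison, here is a minimal Python sketch of CPU-style short-circuit evaluation. Python stands in for any CPU language. `condition_a` and `condition_b` are hypothetical stand-ins for the two graph sections, and the `calls` log exists only to show which ones ran. With `and`, Condition B is skipped when Condition A is 0. With a multiply, both are always evaluated first.

```python
calls = []

def condition_a(distance, radius):
    """Cheap range check: 1.0 inside the radius, 0.0 outside (hypothetical)."""
    calls.append("A")
    return 1.0 if distance < radius else 0.0

def condition_b(x):
    """Stand-in for an expensive mask computation (hypothetical)."""
    calls.append("B")
    return 1.0 if (x * 12.9898) % 1.0 > 0.5 else 0.0

def mask_short_circuit(distance, radius, x):
    # `and` stops as soon as the left operand is falsy, so B is skipped.
    return 1.0 if (condition_a(distance, radius) and condition_b(x)) else 0.0

def mask_multiply(distance, radius, x):
    # A multiply has no short-circuit: both operands are evaluated first.
    return condition_a(distance, radius) * condition_b(x)
```

Outside the range, `mask_short_circuit(5000.0, 1000.0, 0.3)` logs only "A", while `mask_multiply(5000.0, 1000.0, 0.3)` logs "A" and "B". Whether a shader compiler does anything comparable is the open question of this thread.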

No, it doesn’t optimize anything based on conditions: not IF x AND y, and not even IF x ELSE z.

old discussion here: Dynamic flow control in materials - Unreal Engine Forums

A Parameter Switch does it natively, though, so you don’t necessarily need to write a Custom node.

Also, your math seems wrong:
0 × 1 is 0, not 1, so essentially the effect is always off unless your If isn’t toggled at the same time.

If you just throw a boolean Parameter Switch in there, you get an option in every instance that actually recompiles the material each time.
That is basically your optimization: if you set it to off, it’s not compiled.

You can do something similar with an IF inside a Custom node.
Actually, there’s a particular way to do it. You can find it in the forum thread where people complain that the default If node loads both branches…

A parameter switch is static, while his material is clearly intended to do a dynamic condition (based on distance).
Moreover, a parameter switch acts on the whole material, while his material is clearly intended to work on some pixels (those corresponding to world positions close to a defined location) and not others (those outside the range).
His material is a post process, so it obviously needs to be enabled all the time (otherwise… well, just remove the post-process material entirely).
So I’m not really sure how your suggestion helps in this case (I believe it does not help at all).

The particular discussion/complaint thread is the one I already linked above :wink:

To optimize this the way you want:
-Replace the lerp with an if
-Plug mask into A>B and the purple node into A==B
-Set the A input to 1 and plug Condition A’s if result into B
-Plug Condition B’s floor result into A<B in Condition A’s if node
Extras (general shader optimizations):
-Replace the pow with a multiply and plug vector length into both A & B
-Replace distance with distanceSquared (not a node, so you’ll have to make it yourself; it’s just the distance formula without the sqrt) and in blueprints (not the material) square the range parameter.

Improvements:
-Lerp and multiply are no longer needed, reducing the math that needs to be done.
-Since branching (if) is used, the shader can take advantage of branch optimization (only executing one branch, which is what you were asking about). However, judging from Chosker’s link, this may not happen.
Extras:
-When squaring a number, multiplying it by itself is faster than pow(n, 2).
-DistanceSquared removes the sqrt, requiring one less calculation. As long as the exact distance isn’t needed (as is the case with distance comparisons), the result is identical to having the sqrt. The range parameter needs to be squared to take into account the missing sqrt, but since it’s done in blueprints, it’s only squared once on the CPU; plus, squaring is not an expensive operation (just multiply the number with itself).
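A minimal Python sketch of the distance-squared trick (the function names are hypothetical, and `math.dist` needs Python 3.8+). For a non-negative radius, `d < r` and `d*d < r*r` give the same answer, so the per-pixel square root can go, and `r*r` is computed once on the CPU side. Note that the squaring uses a plain multiply, not `pow`.

```python
import math

def within_range_sqrt(p, center, radius):
    # Original form: distance() needs a square root for every pixel.
    return math.dist(p, center) < radius

def within_range_squared(p, center, radius_squared):
    # Squared form: no square root; compare against the pre-squared radius.
    dx = p[0] - center[0]
    dy = p[1] - center[1]
    dz = p[2] - center[2]
    return dx * dx + dy * dy + dz * dz < radius_squared

# Done once per frame on the CPU (e.g. in Blueprint), then passed in as a parameter.
radius = 1000.0
radius_squared = radius * radius
```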

Explanation:
This works because all conditions need to be true, so they can be thought of as a line of conditions. As we go down the line, if any condition is false, we stop and return false (0). If we reach the end of the line with all conditions being true, we return true (1). Since every condition returns either true or false, instead of combining them and returning the combined result, we can just return the result of the first false condition (or true if none of them is false).

For example: for two conditions, the possible cases are:
A = false -> return A (false)
A = true -> B = false -> return B (false)
A = true -> B = true -> return B (true)
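The “line of conditions” can be sketched in Python as an early-exit loop (`all_conditions` and `multiply_mask` are hypothetical names). Each condition is passed as a zero-argument function, so a later, more expensive condition only runs if every earlier one returned 1. For 0/1 inputs, the result matches multiplying the masks together.

```python
def all_conditions(conditions):
    """Walk the line of 0/1 conditions and stop at the first one that fails."""
    for condition in conditions:
        if condition() == 0:
            return 0  # the remaining conditions are never evaluated
    return 1  # reached the end with every condition true

def multiply_mask(values):
    """The original formulation: multiply 0/1 masks together."""
    result = 1
    for value in values:
        result *= value
    return result
```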

The way you have it written, it’s not a condition, but a math operation (since you use multiply). So chances are it’s always going to calculate both sides. It may be smart enough to know not to calculate Condition B if Condition A is 0 (since anything times 0 is 0), but I wouldn’t count on it.
Also, you plug that result into a lerp, which is another math operation. So once again, it’s going to calculate both inputs (A and B) unless it’s smart enough to only do one side when alpha is 0 or 1.
And because you’re using math to emulate a logical operation (a branch), it’s, if anything, going to be more expensive than just using an if statement since the GPU has to crunch numbers just to get a 0 or 1.
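A small Python sketch of that difference (hypothetical helper names; `lerp` follows the usual `a + alpha * (b - a)` definition that HLSL’s `lerp` also uses). The lerp receives both inputs as already-computed values, whereas the branch-style `select` receives functions and only calls the one it picks.

```python
def lerp(a, b, alpha):
    # Both a and b are plain values, so both were computed before this call.
    return a + alpha * (b - a)

def select(mask, when_one, when_zero):
    # Branch-style: only the chosen function is actually called.
    return when_one() if mask == 1 else when_zero()
```

With alpha restricted to 0 or 1, `lerp(base, effect, mask)` and `select(mask, lambda: effect, lambda: base)` return the same value; only the second one has a chance to skip work.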

So, with the way you’ve written it, zero optimizations are (likely) going to happen; it’s very likely that all nodes are going to be executed. Using if statements, there may be a worry about both branches being executed; but even then, it’s still less expensive since there is no multiply or lerp being used, reducing the math that needs to be done.

Also, does the inverse transform matrix have any input? If not, Condition B is always going to return the same value (0), which is what MostHost LA pointed out.

I still get less load with a custom node than with the engine’s If node.
This is on a landscape material, and the difference is across the board, even though the paint is identical and only the material differs.
That’s despite what the discussion says, and I’m talking about roughly 15 fps of difference. So obviously the branches are getting properly excluded.

That said, no idea if this can translate to a PP material.

Another thing I’m not sure about…
Can you use a custom UV in a PP material to push the calculation load onto the gfx?
Or does the PP material always run on the gfx anyway, making that unnecessary?

It’s been a while since I played with PP effects…

Yes @Chosker you are correct, static switches don’t really help in my case, thank you for clarifying that in my absence. Also, I believe this issue would also be valid for decals or regular mesh materials, not just a Post Process.

Thank you for the link, I will dig into it. Also, in the meantime, I asked the same question on Asher Zhu’s Discord, where Ryan DowlingSoka from The Coalition provided an exhaustive explanation, with tl;dr being (with my limited understanding) basically that there’s no easy/obvious/platform-agnostic way to do it either in the graph or in HLSL with guarantee of it being optimized (Condition B not being executed if not necessary). I may ask him for permission to repost his explanation here, as it is a gold mine.

Also, I’ve edited my original post to clarify that Condition B in the screenshot is just gibberish; imagine any expensive calculations in there that actually make sense :slight_smile:


Thank you @midgunner66, I will study your solution and try to cross-reference it with Ryan’s explanations (which also included branching), as I’m still pretty new to the deeper mechanisms of shaders. And yes, I sometimes offload calculations that only need to be done once per frame to BPs and feed them in via parameters, but I also know that in some cases this is already optimized on the GPU (these are called constants, right?), so I wasn’t sure whether offloading it to the CPU might actually harm performance.


Hmm, I don’t see where it is wrong… Like I wrote in the graph’s comments, both conditions need to return 1 for the purple effect to pass. Condition A returns 1 when the given pixel is within defined range from SomeLocation. Condition B returns 1 when the pixel satisfies some more complicated condition (it could be a more complex procedural pattern, for example).

Edit: oh, and thank you @MostHost LA for your custom node IF tip. I’ve also seen it in the linked thread, and I’m still trying to make some sense of it with my very limited understanding of HLSL and how GPUs work.

Edit2: Ryan kindly allowed me to post his explanation, so here it is:

[Screenshot: Ryan DowlingSoka’s explanation]

Please disregard the “messed up Condition B” part, as the screenshot here on the UE4 forums is already fixed. Also, here’s my “bonus question” screenshot from the discussion:

@midgunner66 so I followed your instructions and realized I used to do it that way, until a few years back I (mis)read somewhere that IFs are bad, and also decided that multiplying parts of a mask (like in my initial example) is more readable and easier to maintain for me than the IF sequence.

Also, wouldn’t this be faster? Change A from 1 to 0 in the final IF, which allows disconnecting A==B. Because connecting it makes the material compile in a different way, correct?

[Animated GIF: UE4 If Optimization.gif]

But yeah, even you said that, judging from Chosker’s link, this may not work either way. I skimmed through that thread and I must say it’s a bit overwhelming… If even you guys can’t agree on or be sure of how this stuff works…

Well, if you are using the value in the IF as a test the value is always evaluated.

“Hey unreal, check if A is >=< to B but don’t solve B”
Is basically what you are asking. I wish there was some magic for that. Would solve a LOT of issues.

… forum ate half the message …

In your previous setup, if branching were actually a thing, the calculation would cost less depending on the result of the distance check.

And are you sure you need an IF at the end and not a lerp?
Sudden color change vs. gradual.

Not exactly…

Yes, now that was the point. The optimization not being “but don’t solve B” but rather “but be smart about solving B”. I mean I just followed midgunner’s advice… You’re saying it won’t work, right? Both parts of the first IF (both Condition A and B) will be calculated?

In this case yes, on/off. Think about AoE visualization effects, cones of view etc. But this post was more about trying to understand the general rules, not solving one particular case.

Well, I mean, for what it costs, throw it in a custom node instead of an If.
That way, if it works, it costs less, and if it doesn’t, it doesn’t change a thing…

You can profile both versions to see if you get a benefit, but realistically it’s kinda pointless: if the calculation happens somewhere based on distance, you’ll always see the heavy calculation running anyway.
I don’t think “more” or “less” of it makes much of a difference. BUT you could try really heavy noise nodes to force a drastic difference to benchmark it,
just to see whether changing the distance value causes a massive increase or not.

For sudden AoE visualization stuff like night vision, I would really just fade to black, swap the PP material, and fade back in,
so the cost is definitely not always part of the calculations.
But yes, an If has its uses for cones of view and whatnot injected into the PP.

My answer comes strictly from general shader programming (specifically GLSL). I agree with everything RyanDowling.Soka said: your best bet for optimization is to write it out in a custom node (HLSL). UE4 only converts the node graph to an HLSL shader; it doesn’t actually compile it for your GPU. Because of this, UE4 may not write the HLSL code in the most optimal way. If you write it by hand, you’ll know what’s actually being done.

This is both true and false: it gets complicated, but the true answer is it depends on what’s inside the branches. It’s a lot of tl;dr, so I’m gonna keep it simple (leaving out some stuff).
When a group of neighboring pixels evaluates an if statement, and some pixels go to the true branch while the others go to the false branch, both branches need to be executed. Because GPUs use SIMD (single instruction, multiple data), the branches will be executed in serial (the first branch is done first, then the second), rather than in parallel (both branches done at the same time). This makes the total cost of the if statement = cost of branch A + cost of branch B; i.e. they combine to become one chunk of code rather than two separate ones. However, this is not necessarily bad. If the content of both branches is small, then the combined content is small.
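A toy cost model of that paragraph, in Python. It is a deliberate simplification under stated assumptions (real GPUs add masking overhead, use different group sizes, and so on); `branch_cost` is a hypothetical name, and the costs are arbitrary units.

```python
def branch_cost(takes_true, cost_true, cost_false):
    """Rough cost of an if/else for one SIMD group of pixels.

    `takes_true` lists, per pixel in the group, whether it takes the true branch.
    Assumption: a branch runs if at least one pixel in the group needs it, so a
    divergent group runs both branches one after the other.
    """
    cost = 0
    if any(takes_true):
        cost += cost_true
    if not all(takes_true):
        cost += cost_false
    return cost
```

A uniform group pays for one branch only; a single disagreeing pixel makes the whole group pay for both.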

But both branches being evaluated is the worst case scenario. In some cases, all pixels may go to the same branch. In this case, since all pixels go to one branch, only one branch needs to be executed; the other branch is skipped. An example of this is when the condition of the if statement only contains constants or global/shared variables (e.g. game time < 10). Because there are no variables that vary by pixel, the result will be the same for all pixels.

However, even in a case where variables can vary by pixel (like world position), they may still go to the same branch. An example of this would be distance(worldPosition, someLocation) < 1000. Because neighboring pixels are usually close by in the scene itself, their worldPositions will be similar. someLocation is a parameter we set, so it’s always the same. Since the worldPositions will be similar, and someLocation will always be the same for all pixels, the distance result will be similar. 1000 is a constant, plus it is a large number, and so there is a lot of range for values.
Taking all this into account, we can assume that most neighboring pixels will evaluate to the same branch. Not ALL pixels will do this (in the case of being on the edge of a mesh), but in cases of gradual change (like across a floor), this will happen.
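A 1D toy simulation of that argument. The assumptions are all hypothetical: one screen row maps to world X positions 5 units apart, pixels are grouped 32 at a time, and `count_divergent_groups` is an illustrative name. It counts how many groups straddle the edge of the `distance < 1000` range and would therefore run both branches.

```python
def count_divergent_groups(world_x, some_location_x, radius, group_size=32):
    """Count SIMD groups whose pixels disagree on `distance < radius`."""
    divergent = 0
    for start in range(0, len(world_x), group_size):
        group = world_x[start:start + group_size]
        inside = [abs(x - some_location_x) < radius for x in group]
        if any(inside) and not all(inside):
            divergent += 1
    return divergent

# 1920 pixels along one row, world X from 0 to 9595 (5 units per pixel).
row = [i * 5.0 for i in range(1920)]
```

With `count_divergent_groups(row, 4000.0, 1000.0)`, only the 2 groups containing the edges of the range diverge; the other 58 groups all take the same branch.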

Once again, it gets complicated (because it changes on a case-by-case basis), but in general:
-If statements only evaluate both branches if neighboring pixels go to different branches.
-If statements are only bad if they will execute a lot of code at runtime (even if you see a lot of code in a branch, it doesn’t mean it will actually be executed).

In terms of readability, by using a math node, other people looking at it will see math. They will have to figure out what’s actually going on to realize that you’re actually just doing an if statement. By using an if statement, you explicitly say “this is an if statement”. Unless it improves performance by a lot, always write things out explicitly for better readability.

It really doesn’t matter how this is setup, just that it logically works. You can even use 0.5 for the comparison: since the input will either be 0 or 1, it will never be equal to 0.5, so you only need to use the A>B and A<B inputs. I don’t think the performance will vary here at all (unless it’s a ue4 thing).

Variation 2 is way worse. Not only is there more math, but there is a division (which is slower than multiply). They both logically do the same thing, but Variation 1 is straight to the point, whereas Variation 2 emulates it with math. If you were doing this in a situation where you only can use math operations (like on Desmos), you would use Variation 2.
And if you think about it, saturate uses if statements, too (n < 0 ? 0 : (n > 1 ? 1 : n)), so you still technically have an if statement in there.
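Saturate can also be written without any explicit branch, as a clamp built from `min` and `max`. A Python sketch (hypothetical helper names; it says nothing about how a particular GPU implements saturate):

```python
def clamp(x, lo, hi):
    # Clamp built from min/max, with no if/else.
    return min(max(x, lo), hi)

def saturate(x):
    # saturate(x) is the same as clamp(x, 0, 1).
    return clamp(x, 0.0, 1.0)
```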

At the end of the day, nothing beats just testing and benchmarking it. Instead of spending a lot of time researching and speculating, you can just test it upfront (like what MostHost LA suggested). I find that just testing things upfront is faster than trying to find the solution online. The only caveat is things can change from GPU to GPU, so what works on your GPU may work differently on another GPU.


Hmm, I just started watching Epic’s stream about the Custom node. They show the documentation page, which warns that using the Custom node prevents constant folding. I checked, and the warning is still there.

I’m at a smaller studio and so far, in the past 4 years, I was the only person doing our materials, and in case someone ever needs to take over I always comment everything and keep it tidy, so if there is a node group labeled e.g. “Range Mask 0/1”, and another labeled “Stencil Test 0/1”, both leading into a multiply, I think it’s readable enough. But I get your point :slight_smile: I’ll probably switch back to IFs.

I remember vaguely that if A==B is connected, the material is converted into HLSL differently (probably because the equals condition has a threshold).
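A Python model of the two shapes the If node’s generated code is assumed to take. This is an assumption based on the discussion here, not the engine’s actual HLSL (the `threshold` default of 0.00001 is assumed to mirror the node’s equals threshold setting), so check the material’s generated HLSL to confirm.

```python
def if_with_equals(a, b, a_greater, a_equals, a_less, threshold=0.00001):
    # Assumed shape with A==B connected: an extra abs() and compare
    # against the equals threshold before the >= test.
    if abs(a - b) > threshold:
        return a_greater if a >= b else a_less
    return a_equals

def if_without_equals(a, b, a_greater, a_less):
    # Assumed shape with A==B unconnected: a single >= compare.
    return a_greater if a >= b else a_less
```

For a 0/1 mask compared against 0.5, the equals case can never trigger, so both versions agree and the extra compare in the first one buys nothing.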

[Screenshot: UE4 If threshold.png]

I thought Saturate would be faster than a full IF… Still, I guess you make a valid point about the division.

[Screenshot: UE4 Saturate.png]

Yeah actually I’ve been testing stuff a lot lately, doing small builds and writing averaged performance measurements down for comparison, while learning how to optimize our project in terms of stuff like shader complexity, LODs, dynamic shadow casting settings, culling etc. As there’s no clear, universal guidelines fitting all projects, I arrived at the same conclusion as you: just test and find out. Even did it once with HLSL, actually: I was curious if the Clamp(0,1) node would compile into the same code as Saturate, so I checked in RenderDoc and it did. Once I changed (0,1) to something else, it produced more code.

Still, this topic can be a bit overwhelming, and my testing methodology could be flawed so I preferred to ask first to gain some foundations rather than to wander in the dark. So thank you for the explanation, I just started educating myself on fragment shaders etc. and it’s slowly starting to make sense.

Really appreciate all your answers!


Yeah, that’s true. I don’t know how much you actually save from constant folding, since it only works on nodes that don’t change per pixel.
Also, I think you can do constant folding by hand if you wanted to: just compute all the constant parts of your material in Blueprints, send them to the material using a parameter (or something better), and plug them into your custom expression.
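A minimal Python sketch of constant folding by hand, using a view-cone mask as a hypothetical example (all names are illustrative). The cosine of the half-angle depends only on a parameter, so it can be computed once on the CPU and passed in, leaving a single comparison per pixel.

```python
import math

def cone_mask_unfolded(cos_to_pixel, cone_angle_degrees):
    # cos(half angle) depends only on a parameter, yet is recomputed per pixel.
    return 1 if cos_to_pixel > math.cos(math.radians(cone_angle_degrees * 0.5)) else 0

def precompute_cone_threshold(cone_angle_degrees):
    # Folding by hand: done once on the CPU (e.g. in Blueprint) and
    # passed to the material as a scalar parameter.
    return math.cos(math.radians(cone_angle_degrees * 0.5))

def cone_mask_folded(cos_to_pixel, cone_cos_threshold):
    # Per-pixel work is now a single comparison.
    return 1 if cos_to_pixel > cone_cos_threshold else 0
```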

I don’t know much about what unreal does under the hood for materials, and in most cases, I don’t worry. But I think you’re right about the equals condition (since that would require more math). Try setting the equals condition to 0 to see if that improves anything.

The if statement itself is not what makes if statements problematic: it’s what you do **inside the branches**. Whether you use a saturate or an if statement, you’re still going to be doing something based on the result. If you’re doing a lot of stuff in both branches, in the worst case scenario (both branches are done), you’ll end up paying the price of both branches. However, if you had an if statement that did nothing, you’d end up paying nothing.
Edit: here’s a super short explanation on thread divergence (when pixels take different branches of an if statement). This is what makes if statements problematic. This is a better explanation, but on loops.

Edit 2: saturate isn’t bad; my “if you think about it” thing was just generalizing it to code and is probably not how it actually works. The main problem was the division.

If you look at a lot of projects, tutorials, examples, etc., you’ll notice material graphs can get pretty big, yet they still run fast. So I really wouldn’t worry about optimizing anything unless you really need to.

Though, it’s important to keep in mind that it’s a material, not a shader. The material is only meant to compute the colors for the gbuffer (hence the diffuse, specular, normal, etc. inputs), and the shader is meant to do the actual lighting calculation (hence the “shading model” property in the details panel). If you’re doing calculations in a material that do more than just calculate color, you should probably be doing it in a shader instead (or blueprints).

Thank you again for further clarifications. I don’t really get this last part, though; could you please clarify? Assuming our calculations need to be done per pixel, we can put BPs aside. So what’s the difference between doing it “in a material” and “in a shader”? What would be an example of what you’re warning against? I’d think everything I do in a material graph is to compute colors. Or, well, shapes, like when masking a view cone effect, or determining a mask for procedurally blending two textures in a material. But it all ends up as a color (value) in a GBuffer texture. What would be a counter-example?

I think I was just being too low-level, lol. I was basically saying what you’re saying: it computes colors for the gbuffer. I was trying to emphasize that materials are meant only for colors, not lighting, and to keep lighting calculations (e.g. trying to fake lighting) out of the material. This would explain why materials are very stripped down compared to blueprints (no exec pins, no loops, no and/or nodes, etc.), whereas shaders support all of that.
Kinda off topic, but applicable to optimization, I guess.

They should still allow for loops. Especially because vertex factory is a mess to work with.

Having a dedicated node called something like “per vertex loop” that just defines the math applied to each vertex would be much better than having to do guesswork, or use texture samples to define sine-based animations.

Having the feature would also allow for checking an index value, or doing other things that are currently more complicated because of the lack of this basic functionality.

Should that be in a shader and not a material? Maybe.
I see it more as a material thing, since displacement is commonly done in materials.
and since you probably still use texture samples on it.

However we are totally digressing from the original optimization point of the question with branching and all