Dynamic flow control in materials

@ I can’t speak for all platforms, but cross-compiling to PSGL for PS4 also works.

And what do you mean by the network needing to be placed inside the braces? If you mean that the network going into either side of the branch needs to be used only there, to prevent it being executed elsewhere, then sure, of course. Otherwise, if you could elaborate, I’d appreciate it.

I did a test with a basic layer blend material and saw basically no performance overhead. I guess it makes sense if the texture is already loaded.

I still agree with the original point of this post: having dynamic flow control as atomic nodes would be really helpful and avoid the guesswork.
Hell, while you’re at it, multiple outputs for the custom node please :wink:

I have tried to replicate your results, but I could not. I have tested in both 4.25.1 and 4.22.2, and both versions give me the same result, namely that no branching actually happens:

You see, the material has a bunch of very heavy noise nodes, to make it obvious whether the branching works or not. I placed a plane in the level with this material applied and set the screen percentage to 200% to easily see any performance issues caused by the pixel shader.

No matter whether Param is 0 or 1, my Base Pass takes around 17 ms in this case. If I replace the “Param” parameter with a static “0”, the Base Pass time goes down to 0.5 ms, so that’s how it should be if it were working.
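
For reference, a custom node branch of the kind being tested here would look roughly like this - a sketch with assumed input names, not the exact node from the screenshot:

```
// Sketch of a scalar-driven branch in a custom node (assumed inputs:
// Param = the 0/1 scalar parameter, HeavyResult = the noise network,
// CheapResult = the cheap fallback).
[branch]
if (Param > 0.5)
{
    return HeavyResult;   // expensive noise path
}
return CheapResult;       // cheap path
// Whether the network feeding HeavyResult actually gets skipped depends on
// what the shader compiler does with the code that produces that input.
```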

Have you done anything else to make it work for you?

It works fine for me, but I’m not driving it with a scalar.
In my case I set the custom node to analyze the RGB of the input and eliminate the pathway if no color is in use (0,0,0), roughly as sketched below.
I had to rewrite the custom node a bit; each channel needs its own check.
Either way, the branching compiles and does its job just fine.
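
Something along these lines - a sketch with assumed input names, not the exact node:

```
// Sketch: analyze the RGB of the input and drop the pathway when no color
// is in use. Assumed inputs: Color (the RGB being tested), Layer (the
// expensive network to skip).
[branch]
if (Color.r > 0.0 || Color.g > 0.0 || Color.b > 0.0)   // each channel checked
{
    return Layer;           // color is in use, run the full path
}
return float3(0, 0, 0);     // (0,0,0) - pathway eliminated
```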

I would suggest you change the custom node to check whether the value is equal to, rather than greater than.

Otherwise floating point precision could also be an issue.
Just eliminate the branch when the scalar is == 0 and see if that makes any difference…

Checking for > 0.1 is *less* likely to run into floating point precision issues than checking for == 0 with a value that’s either 0 or 1. I have also tested == 0 now though, and it makes no difference; it still does not work.
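
For completeness, the two comparison styles inside the custom node body would look like this (a sketch; Param is the 0/1 scalar):

```
// Exact test - relies on the value really being 0.0:
bool useCheapPath = (Param == 0.0);
// Threshold test - tolerant of floating point noise around 0 and 1:
bool useHeavyPath = (Param > 0.1);
```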

What about the engine node in your case, a static switch parameter?

I agree with you that in theory checking for > is less likely to run into floating point issues.

I just don’t see any other reason why the engine would have it working with a static parameter and not with a scalar.

Could the custom node settings be at fault?
i.e. is it considering the scalar a vector3 instead of a scalar, or something like that?

Does the dynamic branch node in the material node graph still do nothing?

That is correct.

Here is one thing I did find out about branching, which should be taken into consideration.

Now, I’m no expert, but I found this info from sources which can be trusted.

This may explain why branching sometimes works, sometimes doesn’t, and sometimes even makes performance worse compared to a non-branched material, as in some of the benchmark results we saw previously in the thread.

The GPU tries to do its work in bulk. Unlike a CPU, which generally runs code sequentially and can run some threads in parallel, a GPU runs thousands of operations in parallel - that’s why GPUs have thousands of shader cores. These shader cores are grouped, and the smaller the number of cores in a group, the better it can perform. Depending on the hardware, some GPUs group 32 or 64 shaders together, others up to 1024.

That’s why a GPU usually renders many thousands of pixels in parallel.

Now the thing is, a shader can only be optimized based on all the pixels rendered by one shader group. Let’s call this a batch.

(For clarification: the following explanation also applies to the “if” in the compiled assembly of the shader!)

That means that to gain significant performance from branching in a material/shader, the calculations for all pixels in a batch must follow the same branch. If even one of those pixels has to follow a different branch, all the branches must be processed for the whole batch. A good example of this is one of the benchmarks in this thread where the branched material was worse than the material with no branching: if the branches mostly repeat calculations which could previously be shared without branching, they can now be up to twice as heavy.

Furthermore, this means that any pixels taking the “early” branch route still have to wait for the “long” route to be processed too, whenever a batch does not follow a single branch. And if the GPU has no other threads to pick up in the meantime, because it’s “waiting”, branching will lead to wasted cycles and worse framerates. That’s why a non-branched material can effectively be more efficient too, depending on the case.
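
In shader terms, the situation looks roughly like this (a sketch with placeholder functions, not engine code):

```
// Placeholder stand-ins for a cheap and an expensive path.
float3 CheapLayer(float2 uv)     { return float3(uv, 0.0); }
float3 ExpensiveLayer(float2 uv)
{
    float3 c = 0;
    for (int i = 1; i <= 64; ++i)          // simulate heavy work
        c += sin(uv.x * i) * cos(uv.y * i);
    return c;
}

float3 ShadePixel(float mask, float2 uv)
{
    // All pixels in the same shader group (the "batch") execute in lockstep.
    // If every pixel in the batch takes the same side of the branch, only
    // that side runs. If even one pixel diverges, the hardware processes
    // BOTH sides for the batch and masks out the unused results, so the
    // "early" pixels still pay for the "long" route.
    [branch]
    if (mask > 0.5)
        return ExpensiveLayer(uv);   // the "long" route
    return CheapLayer(uv);           // the "early" route
}
```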

That’s why GPU branching won’t always behave in the optimized manner we know from CPU code.

It is easy to see how a material with a lot of branches can end up in a situation that destroys the framerate.

Knowing that, it becomes obvious that branching requires a lot of attention: when to use it, when not to, which operations should go into each branch, whether the branches should be kept short, and so on. If the branch condition flips often or cannot group many nearby pixels, the performance impact can become very unpredictable. A branched material which increases performance in one case may create a performance issue when used in a different situation. The differences in shader grouping across platforms add even more to this “unknown factor” - while a GPU with 16-shader groups may be able to optimize a 3-branch material a lot, a GPU with 512-shader groups may kill the framerate completely.

Given the GPU-specific nature of branching, I can see why the Unreal Engine engineers would not readily add this feature as a node and instead leave it only as an option inside custom expressions, seeing how it could lead to unpredictable performance issues in materials, extremely different results on different platforms, and hard-to-debug cases. And that’s probably why the If node doesn’t utilize branching.

After all, predictability/consistency is a very important factor when creating games, especially when planning ahead with different performance budgets for different platforms.

That said, anyone who wants to benchmark or test whether branching is even working (as in a thread like this one) would have to test and compare results not only on their own platform, but run the same tests on all the platforms involved, and preferably with real test cases - the results may not replicate at all on another GPU, and synthetic test cases may behave very differently from real ones even on the same GPU.

(Some may consider parts of this off-topic, but it’s still halfway on-topic, and I’d like to show alternatives, so I’m posting it with this warning.)

I have researched this a lot, as I’m currently creating some landscape tools and materials and want to make them exceptional. As such, I try to pack a lot of functionality in there while still having them perform well, so material performance optimization has been a huge topic for me. I have many parts which felt hugely optimizable with dynamic branching, so that calculations wouldn’t be performed unnecessarily.

Now, after figuring out the finer details of dynamic branching, I have come to the conclusion that it should mostly be used in small areas, where predictable operations are performed within the custom expression only, and where both branches together won’t add more complexity/calculations to the material (because both branches may still be processed - see the previous post!). Using it to optimize branches around external node networks - while still functional - is rather something to avoid in the first place. I also now consider it something to reach for only in late-stage material optimization, and to use very sparingly and carefully, as it may involve more testing than I’d want to take to such extremes. It also seems to me something to avoid when creating materials which will be used in unknown environments or with unknown assets.

So I would rather look into alternative ways to optimize performance for my material / landscape materials. These could be:

  • Clever combinations of instructions, which may give the same result as a “proper” calculation but get there with far fewer instructions
  • Using texture samples/maps instead of calculations to obtain the wanted result (or the other way around, in some cases)
  • Using runtime virtual textures to cache some of the result - remember, you can always do more calculations afterwards on top of the cached texture, even within the landscape material, caching only the “predictable” part of the landscape layer which involves no camera-dependent values (camera position, pixel depth…)
  • Avoiding texture operations which are inefficient, like non-power-of-two texture sizes or offsetting texture UVs
  • Moving calculations from the pixel shader to the vertex shader - this may be difficult with some very advanced landscape materials, but it’s still worth considering and looking for opportunities
  • Repeating the same nodes for the same calculations rather than using a different node setup for the same calculation (the editor does a great job of collapsing those and avoiding unnecessary instructions) - functions help greatly in that regard
  • More Lerps, as a semi-branching feature, as it seems shader groups will optimize those (not 100% certain yet, but some things point to this)
  • Static switches still usually do a great job of increasing performance
  • Quality switches are a built-in form of branching which already works great and will optimize a material based on the quality level set - for example, someone asking for low quality doesn’t need roughness calculated from a roughness map, a heightmap and the normals, optimized with regard to specularity for proper reflections at distance and viewing angle
  • **(update)** Here’s one based on how both the GPU and the editor-to-HLSL compiler work: some operations may perform better if “unrolled”. Take, for example, an iteration within a custom expression, which could be performed more efficiently if it were done in nodes, because of the way the GPU processes things. Conversely, similar nodes which repeat a lot and generate HLSL code with many “Local” variables may be possible to write in a way where they collapse into far fewer “Local”s and perform faster. This is something to look into case by case - see the sketch after this list.
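
As a rough illustration of that last point (a sketch with made-up sample counts and placeholder names, not measured engine output):

```
// Looped form, as you might write it inside a custom expression
// (SampleLevel used to keep the sketch simple):
float SumLooped(Texture2D Tex, SamplerState Samp, float2 uv)
{
    float sum = 0;
    [loop]                                  // keep it a real loop
    for (int i = 0; i < 4; ++i)
        sum += Tex.SampleLevel(Samp, uv + float2(i * 0.01, 0), 0).r;
    return sum;
}

// Unrolled form - roughly what four separate sample nodes would generate,
// each result landing in its own "Local" variable:
float SumUnrolled(Texture2D Tex, SamplerState Samp, float2 uv)
{
    float Local0 = Tex.SampleLevel(Samp, uv + float2(0.00, 0), 0).r;
    float Local1 = Tex.SampleLevel(Samp, uv + float2(0.01, 0), 0).r;
    float Local2 = Tex.SampleLevel(Samp, uv + float2(0.02, 0), 0).r;
    float Local3 = Tex.SampleLevel(Samp, uv + float2(0.03, 0), 0).r;
    return Local0 + Local1 + Local2 + Local3;
}
// Which form ends up faster depends on the compiler and the GPU,
// so it is worth checking case by case.
```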

There could be more, but these are the ones that come to mind for now and will hopefully help those who were looking for solutions with branching.

UE5 seems to finally add this (experimentally):

https://github.com/EpicGames/UnrealEngine/commit/a557c4be0e8b5aee075a5cf4bf0ba49bcea3438f


Does it work? Does anyone know what the CVar is and which flag you have to set?

"- Very WIP, not intended/ready for production use

  • Hidden behind CVAR to enable support, and per-material flag to opt in"

Unreal Engine 5.3 seems to remove the dynamic flow control feature:

https://github.com/EpicGames/UnrealEngine/commit/9e02435ecf56af9338e00801cd6b36f658b543b7

That’s very unfortunate.

Sad that they have removed it.

But I have noticed something interesting in 5.3: on bools there is the option to enable dynamic branching. Perhaps this is not actually dynamic, though, and is still a flattened branch (like a lerp).
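
For reference, in raw HLSL the difference between a real dynamic branch and a flattened one is just a compiler hint (sketch; bUseEffect, HeavyEffect and BaseColor are placeholder names):

```
// Stand-in for some expensive optional feature.
float3 HeavyEffect(float2 uv)
{
    float acc = 0;
    for (int i = 1; i <= 32; ++i)
        acc += sin(uv.x * i) * cos(uv.y * i);
    return float3(acc, acc, acc);
}

float3 ApplyFeature(bool bUseEffect, float2 uv, float3 BaseColor)
{
    float3 result = BaseColor;

    // [branch]  -> real dynamic branch: the heavy side only runs when taken
    //              (per batch of pixels, with the divergence caveats above).
    // [flatten] -> both sides are evaluated and the result is selected
    //              afterwards, much like a lerp - nothing is skipped.
    [branch]
    if (bUseEffect)
        result = HeavyEffect(uv);

    return result;
}
```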

One of the better explanations of branching in shaders is actually in the Unity docs. They explain the differences and drawbacks between static, dynamic and flattened branching quite well:


Interesting, dynamic branching support on the static switch node… So it’s a dynamic branch that cannot be changed at runtime? That seems quite weird :thinking:

It doesn’t seem to be documented anywhere?


Dynamic flow-control off bools/switches is great.

What I want, if possible, is dynamic flow control where the alpha of a LERP/IF/etc can be evaluated before either branch, and then if 1 or 0, execute JUST that branch.

I have stone, then from a slope-mask, or whatever, I LERP to dirt, then to grass, then maybe water or snow; a progression. Except in the transition areas - the blends - the alpha will be all of one or all of the other. Dynamic branching would (hopefully) ensure that only 1 out of X ‘layers’ is executed. No?
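
If it helps, that kind of “evaluate the alpha first, then only run the layer you need” can already be hand-written in a custom node - a sketch with placeholder layer functions, not a claim about what the new node does:

```
// Placeholder layer networks (stand-ins for the real texture stacks).
float3 SampleStone(float2 uv) { return float3(0.4, 0.4, 0.4); }
float3 SampleDirt (float2 uv) { return float3(0.3, 0.2, 0.1); }

float3 BlendTwoLayers(float alpha, float2 uv)
{
    // Outside the transition areas the mask is fully 0 or fully 1,
    // so one of the two layers can be skipped entirely.
    [branch]
    if (alpha <= 0.0)
        return SampleStone(uv);

    [branch]
    if (alpha >= 1.0)
        return SampleDirt(uv);

    // Transition area: both layers are needed for the blend.
    return lerp(SampleStone(uv), SampleDirt(uv), alpha);
}
```

Whether this actually saves anything still depends on whole batches of pixels agreeing on the branch, as discussed earlier in the thread.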

Yes, no documentation at all. I don’t know where it was written, but its intended use is to reduce shader permutations. It will compile the material to support all the features, and a dynamic branch is then used to toggle which ones are enabled, but at the moment that’s only possible in a static way by creating a new instance in the editor (you cannot make a dynamic instance and toggle the bool), even though it should be possible.
I tried to plug in a float parameter and the like, but it didn’t work. It might just be that they still need to create a bool parameter which works at runtime.

If you want to switch a feature on and off without triggering compilation, or you want to reduce shader permutations, it’s good. But it will have a higher cost in memory (because all features are supported in the shader) and probably a tiny cost for the dynamic branching when executing the shader. I would be very interested in how granular you should be with it, i.e. how many branches you should add (one dynamic branch for all the special features, or for example 20 branches, one for every single feature).
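
To make the granularity question concrete, the two extremes would look roughly like this (sketch, placeholder feature functions):

```
// Placeholder optional features.
float3 ApplyWetness(float3 c, float2 uv) { return c * 0.8; }
float3 ApplySnow   (float3 c, float2 uv) { return lerp(c, float3(1, 1, 1), 0.5); }

// One coarse branch around all optional features:
float3 CoarseBranch(bool bAnyFeature, bool bWet, bool bSnow, float3 c, float2 uv)
{
    [branch]
    if (bAnyFeature)
    {
        if (bWet)  c = ApplyWetness(c, uv);
        if (bSnow) c = ApplySnow(c, uv);
    }
    return c;
}

// One dynamic branch per feature:
float3 FineBranches(bool bWet, bool bSnow, float3 c, float2 uv)
{
    [branch] if (bWet)  c = ApplyWetness(c, uv);
    [branch] if (bSnow) c = ApplySnow(c, uv);
    return c;
}
```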

I hope they add the ability to toggle it at runtime. Then it could be used for toggling on-hit effects, special effects on materials, and weather effects which are not used all the time.
I would love it if they made it possible to use it in a material parameter collection, so that I can toggle global features (like fog or weather effects).

I am no expert, but in a scenario like that it can be tricky, because it’s best when the GPU does the same work over the entire draw call. (So if you branch different pixels out, you might actually lose performance if you are not careful.)

You might want to think about whether you really need dynamic branching. The performance problem is likely the cost of sampling all the textures of all the layers, and you can look into texture arrays for that: they let you sample multiple textures as if they were one. It’s as if you made one big texture out of all the textures and then shifted where it samples depending on the layer.
Of course you would not get any blending by default, but you can try to sample the two layers with the greatest weights.
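
A rough sketch of the texture array idea in HLSL (placeholder resource and parameter names):

```
// The whole layer set lives in one Texture2DArray; the layer index is just
// the third texture coordinate, so one array sample replaces a separate
// texture sample per layer.
Texture2DArray LayerTextures;   // placeholder resource names
SamplerState   LayerSampler;

float3 SampleTopTwoLayers(float2 uv, float indexA, float indexB, float weightA)
{
    float3 a = LayerTextures.Sample(LayerSampler, float3(uv, indexA)).rgb;
    float3 b = LayerTextures.Sample(LayerSampler, float3(uv, indexB)).rgb;

    // Blend only the two layers with the greatest weights.
    return lerp(b, a, weightA);
}
```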

For landscape materials there is also a “Landscape Layer Switch” node. It will compile a static permutation of the material depending on which layers are used on each landscape tile. For example, if you have a “golden flower” layer which is only on a small part of the map, it helps a lot. But if the layer is used a little everywhere, it makes no difference.

I wouldn’t try to bake dynamic branching in if I could help it. I know I’d likely end up making a costly mistake.

My comment is more to the point that when I do my regular maths - LERPs and the like - there will be points where the alpha of a thing is wholly 1 or 0. IF it’s technically possible, being able to ‘foresee’ that and only run the one input or the other could, to my mind, be a savings, but I’m not a dev in this area. No idea if it’s at all logically/mechanically possible. The idea here is that the engine might find a way to save overhead at runtime, even if I cannot predict it up-front.

Think a slope-mask. At some point it’s all one thing or all another. In between, sure you run both paths b/c you need to mix between the two. Otherwise, for the bulk of the work it’s either/or. IF the engine could create a savings there, that would be nifty.

As for the node, I am guessing it’s simply ‘you can have both sides of this boolean-switch and have access to the on/off paths at runtime w/o needing to recompile’, like a best-of-both-worlds thing. Who knows, maybe it’s keeping track of two distinct shaders and swapping them out at runtime?