Here is one thing I found out about branching that should be taken into consideration.
Now, I’m no expert, but I got this info from sources that can be trusted.
It may explain why branching sometimes helps, sometimes does nothing, and sometimes makes performance even worse than a non-branched material, as in some of the benchmark results we saw earlier in the thread.
The GPU tries to do its work in bulk. Unlike a CPU, which mostly runs code sequentially and only runs a handful of threads in parallel, a GPU runs thousands of operations in parallel, which is why it has thousands of shader cores. These shader cores are grouped, and all cores in a group execute the same instruction at the same time. The smaller the group, the more likely it is that all of its pixels agree on a branch. Group size depends on the hardware: 32 or 64 is common on desktop GPUs. Much larger figures such as 1024 usually seem to describe how many threads can be scheduled together as a workgroup, not how many run in lockstep.
That’s why a GPU usually renders many thousands of pixels in parallel.
Now the thing is, a branch can only be skipped if all pixels being processed by one shader group can skip it. Let’s call the pixels handled by one group a batch.
(For clarification: the following explanation also applies when there is a real if in the compiled shader assembly!)
That means that, to gain significant performance from branching in a material/shader, the calculations for all pixels in a batch must follow the same branch. If even one of those pixels has to follow a different branch, the whole batch has to step through every branch that any of its pixels takes, and each pixel simply discards the results it doesn’t need. A good example of this is one of the benchmarks in this thread, where a branched material performed worse than the material with no branching. If the branches mostly repeat calculations that would be shared in the unbranched version, a diverging batch can end up up to twice as heavy.
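To make the rule above concrete, here is a minimal sketch of the cost model in Python. It is only a toy: the function name batch_cost and the instruction counts are made up for illustration, and real hardware has extra overhead that is not modelled here.

```python
def batch_cost(conditions, cost_if, cost_else):
    """Cost (arbitrary units) for one lockstep group of pixels.

    Simplified assumption: the group pays for every branch that at
    least one of its pixels takes.
    """
    takes_if = any(conditions)        # at least one pixel wants the "if" side
    takes_else = not all(conditions)  # at least one pixel wants the "else" side
    return (cost_if if takes_if else 0) + (cost_else if takes_else else 0)


# Hypothetical instruction counts for a cheap and an expensive branch.
COST_IF, COST_ELSE = 10, 40

uniform_cheap = batch_cost([True] * 32, COST_IF, COST_ELSE)
uniform_expensive = batch_cost([False] * 32, COST_IF, COST_ELSE)
one_stray_pixel = batch_cost([True] * 31 + [False], COST_IF, COST_ELSE)

print(uniform_cheap, uniform_expensive, one_stray_pixel)  # 10 40 50
```

In this model a single stray pixel is enough to make the whole batch pay for both branches, which is more than either branch alone.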
Furthermore, this means that any pixels taking the “early” route still have to wait for the “long” route to be processed whenever a batch does not stick to a single branch. And if the GPU has no other threads to work on while it is “waiting”, branching leads to wasted cycles and worse framerates. That’s why a non-branched material can actually be more efficient, depending on the case.
That’s also why GPU branching is not the automatic optimization we know from CPU code.
It is easy to see how a material with a lot of branches can lead to a situation that destroys the framerate.
Knowing that, it becomes obvious that branching requires a lot of attention: when to use it and when not, which operations should go into each branch, whether the branches should be kept short, and so on. If the branch condition flips often, or does not stay the same across many nearby pixels, the performance impact can become very unpredictable. A branched material that improves performance in one situation may create a performance problem in another. Differences in shader group size between platforms add even more to this “unknown factor”: a GPU with narrow groups (say 16 wide) may be able to optimize a 3-branched material a lot, while a GPU with much wider groups may see its framerate collapse with the same material.
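The effect of group size can be illustrated with another small Python sketch. The row of pixels, the 48-pixel stripe width and the group sizes are all hypothetical values picked for the example; the helper divergent_fraction just counts how many groups contain pixels that disagree on the condition.

```python
def divergent_fraction(conditions, group_size):
    """Fraction of lockstep groups whose pixels disagree on the condition."""
    groups = [conditions[i:i + group_size]
              for i in range(0, len(conditions), group_size)]
    mixed = sum(1 for g in groups if any(g) and not all(g))
    return mixed / len(groups)


# Hypothetical row of 1024 pixels where the condition flips every 48 pixels.
row = [(x // 48) % 2 == 0 for x in range(1024)]

for size in (16, 32, 64):
    print(size, divergent_fraction(row, size))
# 16 0.0
# 32 0.34375
# 64 1.0
```

With the same pixels and the same condition, the 16-wide groups never diverge, about a third of the 32-wide groups do, and every 64-wide group does. That is the kind of platform difference described above, just in a very simplified form.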
Given how GPU-specific branching is, I can see why the Unreal Engine engineers would not readily add it as a node and would instead leave it as an option inside custom expressions only: it could lead to unpredictable performance issues in materials, extremely different results on different platforms, and hard-to-debug cases. That’s probably why the If node doesn’t use branching.
After all, predictability/consistency is a very important factor when creating games, especially when planning ahead with different performance budgets for different platforms.
That said, anyone who wants to benchmark or test whether branching even works (as in a thread like this one) would have to run and compare the tests not only on their own platform but on every platform involved, and preferably with real test cases. The results may not replicate at all on another GPU, and even on the same GPU an artificial test case may behave very differently from a real one.