I don’t even know why I waste my time anymore. but fine, here are some more in-depth comparisons
the basic setup: an empty scene with a sphere (movable), a directional light (movable) and a skylight (stationary). the camera is fixed in all cases.
here we have a simple material, as basic as it gets. this is the best case scenario in terms of performance (~8ms)
next up is a 2k texture from ShooterGame, sampled in a loop of 512 iterations with the UVs slightly offset at each iteration. in these tests this is the theoretical worst case scenario in terms of performance (~20ms)
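for reference, here’s roughly what that stress loop amounts to in HLSL. this is my own sketch, not the actual node setup; names like HeavyTex, HeavySampler and SampleHeavy are placeholders. SampleLevel with an explicit mip is used so the same function stays valid later when it ends up inside dynamic flow control:

```hlsl
// Sketch of the stress loop: sample the same 2k texture 512 times, nudging
// the UVs a little each iteration so the compiler can't collapse the loop.
Texture2D    HeavyTex;      // placeholder for the ShooterGame 2k texture
SamplerState HeavySampler;

static const int NUM_ITERATIONS = 512;

float3 SampleHeavy(float2 uv)
{
    float3 acc = 0;
    for (int i = 0; i < NUM_ITERATIONS; ++i)
    {
        // slight per-iteration UV offset
        float2 offsetUV = uv + 0.001 * float2(i, i);
        acc += HeavyTex.SampleLevel(HeavySampler, offsetUV, 0).rgb;
    }
    return acc / NUM_ITERATIONS;
}
```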
now I start with your alleged version of branching.
I have the 512-iteration texture sampling loop hooked up as an input to the branch, and the evaluated condition is a gradient with a bias factor.
despite all pixels visually showing ThroughA, the performance is worse than the theoretical worst case scenario: not only is it processing the 512 texture sampler iterations per pixel, but the branch itself is also adding to the cost (~21ms)
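to make it explicit, this is approximately the pattern that setup boils down to (again a sketch with my own naming, reusing SampleHeavy from above): the heavy result is wired in as an input to the select, so it has to be computed for every pixel regardless of which side the condition picks.

```hlsl
// "alleged" branching: the heavy work is computed outside the branch and
// only *selected* by the condition, so every pixel pays for the 512 samples,
// and the comparison itself adds a bit on top.
float3 FakeBranch(float2 uv, float gradient, float bias)
{
    float3 throughA = float3(0.5, 0.5, 0.5);   // cheap flat color (placeholder)
    float3 throughB = SampleHeavy(uv);         // 512-sample loop, always runs

    return (gradient + bias > 0.5) ? throughA : throughB;
}
```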
still with your version of branching, I bias the condition gradient so that only a small area at the top visually shows the 512 texture samplers. performance is still just as bad (~21ms)
still with your version of branching, I bias the condition gradient so that half the sphere visually shows the 512 texture samplers. performance is still as bad (~20ms)
still with your version of branching, I bias the condition gradient so that the 512 texture samplers visually show everywhere except a very small area at the bottom (though this one isn’t even visible in the main viewport). performance is still as bad and, minus small fluctuations, exactly as bad as the first case (~21ms)
now let’s move to real branching.
I moved the 512 texture sampler iterations so they’re now nested inside the branch; everything else is exactly the same
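as a sketch (same placeholder naming as before), the only structural change is that the loop now lives inside the branch:

```hlsl
// real branching: the 512-sample loop is nested inside the branch, so pixels
// that take the cheap side never execute it. [branch] asks the compiler to
// keep this as actual flow control instead of flattening it.
float3 RealBranch(float2 uv, float gradient, float bias)
{
    float3 result;

    [branch]
    if (gradient + bias > 0.5)
    {
        result = float3(0.5, 0.5, 0.5);   // cheap flat color (placeholder)
    }
    else
    {
        result = SampleHeavy(uv);         // only runs for pixels that need it
    }

    return result;
}
```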
with real branching, when the 512 sampler iterations are skipped, performance is as good as the best case scenario, i.e. we can be sure the 512 texture samplers are actually being skipped for all pixels (~8ms)
still with real branching, biasing the condition to show a little bit of the complex part starts making things slightly slower (~8ms)
still with real branching, biasing the condition to roughly halfway (but with significantly more pixels on the textured side) shows that performance correlates with the number of pixels that output the 512 texture samplers (~18ms)
still with real branching, biasing the condition all the way, so that the 512 texture samplers show everywhere except a small area at the bottom (not even visible in the main viewport), matches the worst case scenario and once again shows that performance correlates with the number of pixels that output the 512 texture samplers (~21ms)
and that is what I meant by a real scenario with proper testing methodology
what you think you know about how GPUs work and how you think branches should work is irrelevant. it’s been proven above that doing the complex operations outside a dynamic branch and then wiring a dynamic condition to select between them is exactly as useful for performance as adding a lerp: both sides still get evaluated for every pixel.
PS. the results you’ve been getting are due to using constants for the evaluated condition. that does seem to have some validity, but only in one very specific scenario: the entire material processes the same branch because the condition affects all pixels equally, in which case it seems to behave like a static branch.
the moment you use a condition that’s actually dynamic, as you’d expect from dynamic shader branching (e.g. driven by a mask, the vertex normals, etc.), everything gets evaluated and the branching effect is completely lost
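roughly, that distinction looks like this in HLSL (again my own sketch and naming, reusing SampleHeavy from above; ConstantSwitch stands in for a scalar parameter/constant, MaskTex for whatever mask or normal-derived value actually drives the dynamic condition):

```hlsl
cbuffer MaterialParams
{
    float ConstantSwitch;   // scalar parameter: identical for every pixel
};

Texture2D    MaskTex;       // placeholder per-pixel mask
SamplerState MaskSampler;

// condition driven by a constant: every pixel evaluates the same value,
// which is the narrow case where the whole thing can end up behaving
// like a static branch.
float3 ConstantCondition(float2 uv)
{
    float3 heavy = SampleHeavy(uv);
    return (ConstantSwitch > 0.5) ? heavy : float3(0.5, 0.5, 0.5);
}

// condition that is actually dynamic (a mask here; vertex normals would be
// the same idea): it varies per pixel, and with the heavy work wired in
// from outside the branch, every pixel pays for it regardless of the mask.
float3 DynamicCondition(float2 uv)
{
    float3 heavy = SampleHeavy(uv);
    float  mask  = MaskTex.SampleLevel(MaskSampler, uv, 0).r;
    return (mask > 0.5) ? heavy : float3(0.5, 0.5, 0.5);
}
```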