How does Branch Prediction work with if nodes in Materials?

Until recently I’ve been a pretty big fan of the ‘if’ node, but have recently discovered / been enlightened by something called ‘branch prediction’ that might potentially make the node a lot more expensive than I thought it was.

For those who don’t know, Branch Prediction (at its most basic) means that the CPU (and possibly the GPU too) will try to predict the outcome of an if-else statement before it is actually known. It is an almost necessary part of modern-day computing, since without it you’d be forever waiting on tasks to complete and wasting huge chunks of your available resource time.

However I have been advised that using If nodes in shaders can potentially be a bad thing, since there’s often not really any way a processing unit can ‘predict’ the outcome of an IF node in shader languages, and you end up doing much more computational work. I’m just interested to know if this is actually the case and whether Shaders do any form of Branch Prediction at all? If so, how much performance am I realistically likely to gain from refactoring shaders to use different networks as opposed to If nodes? There are times where that isn’t possible of course and I just have to suck it up… but it’s an interesting topic nonetheless!

Yeah, I’ve been reading up on it too. I think branching can make shaders slower because the GPU runs threads in lockstep groups, so when threads in a group take different sides of a branch, the whole group ends up waiting on both sides. There are a few tricks you can do to get the same results. They involve doing some mults and adds/subtracts to produce a 1.0 or 0.0 result. I haven’t tried these, but it would be interesting to do some benchmarks and wrap them up into material functions. Could even push them into master? Here’s a page outlining them.

http://theorangeduck.com/page/avoiding-shader-conditionals
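
As a rough illustration of the mult/add idea, here is a minimal sketch in C++ standing in for shader code. The helper names (`when_gt`, `when_le`, `select_gt`) are made up for this example and are not taken from the engine or guaranteed to match the linked page. A comparison is turned into a 1.0 or 0.0 mask, and the masks weight the two candidate values so that no `if` is needed.

```cpp
namespace sketch {

// 1.0f when x > y, otherwise 0.0f. A bool converts to exactly 0 or 1.
inline float when_gt(float x, float y) { return static_cast<float>(x > y); }

// 1.0f when x <= y; the complement of when_gt.
inline float when_le(float x, float y) { return 1.0f - when_gt(x, y); }

// Arithmetic form of: (a > b) ? if_true : if_false
// Both candidate values are always evaluated; the 0/1 masks pick one.
inline float select_gt(float a, float b, float if_true, float if_false) {
    return if_true * when_gt(a, b) + if_false * when_le(a, b);
}

}  // namespace sketch
```

For example, `sketch::select_gt(2.0f, 1.0f, 10.0f, 20.0f)` gives `10.0f`, and swapping the first two arguments gives `20.0f`. One caveat with this style: if the unselected value is infinite or NaN, multiplying it by 0.0 still yields NaN, so the result is only clean when both inputs are finite.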

I had a look at the generated HLSL code but couldn’t find where the if conditional was.

That’s quite interesting… I actually very rarely tend to use them for mathematical statements, but they can be really useful for swapping out textures and stuff. I like to use them to select different channels of textures for example via Parameters. Since it’s a Parameter though, I guess the material thinks that it could change at any given moment and therefore has to be predicted…

I can’t imagine the instruction counts or even Shader Complexity really reveal any problems here either. You would likely have to venture much deeper into profiling to find any performance problems.

You could probably use these things to still conditionally swap textures. You would have to sample both textures though, so that would be an increased cost in terms of texture reads. These fake conditionals would then effectively be lerps based on some input parameter (which could be runtime controlled). But yeah, profiling is probably needed. Hopefully though all cores run at the same speed and it becomes what is known as ‘embarrassingly parallel’. I’m gonna try and use these in some fluid simulation code of mine.
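
To make the "fake conditional as a lerp" idea concrete, here is a small C++ sketch standing in for shader code. `Color`, `lerp_color` and `pick_texture` are hypothetical names for this example only, and the two `Color` arguments stand in for texture samples that have already been read. Both samples are always paid for, and a 0.0 or 1.0 parameter chooses between them.

```cpp
namespace sketch {

struct Color { float r, g, b; };

// Linear interpolation written as a weighted sum, so t == 0 returns a
// and t == 1 returns b exactly (for finite inputs).
inline float lerp_scalar(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}

inline Color lerp_color(const Color& a, const Color& b, float t) {
    return { lerp_scalar(a.r, b.r, t),
             lerp_scalar(a.g, b.g, t),
             lerp_scalar(a.b, b.b, t) };
}

// use_b plays the role of a runtime material parameter: 0 picks sample_a,
// 1 picks sample_b, and anything in between blends the two.
inline Color pick_texture(const Color& sample_a, const Color& sample_b, float use_b) {
    return lerp_color(sample_a, sample_b, use_b);
}

}  // namespace sketch
```

The trade-off is the one already mentioned: there is no branch, but both texture reads happen every time, so this only wins if the extra read is cheaper than whatever the branch would have cost.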

some more reading…

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html

Modern GPUs are quite good with branches. The problem is divergent branches. Branches also increase register pressure, which might lower the GPU’s capacity to hide latency. If you can replace the branch with a static switch, then do it. If not, then try to make the branch skip a lot of math or at least one texture sample. If that is not the case, then just lerp all the things.
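
A toy cost model, sketched in C++, may help show why divergence is the problem rather than branching itself. This is an assumption-laden simplification and not how any particular GPU accounts for cost: it treats a group of lanes as running in lockstep, charges each side of the branch once if at least one lane needs it, and ignores register pressure and the overhead of the branch instruction. The names `group_cost` and `flattened_cost` are invented for this example.

```cpp
#include <vector>

namespace sketch {

// Cost for one lockstep group. takes_if[i] is true when lane i takes
// the 'if' side; cost_if and cost_else are arbitrary work units.
inline int group_cost(const std::vector<bool>& takes_if, int cost_if, int cost_else) {
    bool any_if = false;
    bool any_else = false;
    for (bool t : takes_if) {
        any_if = any_if || t;
        any_else = any_else || !t;
    }
    // Each side is paid for once if any lane in the group needs it.
    return (any_if ? cost_if : 0) + (any_else ? cost_else : 0);
}

// The "lerp all the things" alternative always evaluates both sides.
inline int flattened_cost(int cost_if, int cost_else) {
    return cost_if + cost_else;
}

}  // namespace sketch
```

With an expensive side costing 100 units and a cheap side costing 5, a group where every lane takes the cheap side costs 5 in this model, while a group with even one lane on the expensive side costs 105, the same as the flattened version. That is the sense in which a coherent branch that skips a lot of math pays off and a divergent one does not.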