Dynamic flow control in materials

Deathrey · April 11, 2017, 4:28pm

Cool idea about implementing stop branch markers. I had something similar implemented too, but in the form of comment-like boxes. Eventually I ditched it in favor of coding the thing directly. I failed to come up with a good unified solution to handle cases where branches are interlinked somewhere in between. Maybe you will have better luck at this. Great job in any case.

Just wanted to add that I ended up using texture arrays for the core layers of the landscape shader and reserved flow control for the triplanar layers. Frame-to-frame render time consistency ended up being a major factor.

Chosker · May 9, 2017, 6:33am

I’m surprised there isn’t more interest in this lately
still curious if Lucyberad’s effort couldn’t be taken by Epic as a base for the feature

DamirH · May 9, 2017, 7:31am

The thing is, as Lucyberad found out himself, it’s extremely difficult to pull everything connected to a branch into the HLSL properly. Would probably require major reworking of the material compiler. This is kind of a feature that people who know about it and need it are versed enough to just slap together a custom node. I’d much rather see some work put into the custom node editor, but that’s just me.

Chosker · May 9, 2017, 11:46am

the problem with the custom node is that it makes the materials much harder to maintain and debug. I made a landscape material with the custom node myself to try the branching. having one big monolithic block with lots of code (basically duplicated per layer, with no access to functions) was a big pain.
as the branching (currently) requires everything to be put inside the [branch] if (otherwise it gets executed for both cases of the branch), no amount of custom node improvements would fix this because as Lucyberad stated this goes against the way the material compiler produces shader code

Deathrey · May 9, 2017, 5:02pm

I’d rather agree with this, but I’d expand into something larger than better custom node.
Shader editor is an amazing tool and a signature feature of UE4, but at the same time it is its weakness.
Dynamic branching is just one of several things, that suffer from limitations of material editor.

Tessellation, for example. I spent some time a while ago optimizing it, and the first thing I came across was that whatever is plugged into displacement is always evaluated in the domain shader, while in practice, if you are doing some math there (blending landscape textures, for example), you can move a good deal of it down into the control point or even patch constant phase.

Went through that myself too, but you can add functions to the material template as a custom include file. It definitely is harder to maintain and iterate on. Not really harder to debug at all. Not sure what forced you to duplicate the code per layer though.
In my first test of the same I had to split stuff into two custom nodes, because Normals+AO+BaseColor+Roughness did not fit into one node.

Allowing you to output a matrix from a custom node would pretty much deal with the requirement to duplicate any part of the code, as you could basically fit the whole terrain pixel shader into one custom node.

IronicParadox · August 1, 2017, 1:13pm

Someone linked me to this thread and it still boggles my mind that this topic hasn’t received more attention than this… I recently found out that the IF node doesn’t branch dynamically, as one would think, and that when lerp nodes are at 0/1, they don’t dynamically branch either. Needless to say, I wasn’t too happy about it all because I had been designing around that false assumption. I spent some time playing around with custom nodes and managed to get dynamic branching working, as others have also managed to do here.

Here is a basic custom IF node that will dynamically branch. You’ll need four inputs: A, B, ThroughA and ThroughB. You can edit the code for different names. I ran a bunch of tests comparing it with the regular IF node and it works. Behind one branch, I simulated a complex material by performing a bunch of lerping (30 or so random lerps) between a handful of colors, and from there, I ran the final output through 9000+ instructions (just a bunch of world->local->world transforms in series). Behind the other branch was just a single color. The experiment yielded proof of the branching working correctly.

One caveat that I’ve found is that if you have any sort of animating going on behind the “deactivated branch,” like a panner or timer, it will keep the textures “hot” in the cycles and they will still contribute to frame time. Keep in mind that I didn’t test this out extensively, but it seemed to be the case. Due to how parallelized GPU instructions are, it would kind of make sense that it would happen. The shader engine would probably have to be branched on to make things like that shut off, while the branch is deactivated. Though I did only put one of these custom functions in and it was right before the material attributes. It’s a pretty cheap little function, so it probably wouldn’t hurt too badly to throw more of them into the mix; like before things like panners if needed.


// Custom node body. Inputs: A, B, ThroughA, ThroughB.
[branch] if (A >= B)
{
    return ThroughA;
}
else
{
    return ThroughB;
}

This feature NEEDS to be in the engine automatically. If not, at least give us two new nodes like “IF - Dynamic” and “Lerp - Dynamic” or something along those lines, so that we don’t have to deal with the hassle of custom nodes and debugging. On their tooltip, you could say something like “Only use this node if it will save you 50 or more instructions.” For small tasks, yeah, branching isn’t ideal and might even hurt performance a little, due to its base instruction count. However, for really complex materials such as highly layered landscape materials or materials with heavy effects that you might want to fade to a simpler function in the distance, dynamic branching might double your performance.

Deathrey · August 1, 2017, 2:04pm

You might want to reconsider your testing methodology, and approach it properly in the future, preferably coupled with avoiding posting misleading results.

The following custom node:

Will not branch.

cyaoeu · August 1, 2017, 3:34pm

In that thread I asked you whether a single color branch should take 60ms to render, and I was kind of assuming you would notice something was up, but I guess not. My point was that a single color branch would be way faster if it was actually dynamic branching as intended. An easier way to test this is to first test your custom node (in your case 90ms for the heavy material and 60ms for the single color), then replace the material with another material that just has a single color (representing the simple branch). If that ends up being 60ms too, sure, it works, but I really doubt that. It should be way faster. I don’t know your machine specs but if it’s that slow, why test using those settings anyway?

Anyway, on topic, I would really like dynamic branching too, I use layer blending and also lerp “switches” instead of static switches (which honestly is a bad idea if you care about performance). It would be cool if these could be faster. Especially since the layer blending UI is getting an update later according to the roadmap.

IronicParadox · August 1, 2017, 7:19pm

I changed the material to only use colors, instead of mixing in textures (was taking an hour to compile a 9k instruction shader with four 8k textures in it). Like I said, my laptop’s video card is a toaster. With a blank scene, epic settings and 1080p, I get like 10fps… It’s due to the card not having enough shader cores and them being slow. They can’t handle all the high level post-processing and antialiasing; at that high of a resolution.

Anyways, here’s a video showing detailed proof of it working. I show the custom node version of the material and I show the IF node version, of the same material. The custom node causes a consistent spike, while the complex branch is executed, and then dips back down for the cheap branch. The IF node remains the same because it flattens and executes both branches; regardless of the conditional.

Chosker · August 1, 2017, 7:50pm

as stated (but you just chose to ignore) your code isn’t really properly branching.
some of your code is probably being nested into the branch (which would explain the performance difference). however check your generated HLSL code, you’ll see something like this:


Local100 = someStuff1;   // evaluated for every pixel, before the branch
Local101 = someStuff2;   // evaluated for every pixel, before the branch
[branch] if (A >= B)
{
    Local102 = Local100;
}
else
{
    Local102 = Local101;
}
finalcolor = Local102;

once you see this you’ll understand that your node isn’t really nesting things inside your branch, as it’s still declaring and processing everything outside of the branch and then simply branching the final decision of what to use. and this is exactly what is written in the original post
Plenty of things have been discussed in this thread, I’d ask you to take some time to read through it to properly understand how this whole thing is behaving in UE4. otherwise we end up with a very redundant discussion and/or you will keep working under false assumptions

also try using a more real-like scenario than simply using colors and math. you’ll know your branch works when you have a texture hooked and the material fails to compile complaining that it cannot have divergent gradient operations inside flow control (which was also mentioned in this thread). you’ll be entering the topic of DDX/DDY and block pixel processing on shader units, which should hint you at the tradeoff of parallelism vs flow control (which means branching isn’t necessarily better in all cases). or in other words, that things aren’t as simple as you think and things have been discussed for a reason
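
The gradient restriction mentioned above can be sketched roughly as follows. This is a minimal illustration in plain HLSL, not code from the thread or from the engine: the function `BranchedLayer` and all of its parameter names are hypothetical. It assumes that the texture lookup is the expensive part, that the derivatives are taken before the branch, and that the lookup inside the branch uses explicit gradients.

```hlsl
// Hypothetical helper for illustration; names are not from the thread or UE4.
float3 BranchedLayer(Texture2D Tex, SamplerState TexSampler, float2 UV,
                     float Mask, float Threshold, float3 CheapColor)
{
    // Take the screen-space derivatives outside the branch, where every
    // pixel of a 2x2 quad still executes the same instructions.
    float2 dUVdx = ddx(UV);
    float2 dUVdy = ddy(UV);

    float3 Result = CheapColor;

    [branch]
    if (Mask >= Threshold)
    {
        // SampleGrad receives the gradients explicitly, so no implicit
        // derivative has to be computed inside divergent flow control.
        Result = Tex.SampleGrad(TexSampler, UV, dUVdx, dUVdy).rgb;
    }

    return Result;
}
```

The point of the sketch is the placement: the texture lookup sits textually inside the branch, while only the cheap derivative math stays outside of it.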

IronicParadox · August 1, 2017, 9:20pm

So now we are breaking down semantic definitions of things? Lol, alright then… Simply put, it’s working well enough to the point that it’s making a consistent rendering difference in frame time. I have tested it with real materials and it works with them as well. I tried it out with a distance falloff function, where it transitions to a cheaper dithered blending material and it not only works, but also impacts my framerate in a positive manner. As I said earlier, certain things, behind a branch, seem to keep them hot and running. Things like panners and time nodes. I haven’t tested them out fully, but if worst comes to worst, I’d just write it out in code.

What you guys are talking about are more along the lines of a hard switch, that can be changed in run time; unlike the switch node that’s currently in the engine and can’t be changed in runtime(at least not in bps?). You’d need to have all of the other combinations precomputed AND currently loaded, in order for it to switch between the different “paths” of the material. Some pixels on the screen might need path A and some might need path B, so both would have to be loaded. That means EVERY shader core needs to have that code ready to go. So unless every permutation is completely precomputed, you’d be looking at recompiling shaders live. Not only that, but your ram usage, per shader, would go up exponentially. Let’s say you had two main branches, now you’ve likely doubled your ram usage. Let’s say on those two paths, each of them has two of their own paths, you’d now need ram for EACH potential combination of paths and in that case, it would start going up exponentially.

GPUs are sort of “dumb” and can’t/don’t handle a lot of common tasks the same as what you’d expect from a CPU. For the most part, they blindly execute their tasks at hand. Therefore, they usually need their entire “plan” spelled out or all of their potential “plans” immediately available, which would mean having them loaded ahead of time, and ready for them to execute. Do some actual computer science style research on how GPUs work and you’ll get a better grasp of what their limitations are. A good example: have you ever noticed the limitations of cascade GPU particles? Sure, they can pump out MILLIONS of particles, no problem, but you lose the ability to control a lot of things that happen with them and/or influence them.

Even when a shader is branching, I’d still expect for it to show both branches being loaded. Why? Because it’s still a potential path that the shader can go down. The cores don’t really get the chance to ask a question backwards and be like “HELP! WHAT DO I DO GUYS?” Now the instructions behind the loaded branches, well that’s for the core to decide whether or not to execute; based on which branch it chooses and whether or not it’s set to flatten or branch. The IF node definitely flattens (executes both branches and then decides), but using a custom branch, it appears to be branching and not flattening; at runtime.

Chosker · August 1, 2017, 9:56pm

no, we’re still talking about a branch per pixel. no one is discussing semantics, and no one ever mentioned something like a dynamic switch node. your explanation of what you think we’re discussing is completely off of what’s actually been discussed here

you might be getting some gains out of somewhere but as I don’t know exactly what your [real] material is like, I can’t know where your perceived improvement comes from. we don’t even know what would be the base best case scenario to compare with (as cyaoeu suggests, but you also ignored)
but again, check your generated HLSL code. I don’t know why you resist so much against it, it’s one button to toggle HLSL, copy-paste into notepad, Ctrl+F for [branch] and you’ll have your answer

IronicParadox · August 2, 2017, 1:40pm

You mean like this? Yeah, I’m showing it in the editor because you could just say that I put it into the wordpad copy lol…

Test material to make it easier to find:

Chosker · August 2, 2017, 2:12pm

yes I mean like that, but with a proper setup (at least a texture inside each branch), and scrolling to the part in HLSL where the CustomExpression0 function is used in your case

IronicParadox · August 2, 2017, 2:58pm

I plugged in two textures and it gives me:


MaterialFloat4 Local0 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_0,Material.Texture2D_0Sampler,Parameters.TexCoords[0].xy));
MaterialFloat4 Local1 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1,Material.Texture2D_1Sampler,Parameters.TexCoords[0].xy));
MaterialFloat3 Local2 = CustomExpression0(Parameters,0.00000000,0.50000000,Local0.rgb,Local1.rgb);
MaterialFloat3 Local3 = (Local2 + Material.VectorExpressions[1].rgb);

Which should be expected, even with dynamic branching. This is because at any moment, the [branch] if (something > something else) could become true and therefore it will need to evaluate the other input.

The key is in whether it compiles the IF with the [branch] or the [flatten] method. I’m assuming that flatten is the default when you don’t specify an attribute before the IF. As I showed earlier, it shows the [branch] in the custom code node, so unless you can decompile the shader to see for sure, I’m going to assume that it acknowledged the attribute.
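
For reference, HLSL describes the two attributes like this: `[branch]` evaluates only the side of the `if` selected by the condition, `[flatten]` evaluates both sides and then picks one result, and with no attribute the choice is left to the compiler. A minimal sketch in plain HLSL, where `SelectBranch`, `SelectFlatten` and the parameter names are hypothetical and only mirror the inputs of the custom node above:

```hlsl
// Hypothetical helpers for illustration only.

// [branch]: only the taken side of the if is executed.
float3 SelectBranch(float A, float B, float3 ThroughA, float3 ThroughB)
{
    float3 Result;
    [branch]
    if (A >= B)
    {
        Result = ThroughA;
    }
    else
    {
        Result = ThroughB;
    }
    return Result;
}

// [flatten]: both sides are executed, then one result is selected.
float3 SelectFlatten(float A, float B, float3 ThroughA, float3 ThroughB)
{
    float3 Result;
    [flatten]
    if (A >= B)
    {
        Result = ThroughA;
    }
    else
    {
        Result = ThroughB;
    }
    return Result;
}
```

As written, both helpers receive ThroughA and ThroughB as already computed values, so the attribute only governs the final selection. It says nothing about the work that produced the two inputs.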

Also, I remade the node graph from the original post, and this is the HLSL code that I get when I use my branch node instead of the IF node shown there:


    MaterialFloat Local0 = min(max(Parameters.TangentToWorld[2].b,0.00000000),1.00000000);
    MaterialFloat3 Local1 = (GetWorldPosition(Parameters) / 512.00000000);
    MaterialFloat2 Local2 = DDY(Local1.rg);
    MaterialFloat2 Local3 = DDX(Local1.rg);
    MaterialFloat4 Local4 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local1.rg,Local3,Local2));
    MaterialFloat3 Local5 = (GetWorldPosition(Parameters) / 2048.00000000);
    MaterialFloat2 Local6 = DDY(Local5.rg);
    MaterialFloat2 Local7 = DDX(Local5.rg);
    MaterialFloat4 Local8 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local5.rg,Local7,Local6));
    MaterialFloat3 Local9 = CustomExpression0(Parameters,Local0,0.50000000,Local4.rgb,Local8.rgb);
    MaterialFloat3 Local10 = (Local9 + Material.VectorExpressions[1].rgb);

Here is the replica of his node graph:

Chosker · August 2, 2017, 3:08pm

no and no. it’s not expected, even with dynamic branching. the expected behavior is that if the branch becomes true, it will process the stuff and evaluate it. as such, the texture samplers only need to exist nested within the branches
the code right there is crystal clear. it’s declaring and using your textures and sampling them per pixel, and then making the evaluation. unreal will not execute any further magic beyond what the HLSL code reports.
it’s all explained thoroughly in this thread and I’ve continued to explain it to you, but as it’s clear you don’t want to believe anything except what you tell yourself I’ll just stop bothering

IronicParadox · August 2, 2017, 3:33pm

Again, I think you should probably do some research on how GPUs work and I think you’re probably misunderstanding the difference between a high level language and how it gets compiled into a low level language. The difference lies within the compiler and how it interprets attributes like [branch]… You won’t know unless you have access to the source code of that compiler.

Also, yet again, regardless of what state the branch is in, it will need to have BOTH branches ready to go at any moment. That means it will need all of the “nodes” and the textures, ready to go. The tradeoff lies in which set of instructions to execute; which is where the performance savings come in. Otherwise, you’d need to do what I was saying earlier and have the compiler make a separate shader for EACH potential combination of the material “tree” and load ALL of them into the ram, so that they are ready to be switched to; on demand. Which again, is exactly why the switch node is NOT changeable at runtime; under normal circumstances.

Here is it branching with textures… Exact same results… And yes, yet again, my laptop is a toaster and will have a high frame time, even with an empty scene, at 1080p epic settings.

Chosker · August 2, 2017, 7:07pm

I don’t even know why I waste my time anymore. but fine, here’s some more in-depth comparisons

the basic setup: an empty scene with a sphere (movable), a directional light (movable) and a skylight (stationary). the camera is fixed in all cases.

here we have a simple material, as basic as it gets. this is the best case scenario in terms of performance (~8ms)

next up is a 2k texture from shootergame, sampled in a loop of 512 iterations with the UVs slightly offset at each iteration. in these tests this is the theoretical worst case scenario in terms of performance (~20ms)

now I start with your alleged version of branching.
I have the 512 texture iterations loop hooked, and the evaluated condition is a gradient with a bias factor.

although all pixels are visually showing ThroughA, the performance is worse than the theoretical worst case scenario, because not only is it processing the 512 texture sampler iterations per pixel, but the branch itself also adds to the cost (~21ms)

still with your version of branching, I bias the condition gradient so that only a small area at the top is visually showing the 512 texture samplers. performance is still as bad (~21ms)

still with your version of branching, I bias the condition gradient so that half the sphere visually shows the 512 texture samplers. performance is still as bad (~20ms)

still with your version of branching, I bias the condition gradient so that the 512 texture samplers visually show everywhere except a very small area at the bottom (though this one isn’t even shown in the main viewport). performance is still as bad, and [minus small fluctuations] exactly as bad as the first case (~21ms)

now let’s move to real branching.
I moved the 512 texture sampler iterations to be nested inside the branch, but everything else is exactly the same
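
A rough sketch of what such a nested setup might look like in plain HLSL. The 512 iterations and the per-iteration UV offset come from the description above; everything else is an assumption made for illustration: the function name `HeavyOrCheap`, the parameter names, the 0.5 threshold, the offset step, the averaging, and the use of `SampleLevel` at mip 0 to keep implicit gradients out of the branch.

```hlsl
// Hypothetical sketch; not the actual custom node used in these tests.
float3 HeavyOrCheap(Texture2D Tex, SamplerState TexSampler, float2 UV,
                    float Condition, float3 CheapColor)
{
    float3 Result = CheapColor;

    [branch]
    if (Condition >= 0.5) // assumed threshold
    {
        float3 Sum = float3(0.0, 0.0, 0.0);

        [loop]
        for (int i = 0; i < 512; ++i)
        {
            // Offset the UVs slightly on every iteration so the lookups
            // differ from each other and cannot be merged into one.
            float2 OffsetUV = UV + float2(i, i) * 0.001; // assumed step
            // An explicit mip level avoids implicit derivatives inside
            // the branch and the loop.
            Sum += Tex.SampleLevel(TexSampler, OffsetUV, 0).rgb;
        }

        Result = Sum / 512.0;
    }

    return Result;
}
```

The whole loop sits inside the branch, so pixels that fail the condition never reach it, which is the behavior the timings below are measuring.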

with real branching, when the 512 sampler iterations are skipped the performance is as good as the best case scenario, i.e. we’re really sure the 512 texture samplers are skipped in all pixels (~8ms)

still with real branching, biasing the condition to show a little bit of the complex part starts making things slightly slow (~8ms)

still with real branching, biasing the condition to be halfway (but significantly more pixels of the textured area) shows the performance is correlative to the amount of pixels that output the 512 texture samplers (~18ms)

still with real branching, biasing the condition all the way to only a small area at the bottom (not even shown in the main viewport) matches the worst case scenario and once again shows that performance is correlative to the amount of pixels that output the 512 texture samplers (~21ms)

and that is what I meant with a real scenario with proper testing methodology
what you think you know about how GPUs work and how you think branches should work is irrelevant. it’s been proved that doing complex operations outside a dynamic branch and then putting a dynamic condition to evaluate them is just as useful for performance as adding a lerp.

PS. the results you’ve been getting are due to you using constants on the evaluated condition. it seems to have some validity but only under a very specific scenario (having the entire material processing the same branch from a condition that affects all pixels equally, in which case it seems to behave as a static branch)
the moment you put a condition that’s actually dynamic as you’d expect from dynamic shader branching (i.e. using a mask, the vertex normals, etc) everything gets evaluated and you end up with the branching effect completely lost

Deathrey · August 2, 2017, 7:08pm

Nopes. Does not work that way bro.

@Chosker Thanks for a valuable and comprehensive post. It will surely clear up misconceptions introduced into this thread recently.

Pretty good point. It seems to be commonly overlooked, that if you try to do the same without offsetting the UVs, it is decimated down to one lookup.

IronicParadox · August 3, 2017, 5:44am

Chosker: Good examples, but for someone who keeps preaching to read the HLSL code, you definitely didn’t in your flawed mockup of “my” version. Your two custom nodes are being executed in the wrong order. Custom nodes don’t follow a lot of the same rules that standard nodes do. They can interact with each other and be executed out of order. Which is why when you condensed the custom nodes, into one node, it functions as you’d expect; using pretty much identical logic.

If you check the HLSL code, the proper order would show CustomExpression0 as the IF branch and CustomExpression1 as the texture function. Your version of my branch places them in the wrong order (I tested it). It’s executing the texture function FIRST, THEN it’s executing the IF branch; that’s where the frame cost is coming from, because it’s doing the complex loop before it ever reaches the branch IF to see which branch to execute. This isn’t an issue with dynamic branching, this is an issue with how the editor/compiler orders the custom functions. It’s in the engine, it’s just fickle right now. Ideally, there would be an option to set the CustomExpression order, kind of like with material function inputs.

This ordering topic was even touched on in a live stream at one point (watch around 12:30):

And you’re wrong, I’ve used dynamic effects in them with things like camera/absolute/pixel and it still works just fine. The version I showed was for simplicity’s sake. I have it integrated into some game assets already and it’s working as intended. Obviously, the code isn’t as simple as the branch I was demonstrating, but it still gets the point across that the engine DOES have dynamic branching, if you use it correctly and tiptoe around its quirks. Personally, I have avoided using custom nodes behind other custom nodes, and just try to collapse them into one if needed (rarely). I try to stick to mostly regular nodes; that way the branching works without hassle.

The biggest point still rests that if you want it, it’s there and it works. Hopefully they expand on HLSL some and make it a little less finicky. You’ll just have to play around with it and make sure that the orders of the expressions are correct… If you’re trying to dynamically branch, try to avoid using any more custom nodes behind it; as they will likely give you trouble and execute out of order. If you absolutely have to, then condense the custom nodes into one; to avoid the issues that Chosker presented.

If the devs ever read this:
On the custom nodes, please give us the ability to manually override what order the custom expressions are evaluated. Right now, it seems to order them from “left to right” but it would be awesome if we could change an option and make it go from “right to left.”
Actually, scratch that. After thinking about it for a minute, that would pretty much require a rewrite of the entire shader compiling engine. I’ll just stick to using dynamic branching for saving instruction counts when they aren’t needed.

Also, even though it’s pretty quick and easy to make a custom dynamic branching IF or lerp node, it would be really nice if we had them as regular nodes like If Dynamic and Lerp Dynamic