Dynamic flow control in materials

Chosker · August 3, 2017, 7:52am

fine I wasn’t aware of the custom node execution order logic

in this new case I’ve made 256 texture samples, offsetting the UVs and averaging them just like in my previous test, except that I use 128 samples of texture A and 128 samples of texture B (both 2k textures from ShooterGame), and the UV offset is even bigger (so the color is just a blur). all of this is done purely with nodes.
also, this time I remembered to close every other editor window, since they take up significant performance (which explains why even the simple material measures differently across the 2 tests)

our good friend the simple material

256 sampled textures

and this is a portion of how the material looks

let’s go to your branch, just the way you’ve explained the usage

I hook the 256 samples into your branch

biasing the gradient so no pixels visually appear with the 256 samples. performance is bad. it’s not only not comparable to the simple material, it’s even worse than just sampling the 256 textures without any of your branching

biasing the gradient to show half the sphere with the 256 samples. things get worse

biasing the gradient to show the full sphere with the 256 samples. things get even worse.

this is the HLSL code:
the function (which matches yours perfectly)


// Uniform material expressions.
MaterialFloat3 CustomExpression0(FMaterialPixelParameters Parameters,MaterialFloat A,MaterialFloat B,MaterialFloat ThroughA,MaterialFloat3 ThroughB)
{
[branch] if ( A >= B)
{
   return ThroughA;
}
else
{
   return ThroughB;
}
}

and the actual material using the function:


[a crapton more sampler operations go above here]
    MaterialFloat4 Local1267 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1,Material.Texture2D_1Sampler,Local1266));
    MaterialFloat3 Local1268 = (Local1267.rgb / 126.50000000);
    MaterialFloat2 Local1269 = (Parameters.TexCoords[0].xy * 1.00000000);
    MaterialFloat2 Local1270 = (Local1269 + (MaterialFloat2(0.23450001,0.23450001) * 63.00000000));
    MaterialFloat4 Local1271 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1,Material.Texture2D_1Sampler,Local1270));
    MaterialFloat3 Local1272 = (Local1271.rgb / 126.50000000);
    MaterialFloat3 Local1273 = (Local1268 + Local1272);
    MaterialFloat3 Local1274 = (Local1264 + Local1273);
    MaterialFloat3 Local1275 = (Local1255 + Local1274);
    MaterialFloat3 Local1276 = (Local1236 + Local1275);
    MaterialFloat3 Local1277 = (Local1197 + Local1276);
    MaterialFloat3 Local1278 = (Local1118 + Local1277);
    MaterialFloat3 Local1279 = (Local959 + Local1278);
    MaterialFloat3 Local1280 = (Local640 + Local1279);
    MaterialFloat3 Local1281 = CustomExpression0(Parameters,Local1.g,0.50000000,0.00000000,Local1280);

    PixelMaterialInputs.EmissiveColor = Material.VectorExpressions[2].rgb;
    PixelMaterialInputs.Opacity = 1.00000000;
    PixelMaterialInputs.OpacityMask = 1.00000000;
    PixelMaterialInputs.BaseColor = Local1281;
    PixelMaterialInputs.Metallic = 0.00000000;
    PixelMaterialInputs.Specular = 0.50000000;
    PixelMaterialInputs.Roughness = 0.50000000;
    PixelMaterialInputs.AmbientOcclusion = 1.00000000;
    PixelMaterialInputs.Refraction = 0;
    PixelMaterialInputs.PixelDepthOffset = 0.00000000;

as you can see, all the complex operations are executed before the branch and then the branch evaluates and returns (exactly the way you use and show it yourself)

back into my version of the branch, I remade the same behavior all inside the custom node nested into the branch
performance isn’t as good as the branch-less version but as explained many times, the branch has a cost
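A minimal sketch of the nested variant, written as a standalone HLSL function rather than Chosker's actual node: the name `BlurInsideBranch` and its parameter names are hypothetical, while the 128-iteration loop and the 0.2345 offset are taken from the custom node code further down this post. All 256 fetches sit inside the `else` body, and the UV derivatives are computed with `ddx`/`ddy` before the branch and handed to `SampleGrad`, since implicit-gradient sampling inside divergent flow control may be rejected or forced out of the branch by the compiler.

```hlsl
// Hypothetical standalone version of the "nested" branch: the expensive
// sampling loop only executes for pixels that take the else-path.
float3 BlurInsideBranch(Texture2D TexA, SamplerState TexASampler,
                        Texture2D TexB, SamplerState TexBSampler,
                        float2 UV, float A, float B)
{
    // Gradients are taken before the branch, while every pixel in the
    // quad is still active, and reused for every fetch below.
    float2 dUVdx = ddx(UV);
    float2 dUVdy = ddy(UV);

    [branch]
    if (A >= B)
    {
        // Cheap path: no texture fetches at all.
        return float3(0, 0, 0);
    }
    else
    {
        const int MaxIt = 128;
        float3 result = float3(0, 0, 0);
        [loop]
        for (int i = 0; i < MaxIt; ++i)
        {
            float2 offsetUV = UV + float2(0.2345, 0.2345) * i;
            result += TexA.SampleGrad(TexASampler, offsetUV, dUVdx, dUVdy).rgb;
            result += TexB.SampleGrad(TexBSampler, offsetUV, dUVdx, dUVdy).rgb;
        }
        // Same scaling as the original loop: each fetch is divided by MaxIt.
        return result / MaxIt;
    }
}
```

Pasted into a Custom node, only the body would be used, with the textures declared as Texture Object inputs (UE adds the matching `...Sampler` parameter, as the generated `TexObj`/`TexObjSampler` pair shows). Even then, pixels only skip the loop when the whole GPU wave agrees on the `A >= B` result.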

and everything branched out:

first, it’s indeed relevant to avoid feeding the extra custom node into your version of the branch. with your branching, the performance now at least correlates with the number of pixels shown, so some sort of branching is indeed going on
however, your method is proving to be counter-productive beyond measure. “branching out” all the expensive parts is even more expensive than just processing them in a branch-less version, while “branching in” the expensive parts makes things even worse.

now,
for the sake of clearing things up a bit more I’ve re-created your version of the branch in my custom node

I do the big loop outside of the branch and just return the result based on the branch evaluation.
this in theory should be the equivalent of what you’re doing, with the difference that my loop is actually inside the CustomExpression function while your examples have everything outside the function except the branch itself (not sure how relevant it is, but that’s that)

all pixels “branched out”

all pixels “branched in”

and the generated HLSL code.
the function:


MaterialFloat3 CustomExpression0(FMaterialPixelParameters Parameters,Texture2D TexObj, SamplerState TexObjSampler ,MaterialFloat2 TexUVs,MaterialFloat2 TexDDX,MaterialFloat2 TexDDY,MaterialFloat A,MaterialFloat B,Texture2D TexObj2, SamplerState TexObj2Sampler )
{
int i;
int maxIt = 128;
float4 result = float4(0,0,0,0);

for (i = 0; i < maxIt; ++i)
{
   result += TexObj.SampleGrad(TexObjSampler,TexUVs + float2(0.2345,0.2345) * i,TexDDX,TexDDY) / maxIt;
   result += TexObj2.SampleGrad(TexObj2Sampler,TexUVs + float2(0.2345,0.2345) * i,TexDDX,TexDDY) / maxIt;
}

[branch] if ( A >= B)
{
    return float4(0,0,0,0);
}
else
{
   return result;
}
}

and the actual material using the function:


    // Now the rest of the inputs
    MaterialFloat2 Local0 = (Parameters.TexCoords[0].xy * 1.00000000);
    MaterialFloat2 Local1 = DDX(Local0);
    MaterialFloat2 Local2 = DDY(Local0);
    MaterialFloat2 Local3 = (Parameters.TexCoords[0].xy * 1.00000000);
    MaterialFloat2 Local4 = (Local3 + Material.ScalarExpressions[0].x);
    MaterialFloat3 Local5 = CustomExpression0(Parameters,Material.Texture2D_0,Material.Texture2D_0Sampler,Local0,Local1,Local2,Local4.g,0.50000000,Material.Texture2D_1,Material.Texture2D_1Sampler);

    PixelMaterialInputs.EmissiveColor = Material.VectorExpressions[2].rgb;
    PixelMaterialInputs.Opacity = 1.00000000;
    PixelMaterialInputs.OpacityMask = 1.00000000;
    PixelMaterialInputs.BaseColor = Local5;
    PixelMaterialInputs.Metallic = 0.00000000;
    PixelMaterialInputs.Specular = 0.50000000;
    PixelMaterialInputs.Roughness = 0.50000000;
    PixelMaterialInputs.AmbientOcclusion = 1.00000000;
    PixelMaterialInputs.Refraction = 0;
    PixelMaterialInputs.PixelDepthOffset = 0.00000000;

in this scenario it’s clear that putting the stuff outside of the branch has zero effect: the complex parts get processed regardless of the branch that follows. this is the scenario that was described before as “not really branching”
however, things aren’t much clearer, because this seems to be a different scenario from the other 2 above: not a positive impact (like the fully nested branching), and not a weird negative-positive-negative impact (like your branching in my results)

you still haven’t shown a working example with a working comparison. and I mean a proper material that will branch things differently per pixel (all you had was a scalar parameter, which affects all pixels the same).
show a simple material, show something complex with no branching involved, and then show it with your branching. without the 3 cases compared it’s impossible to tell any difference between the different usages. so far all I see is some difference and “my computer is a toaster but trust me, it’s better”

IronicParadox · August 5, 2017, 1:50pm

I found some workarounds and it will now branch things differently per pixel. The issue mostly lies in the “alpha” slot of the branching or lerping: if it had dynamic stuff in it, the branching wouldn’t work. However, I found a way around it and it works fine with samplers like AWP, VNWS, PXWS and so on. I remade the very first post into a working version and I remade your custom iterative node into a working version. It mostly boils down to order of operations.

Chosker · August 5, 2017, 3:46pm

so after watching your video…



if ( A > B )
{
return float(1);
}
else if ( A < B )
{
return float(0);
}
else
{
return float(1);
}

if, elseif, else? you ever heard of >= and <= ? do you really need to cast 1 to float? can’t even.
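For what it's worth, the three-way comparison quoted above returns 1 for A > B, 1 for A == B and 0 for A < B, so it collapses into a single `A >= B` test. A sketch with hypothetical function names; note that this only produces the 0/1 mask and does not skip any work by itself:

```hlsl
// step(edge, x) returns 1 when x >= edge and 0 otherwise,
// so step(B, A) is exactly the A >= B test.
float CompareGE(float A, float B)
{
    return step(B, A);
}

// Equivalent explicit form; 1.0 and 0.0 are already float literals,
// so no cast is needed.
float CompareGEExplicit(float A, float B)
{
    return (A >= B) ? 1.0 : 0.0;
}
```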

you’re still comparing performance with branch A or branch B (and mixed), but you still never compare a material that produces exactly the same visual output as, let’s say, branch A (but without any of the branching stuff) against your branched material showing branch A, i.e. the comparison of branched vs unbranched. therefore it’s impossible to know the cost of your branching (i.e. what I proved in my previous post)

“working model 1” is hardly a working model of anything real-life. you’re again dodging the issue of what branching tries to save up on: expensive operations. making a sum 300 times isn’t a working example of an expensive operation. this is relevant because as I’ve shown, texture sampling behaves differently.
I’ll need to test your working model 1, maybe passing just the order of the CustomExpressions works some magic but I cannot be certain just from looking at your video

“working model 2” obviously works perfectly because you copied my custom node where the expensive operations are nested inside the branch. all you did was add some extra unneeded stuff into the branch condition. you could remove your custom IF and just hook the clamp that comes before it, and it would work all the same.

in short, I just spent 14 minutes on a video where you show that your method (“working model 1”) is still not doing anything close to a real life scenario where you’d expect dynamic branching to be useful, and then show that my method works (“working model 2”).

IronicParadox · August 5, 2017, 5:43pm

Oh, I’m used to writing them that way because of engineering stuff lol… A lot of common tasks have a whole separate function for equals. As for the casting, I leave those there so I can copy/paste the node elsewhere, for other tasks that might be in f2, f3, f4, etc. Technically, no, I don’t have to put it there, but it does it behind the scenes anyway. I mean, these examples are clearly stated as being dirty technical examples…

There is NO reason to make two branches with the same material setup, because that runs COUNTER to the whole point of having branching like this. The point is that you have some effect and there is a cheap version, commonly used for when it’s off in the distance or non-convergent with the camera, and an expensive version, commonly used for the opposite of the cheap version. Hell, most of the time, for the cheap version of the effect, you’d just pipe through the texture with tinting and maybe a noise overlay. It wouldn’t matter if they were visually the same or not. It’s better to break it up into simple things like green for good/simple and red for bad/complex. You already know what the unbranched performance will be like… Both rails will have identical frame rates, regardless of how much of one or the other is rendered.

Working model 1’s 300/1000 loops are there because if they weren’t, the difference between the two rails would be somewhere around 0.05ms. Yes, that’s right, 50 microseconds. Why? It’s not really an expensive effect in the first place, and that’s with 8k textures (I upsampled some of the starter textures into 8k textures for testing purposes). I was simply replicating what was in the first post of this thread. The loops were there to give a VISUAL difference, one your eyes can discern, between the cheap and expensive versions of the material. They’re there to act as an indicator showing that it is indeed working.

On working model 2, I could have easily just made it do the EXACT same thing as the custom node in, working model 1, and it would still show that the branching works. I only put it in there because you were complaining and trying to claim that it wouldn’t ever work with dynamic inputs and that none of my other examples were “real world” examples.

And no, I actually corrected the second model because you guys said that it didn’t work with dynamic inputs like vertex/pixel samplers. Your “code” is simple and nothing special. You literally copy+paste’d the reference on how to sample texture objects, filled in the necessary parameters (there’s usually a whole list of potential parameters and the order to place them into the sampler) and modified the UVs with an offset on each loop through. Oh, and divided by a number… Man, let me see if I can coax my girlfriend into making you some cookies because you’ve really earned them! (I’m a terrible cook, so baking is mostly out of the question)

Anyways, you can plug your ears, close your eyes and lalalalalalala all you want, but both “styles” work well enough and that’s all I’m going to say. If you find a way to break it, there’s probably going to be a way to work around it. You just have to use your head, research, think outside of the box a little, experiment and keep at it until you find a solution. It’s a science, not an art.

Deathrey · August 5, 2017, 9:37pm

Dirty? Yes. Technical? No.

That is the first time I’ve come across the term “vertex sampler”. Why would you sample a vertex? It is just a vertex, leave it alone.

So far, you haven’t demonstrated a working branch, but you are eagerly showing a lack of competence by engaging in an argument without anything constructive backing you up. The hilarious thing is that if you, in particular, were able to at least check the ASM output of your code, this argument would not be taking place.

The source is out there, not hidden behind seven seals.

What is this phrase doing here? It does not sit well with the idea of maintaining a decent level of professionalism.

I truly hope so, because everything you’ve posted in this thread up to this point is a misleading pseudo-scientific rant with very limited value to the topic discussed.

IronicParadox · August 6, 2017, 3:55am

Yes, it’s technical in the sense that it’s trying to simulate a heavy and a light load.

If you want to know the information of a specific vertex, such as its XYZ position, you sample its information… Therefore you’re sampling a vertex and that would make it a vertex sampler…

Yes, I definitely have shown a type of dynamic branching but you can keep wallowing in your denial all you want. The beauty about facts is that they are indifferent to opinions. And yes, that couldn’t happen under normal circumstances but if you had other things directly editing the memory, that’s a whole different story. Like I said though, it’s more of an engineering habit because if you’re designing a machine, that can possibly be hazardous to human health, you need to “dot all of your Is and Ts,” or else you can be held liable for damages. I didn’t really want to dive into explaining it, so I kind of just skipped over it.

I’m sure it is, but I’ve never bothered to check because I don’t feel like following the rabbit down the rabbit hole of includes, through thousands of lines of code. Have you studied/read through it entirely and do you comprehend it 100%?

She’s actually a professional cook! Though using your logic, we had better grab our pitchforks and torches for the 90% of “professionals” that have appeared on live streams and joke around non-stop (Epic devs, please don’t stop with the jokes and humor on streams). Also, you have little room to talk about professionalism. You try to be as politely insulting as possible and it’s a cringe-worthy attempt at best.

There is technically no such thing as something being pseudo-scientific. That phrase, alone, is a pseudo-phrase. Either it’s scientific or it’s not scientific, just like binary. You either follow a scientific method or you don’t, and there is technically no grey area in between. Almost sounds like some form of an ironic paradox though… But you can think what you want, you’re just being arrogant and stubborn at this point.

Hopefully I’ll have some time this next week or so and I’ll make a better example. I’ll probably make a layered landscape material that has some branching in it and I’ll make a second copy without it all in there, to show the FPS differences. I have a couple of landscape materials ready to go and working, but I don’t want to share the materials for obvious reasons. I’ll try to remake something similar enough, but that doesn’t show any of my actual individual materials. This is a game development forum, so I’m sure people can understand not wanting to share actual source.

It will probably show something like calculating out three or four masks, instead of the one in the example, packing them into a V3/V4 and then feeding those masks to the various branches and channels of the material. The masker would have A1/B1, A2/B2, etc inputs to solve C1/C2/C3/etc outputs and then you’d just say return = float4(C1,C2,C3,C4). It doesn’t have to be too fancy of a custom node because you’ll just solve everything out with regular nodes leading into the custom node. Here’s a super quick example of what I’m talking about:
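The masker described above might look roughly like this. The A1/B1, A2/B2 inputs and C1..C4 outputs follow the post; the function name `PackMasks` is hypothetical, and using a vectorized `step` for the four `>=` comparisons is just one way to write it:

```hlsl
// Hypothetical masker: each Cn is 1 where An >= Bn, otherwise 0.
float4 PackMasks(float A1, float B1, float A2, float B2,
                 float A3, float B3, float A4, float B4)
{
    float4 A = float4(A1, A2, A3, A4);
    float4 B = float4(B1, B2, B3, B4);
    // Component-wise: step(B, A).x == (A1 >= B1 ? 1 : 0), and so on.
    return step(B, A); // = float4(C1, C2, C3, C4)
}
```

As Deathrey notes in the next reply, masks alone (or masks fed into a Lerp) do not skip anything; the work is only avoided if the expensive operations sit inside a `[branch]` body that reads these masks.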

EDIT: Oh, and I know that you can easily calc this stuff out with regular IF nodes, but like I said in the last video, the lerp nodes are really flaky about their alpha inputs, and that will directly determine whether or not the lerp will branch. I’m assuming it’s because the custom node acts as a “blocker” of some sort, so it can’t precompute the stuff behind it, or something along those lines.

Deathrey · August 6, 2017, 4:59am

Lerps do not branch. They are only subjected to constant folding of alpha input, which you successfully demonstrated in your examples.

You are making too many assumptions, for a sworn follower of scientific method. That leads you to misconceptions and wrong conclusions.

Nope. The thread is about dynamic flow control and its importance for UE4. You are showing constant folding only up to this point, which was working flawlessly even without your intervention. Quit derailing yet another thread.

Chosker · August 6, 2017, 9:14am

another broad assumption. you could have but you didn’t, thus failing to prove anything with “working model 2”

I don’t know where criticising my code comes from at this point, given that it was just a test to compare against your method, and a test that stresses the most common case where branching is desirable (sampling textures)
it’s not about my code being fancy or revolutionary. it’s about what the code is doing: having the operations nested inside the branch or having it outside of the branch

you’re missing the point once again and derailing the thread. so let me remind you:

You claim your method can do dynamic shader branching using regular material nodes, with just a Custom Node containing the branch evaluation
We claim the only method to do dynamic shader branching is by nesting all the to-be-branched operations inside the branch itself, which forces writing the operations as HLSL code inside the custom node, which means you can’t use regular material nodes

by using my Custom node and having the texture sampler loop nested inside the branch, and of course not even using regular nodes, you failed to prove your method because you used our method.

so far you haven’t shown a single case where your method is used properly and comparing the performance against not using the branching at all, which is paramount when measuring performance.

if you still intend to spend a bit of time on tests and showing “that it works”, I’d suggest trying your method on another common case of dynamic branching: triplanar mapping. as you know, it samples a texture 3 times (once per axis), so the cost of those 3 samples is there for all pixels permanently. with dynamic branching you’d only pay for a single sample (on areas where any given axis is 100% blended, which should be the majority), and just a few more costly areas where 2 or 3 textures will be sampled (the parts where the axes blend)
of course if you do, provide a comparison against triplanar mapping without any sort of branching, otherwise there’s no indicator of whether or not the branching is beneficial.
I’d do it myself but your method looks different every time you show it, so I’m sure I won’t get it the way you claim it to work
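A minimal sketch of the triplanar test suggested above, assuming a unit-length world normal and one texture for all three projections. The function and parameter names are hypothetical, and the `saturate(abs(N) - BlendZone)` weighting is one common way to get weights that are exactly zero away from the seams, which is what allows a branch to skip a fetch:

```hlsl
// Hypothetical triplanar sampler that skips projections with zero weight.
// BlendZone must stay below 1/sqrt(3) (about 0.577) so that a unit-length
// normal always leaves at least one positive weight.
float3 TriplanarBranched(Texture2D Tex, SamplerState TexSampler,
                         float3 WorldPos, float3 WorldNormal,
                         float TileScale, float BlendZone)
{
    // Only the part of |N| above BlendZone contributes; a larger BlendZone
    // means more pixels end up with a single non-zero weight.
    float3 w = saturate(abs(WorldNormal) - BlendZone);
    w /= dot(w, float3(1, 1, 1));

    // One planar projection per axis.
    float2 uvX = WorldPos.yz * TileScale;
    float2 uvY = WorldPos.xz * TileScale;
    float2 uvZ = WorldPos.xy * TileScale;

    // Gradients are taken unconditionally, before any branch.
    float2 dxX = ddx(uvX), dyX = ddy(uvX);
    float2 dxY = ddx(uvY), dyY = ddy(uvY);
    float2 dxZ = ddx(uvZ), dyZ = ddy(uvZ);

    float3 result = float3(0, 0, 0);
    [branch] if (w.x > 0)
    {
        result += w.x * Tex.SampleGrad(TexSampler, uvX, dxX, dyX).rgb;
    }
    [branch] if (w.y > 0)
    {
        result += w.y * Tex.SampleGrad(TexSampler, uvY, dxY, dyY).rgb;
    }
    [branch] if (w.z > 0)
    {
        result += w.z * Tex.SampleGrad(TexSampler, uvZ, dxZ, dyZ).rgb;
    }
    return result;
}
```

For the comparison asked for above, the unbranched baseline would be the same function with the three `[branch] if` wrappers removed, so all three fetches always run. Presumably any gain only appears where whole GPU waves agree on which weights are zero.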

IronicParadox · August 6, 2017, 1:24pm

Yeah, I found a roundabout way to make them work with branching… It’s sort of sketchy, but it definitely works because I have many cases of it working now. The custom nodes seem to act as “blockers” that keep it all from getting folded statically. It was clearly demonstrated in one of the videos and you’d know if you were paying attention lol. It’s also very evident in the video at the bottom.

Lots of experimenting, failing, retrying, succeeding, etc. has gone into all of this, rather than just sitting there idly and proclaiming it doesn’t work, like you’ve been doing. You have to learn not to be scared of failure, otherwise it will prevent you from trying to succeed.

And yeah, I’m showing that it can be done with MINIMAL hlsl. It’s not super hard to write out five or ten lines of code and it’s definitely worth it if it’s going to save you a ton of frame time. Would it be nice to be able to do it with 100% pure UE nodes? Absolutely… But there are workarounds for now, so quit complaining and use them.

Here is a clip of an upcoming series that I’m going to work on, when I’ve got the time, talking about this whole topic. It has two otherwise identical materials with a few changes. One has regular IFs for masking and the other has custom IFs. I’m using the radial blur MF for the “complex” effect, but I had to refactor it for the branched version because it doesn’t handle nested loops with variable loop counts. It will spit all kinds of errors about unrolling and such. I had to change it from Texture2DSample to Tex.SampleGrad and feed it derivatives. Not really a big deal, but it allows it to branch now. If anything, that would probably make it slightly less performant, but it’s negligible compared to the frame-rate benefit. Anyways, it has a few effects right now, but I’m adding in more and will eventually work it up to even handle blending landscape materials. Right now, it’s mostly just a range cutoff, dot product, expensive blur on the dot product and a regular version of the material. It’s a technical material.
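A rough sketch of the refactor described above: a branched radial blur with a per-pixel sample count, using `SampleGrad` with derivatives computed once, outside the branch and the loop. This is not the engine's radial blur material function; all names are hypothetical. Gradient instructions inside a loop with a varying trip count are likely what makes the HLSL compiler try to unroll the loop, which would explain the errors mentioned in the post:

```hlsl
// Hypothetical branched radial blur with a per-pixel sample count.
float3 RadialBlurBranched(Texture2D Tex, SamplerState TexSampler,
                          float2 UV, float2 Center, float BlurAmount,
                          int NumSamples, float Mask)
{
    // Derivatives once, outside both the branch and the loop.
    float2 dUVdx = ddx(UV);
    float2 dUVdy = ddy(UV);

    [branch]
    if (Mask < 0.5)
    {
        // Cheap path: a single fetch.
        return Tex.SampleGrad(TexSampler, UV, dUVdx, dUVdy).rgb;
    }
    else
    {
        int count = max(NumSamples, 2); // at least 2 taps, avoids divide by zero below
        float2 dir = UV - Center;
        float3 sum = float3(0, 0, 0);
        [loop]
        for (int i = 0; i < count; ++i)
        {
            // t runs from 0 to 1; taps move from UV towards Center.
            float t = (float)i / (count - 1);
            // Reusing the base UV derivatives for every tap is an
            // approximation (taps are slightly minified), fine for a blur.
            sum += Tex.SampleGrad(TexSampler, UV - dir * (BlurAmount * t), dUVdx, dUVdy).rgb;
        }
        return sum / count;
    }
}
```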

Chosker · January 19, 2018, 7:30am

bump.

whatever happened to this? UE-33876 is gone…

Speuzer · February 13, 2018, 5:06pm

Bump.
What happened to Unreal Engine Issues and Bug Tracker (UE-33876) ?

Deathrey · February 14, 2018, 6:27pm

Got removed from the issue tracker for being a feature request, not a bug.

Luos · March 3, 2019, 6:28pm

Bump for justice

PhosphorArt · September 9, 2019, 4:03pm

If there were just a simple way to have a thresholded lerp branch between Material Layers (a dither would be nice, but on/off would at least work), it would be amazing and useful for so many cases

Axton9 · December 11, 2019, 11:21pm

This is still sorely needed, and doesn’t ever seem to have been given any attention. It’s nearly impossible to recreate most modern AAA landscape rendering techniques without this functionality.

Dash-POWER · February 15, 2020, 9:05pm

I guess this is still not solved, even in UE 4.24.2. Dynamic material branching isn’t working for me with the (if) node, the (dynamic branch) node, or a custom node (if statement).

I want to create a landscape with POM and cull the POM after a few meters, because I don’t need POM on the entire landscape component and want to save some performance.
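One way to sketch the distance cull described above: a hypothetical function that returns the plain UV beyond `CullDistance` and otherwise runs a simple linear height-field search (steep parallax, without the refinement step a full POM would add). It assumes `ViewDirTS` is the tangent-space direction from the surface towards the camera with z along the normal, and that height 1 is the top of the surface; `PixelDistance` could come from something like the PixelDepth node. Whether it saves anything still depends on the search actually sitting inside the branch, which is the point argued throughout this thread:

```hlsl
// Hypothetical distance-culled parallax: the height-field search only runs
// for pixels closer than CullDistance.
float2 ParallaxUVCulled(Texture2D HeightTex, SamplerState HeightTexSampler,
                        float2 UV, float3 ViewDirTS, float PixelDistance,
                        float CullDistance, float HeightScale, int MaxSteps)
{
    // Gradients of the original UV, taken before any flow control.
    float2 dUVdx = ddx(UV);
    float2 dUVdy = ddy(UV);

    [branch]
    if (PixelDistance > CullDistance)
    {
        // Far away: no height samples at all, just the regular UV.
        return UV;
    }
    else
    {
        int steps = max(MaxSteps, 1);
        float stepSize = 1.0 / steps;
        // UV shift per step, moving along the view ray into the surface.
        float2 uvStep = -ViewDirTS.xy / max(ViewDirTS.z, 0.001) * HeightScale * stepSize;

        float rayHeight = 1.0;
        float2 currUV = UV;
        float surfHeight = HeightTex.SampleGrad(HeightTexSampler, currUV, dUVdx, dUVdy).r;

        // March down until the ray drops below the height field.
        [loop]
        for (int i = 0; i < steps && rayHeight > surfHeight; ++i)
        {
            rayHeight -= stepSize;
            currUV += uvStep;
            surfHeight = HeightTex.SampleGrad(HeightTexSampler, currUV, dUVdx, dUVdy).r;
        }
        return currUV;
    }
}
```

A hard cutoff like this will pop visibly at `CullDistance`; fading the parallax offset out over a short range before the cutoff is a common way to hide that.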

MostHost_LA · February 15, 2020, 10:05pm

Hey all, could we get a 2020 rant-less version of this?
@ @Chosker

It’s honestly hard to follow along without cringing every so often.

Plus, I’m sure that both of you have evolved your skills and come up with new solutions since it all started in 2016?

Calvinatorr · May 8, 2020, 7:18pm

MostHost LA, I came across this again and thought it’s only responsible to post what I believe is the solution here. It can be done because, when the code generated by the material editor gets compiled to asm, the compiler will put the code for each result into its own branch.

So to achieve this, you just use a custom node with the BRANCH HLSL macro, like so.

Inspecting the pixel in the API call in RenderDoc shows this asm, which to me is a pretty clear indicator that this actually works (surprisingly).
I believe this is also the same on PS4 when the code gets ported to PSSL. Not sure about other platforms.

Please let me know if you think this is somehow wrong, but to me it seems as clear as day that this should work.
Would love to see this integrated with BlendMaterialAttributes to branch whole layers.
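A minimal sketch of the approach in the post above. It assumes, as the post does, that UE's shader headers define `BRANCH` as the `[branch]` attribute on the target platform; the `#ifndef` fallback only exists so the snippet also compiles on its own. The function and input names are hypothetical. Following the reply that comes next, the expensive part is written inside the braces of the branch rather than fed in from regular nodes:

```hlsl
// Fallback so this compiles outside UE; inside UE the engine headers
// are assumed to provide BRANCH already.
#ifndef BRANCH
#define BRANCH [branch]
#endif

// Hypothetical: return a cheap color where Mask < 0.5, otherwise run an
// expensive multi-tap fetch.
float3 CheapOrExpensive(Texture2D Tex, SamplerState TexSampler,
                        float2 UV, float Mask, float3 CheapColor)
{
    // Derivatives first, outside the branch.
    float2 dUVdx = ddx(UV);
    float2 dUVdy = ddy(UV);

    BRANCH
    if (Mask < 0.5)
    {
        return CheapColor;
    }
    else
    {
        // Everything that should be skipped lives inside these braces.
        float3 acc = float3(0, 0, 0);
        [loop]
        for (int i = 0; i < 16; ++i)
        {
            acc += Tex.SampleGrad(TexSampler, UV + 0.01 * i, dUVdx, dUVdy).rgb;
        }
        return acc / 16.0;
    }
}
```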

Deathrey · May 8, 2020, 10:47pm

It is only applicable past 4.20 or so. In earlier versions generated code was different.

Yet, not all material networks will work out of the box, and it will not be cross-compiled that way for all platforms.
You still need to place the network in question within the braces.

MostHost_LA · May 8, 2020, 11:58pm

I will make a simple test of the custom branch on a landscape and test performance differential.

however, if you are testing between a solid color and a texture you are obviously going to have a performance improvement IF that texture isn’t loaded into memory at all (Right? Seems logical at least).

@ Could you define/explain what you mean with an example?