Dynamic flow control in materials

ISSUE TRACKER LINK
Make sure to vote for this feature.

[HR][/HR]

In my efforts to bring down landscape render times, dynamic shader branching has proven to be insanely useful.
However, I can’t get it to work as expected using material node networks only.

As an example, this network:

http://image.prntscr.com/image/5f680c891f2b43969be0a809722a7f8a.png

looks like this in HLSL:


 // Now the rest of the inputs
    MaterialFloat Local0 = min(max(Parameters.TangentToWorld[2].b,0.00000000),1.00000000);
    MaterialFloat3 Local1 = (GetWorldPosition(Parameters) / 512.00000000);
    MaterialFloat2 Local2 = DDY(Local1.rg);
    MaterialFloat2 Local3 = DDX(Local1.rg);
    MaterialFloat4 Local4 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local1.rg,Local3,Local2));
    MaterialFloat3 Local5 = (GetWorldPosition(Parameters) / 2048.00000000);
    MaterialFloat2 Local6 = DDY(Local5.rg);
    MaterialFloat2 Local7 = DDX(Local5.rg);
    MaterialFloat4 Local8 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local5.rg,Local7,Local6));
    MaterialFloat3 Local9 = ((Local0 >= 0.50000000) ? Local4.rgb : Local8.rgb);

This is pretty much branch flattening. Both textures are sampled, and one of the two values is then discarded.

Now the same node network, but using a custom node:

http://image.prntscr.com/image/9c06de2c688e4b7ea4cee0279f5fac5c.png

Custom node code:


float4 result = 0;
// Only one of the two textures is sampled, depending on the normal's B component.
if (VertexNormalB >= 0.5)
{
    result = FirstTex.SampleGrad(FirstTexSampler, FirstUVs, FirstDDX, FirstDDY);
}
else
{
    result = SecondTex.SampleGrad(SecondTexSampler, SecondUVs, SecondDDX, SecondDDY);
}
return result;

Compiled code looks as follows, in this case:


// Now the rest of the inputs
    MaterialFloat3 Local0 = (GetWorldPosition(Parameters) / 512.00000000);
    MaterialFloat3 Local1 = (GetWorldPosition(Parameters) / 2048.00000000);
    MaterialFloat2 Local2 = DDX(Local0.rg);
    MaterialFloat2 Local3 = DDY(Local0.rg);
    MaterialFloat2 Local4 = DDX(Local1.rg);
    MaterialFloat2 Local5 = DDY(Local1.rg);
    MaterialFloat Local6 = min(max(Parameters.TangentToWorld[2].b,0.00000000),1.00000000);
    MaterialFloat4 Local7 = CustomExpression0(Parameters,Local0.rg,Local1.rg,Local2,Local3,Local4,Local5,Material.Texture2D_0,Material.Texture2D_0Sampler,Material.Texture2D_0,Material.Texture2D_0Sampler,Local6);

CustomExpression0 is pretty much our custom node code, and looks like this:


// Uniform material expressions.
MaterialFloat4 CustomExpression0(FMaterialPixelParameters Parameters,MaterialFloat2 FirstUVs,MaterialFloat2 SecondUVs,MaterialFloat2 FirstDDX,MaterialFloat2 FirstDDY,MaterialFloat2 SecondDDX,MaterialFloat2 SecondDDY,Texture2D FirstTex, SamplerState FirstTexSampler ,Texture2D SecondTex, SamplerState SecondTexSampler ,MaterialFloat VertexNormalB)
{
float4 result=0;
if(VertexNormalB>=0.5)

{
result=FirstTex.SampleGrad(FirstTexSampler,FirstUVs,FirstDDX,FirstDDY);
}
else
{
result=SecondTex.SampleGrad(SecondTexSampler,SecondUVs,SecondDDX,SecondDDY);
}
return result;
}

In the latter case, there is proper dynamic branching, so one texture sample is skipped for the corresponding pixels. The saving may be insignificant for a single texture lookup, but when the branch skips a whole block of code, for example a distance-blended tri-planar cliff layer on a landscape, it proves to be a major performance win.

My question: Is it possible to get dynamic branching using material nodes only? If so, how?
My thanks for any assistance rendered.

I’m actually interested in knowing this as well, for similar reasons. The performance gains are potentially pretty large with certain material setups.

Hmm looks like the compiler may be trying to be clever here. Not sure what you can do about it but some thoughts:

Maybe mess with the equals threshold. It is interesting that nothing from that setting made its way into the HLSL, but maybe that only happens with default settings?

Also you could try hooking up your two textures as A and B inputs in a custom node, and have just the IF statement in the custom node rather than having to move everything over. It should be the same thing either way I believe (except the instructions won’t appear within the custom expression).

@ Thanks for answering.

Result is pretty much the same. Whatever is done in custom node, gets branched correctly.

If I plug the texture sample results into a custom node with an if statement, then both textures are sampled and the result is chosen inside the custom node.

Changing the equals threshold makes no difference for me. The shader still chooses between two sampled values, rather than producing a branch.

I am entering a ticket request to see if we can add a boolean option to the If node to force the [branch] attribute and avoid using the ternary interpretation.

@
That would be truly amazing.

However, how will the compiler decide what goes under the IF, and what stays before it?

In context of this network:

http://image.prntscr.com/image/05b4fcea2a2d43e891e9a5d435758221.png

Texture.SampleGrad should go into the respective IF and ELSE sections, and the green comment blocks should be evaluated before the IF statement.

There probably needs to be some sort of marker nodes to specify where branching ends. And what if you want to calculate several values under same IF/ELSE, and not just one Vector4 ?

Additionally, there is an issue with texture sample nodes that are set to use explicit derivatives or an explicit level: they will not share samplers between different textures.

While this improvement to the compiler could potentially be exceptionally good, I imagine it would be quite a lot of complicated work.

Alternatively, would it be possible to make a change to the custom node, so in addition to outputting a float4, you could output float4x4, which would be broken down to 4 Vector4 material pins?
This or similar change to custom node would probably cover 99% of needs of anyone who wants to use more advanced things in pixel shader, dynamic branching included. And something tells me that it should not be very hard to implement.
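To make the proposal above concrete, here is a rough sketch of what such a custom node body could look like if a float4x4 return type were available. This is purely hypothetical: nothing in this thread says the custom node supports it, and the function name, parameter names and row layout are all made up for illustration.

```hlsl
// Hypothetical: assumes a custom node that could return float4x4,
// with each row exposed as a separate Vector4 output pin.
float4x4 BranchedLayer(
    Texture2D AlbedoTex, Texture2D NormalTex, SamplerState Samp,
    float2 UV, float2 DX, float2 DY, float Weight)
{
    float4x4 packed = (float4x4)0;

    // Several values are computed under the same branch and packed into rows.
    [branch]
    if (Weight > 0.0)
    {
        packed[0] = AlbedoTex.SampleGrad(Samp, UV, DX, DY); // row 0: base color
        packed[1] = NormalTex.SampleGrad(Samp, UV, DX, DY); // row 1: normal
    }
    return packed;
}
```

The derivatives DX and DY are passed in as inputs, so they are calculated before the branch, the same way as in the custom node from the first post.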

@Maximum-Dev mentioning you here, because I noticed you were interested in landscape material optimizations. This is one of the features that would bring major render time reduction for terrain.

Lastly, I am quite surprised that dynamic flow control in pixel shaders is not receiving much community attention. To the best of my knowledge, it has been used quite extensively in recent and not so recent titles. Its area of application is quite limited: it works best on things that occupy a large part of the screen, such as landscapes, water surfaces and large cliffs. However, the potential performance gain is unmatched.

Hi,

Thanks for letting me know about it man!
I’m not quite sure what I am looking at though. Is what you are doing basically giving the same results as this, but cheaper in render time? Please correct me if I’m wrong. :slight_smile:

Thanks.

@Maximum-Dev Generally, it works by skipping instructions that are not required. For example, say you want a landscape with 8 layers using triplanar projection. If you try to have all 8 on the same component at the same time, the performance will be quite low. With dynamic branching you can skip sampling unnecessary textures that are fully covered by other textures. Another example would be having 16 layers with POM on a single component, which is not realistic without multi-pass or dynamic branching.
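As a minimal sketch of the "skip fully covered layers" idea, assuming just two layers and a blend weight in the 0..1 range (the function and parameter names are invented for illustration, not taken from any actual landscape material):

```hlsl
// Hypothetical helper: blends two layers and skips whichever sample
// cannot contribute to the final color.
float3 BlendLayersBranched(
    Texture2D BaseTex, SamplerState BaseSampler,
    Texture2D TopTex, SamplerState TopSampler,
    float2 UV, float TopWeight)
{
    // Gradients are calculated outside of flow control.
    float2 dx = ddx(UV);
    float2 dy = ddy(UV);

    float3 base = 0;
    float3 top = 0;

    // The base layer is skipped where the top layer fully covers it.
    [branch]
    if (TopWeight < 1.0)
    {
        base = BaseTex.SampleGrad(BaseSampler, UV, dx, dy).rgb;
    }

    // The top layer is skipped where it is not painted at all.
    [branch]
    if (TopWeight > 0.0)
    {
        top = TopTex.SampleGrad(TopSampler, UV, dx, dy).rgb;
    }

    return lerp(base, top, TopWeight);
}
```

With 8 or 16 layers the same pattern would repeat per layer, which is where the skipped work should start to add up.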

I see now. I have been primarily working with landscapes in UE4 since 2015 and always struggling with performance. Even with simple shaders and few textures the cost per component is usually higher than expected. Skipping the instructions where it’s not needed should be a #1 issue that needs to be sorted out.

Yeah this is a major problem that should be resolved.

This PDF mentions dynamic branching being used to skip unnecessary projections in tri-planar mapping on page 33.

I ran a short test in UE4 to see how large the actual performance gain is.

  • A test scene included a piece of uneven terrain, that takes all the screen space, one directional light and a skylight, no static lighting.

  • Tests were carried out on GTX 770, at 1920x1080. Data below is taken from GPU profiler BasePass time for 10 captures and averaged.

  • Terrain material consisted of 6 tri-way layers, basecolor only, 2k textures. Layers were roughly painted in to be somewhat equally distributed.

[HR][/HR]

  • First version was created using typical material editor nodes:

[INDENT]
Material Editor Nodes(No dynamic branching) - 2.69ms[/INDENT]

  • Second version of the landscape material was done with a custom node, where all 6 layers were present together, but the X, Y and Z triplanar projections are skipped depending on the surface normal, as described in the above-mentioned PDF:

[INDENT]Tighten Factor 0 - 3.28ms
Tighten Factor 0.3 - 1.64ms
Tighten Factor 0.5 - 1.52ms
Tighten Factor 0.9 - 1.48ms[/INDENT]

  • Third landscape material was same as second, but with addition of branching out landscape layers, fully covered by other layers:

[INDENT]Tighten Factor 0 - 1.77ms
Tighten Factor 0.3 - 1.50ms
Tighten Factor 0.5 - 1.40ms
Tighten Factor 0.9 - 1.26ms[/INDENT]

  • Fourth version of the material was the same as the third, but with the [flatten] attribute before the IF statement:

[INDENT]Custom Node([flatten] attribute) - 3.28ms[/INDENT]

  • And lastly, a fifth version was a custom node with no if statements at all:

[INDENT]Custom Node(No dynamic branching) - 3.11ms[/INDENT]
[HR][/HR]
Interpreting the results, the first thing to note is the difference between Tests 1 and 5.
I think most of that difference comes from using SampleGrad. In addition, I think the material translator performs optimizations, like marking texture size and tighten factor as uniform, etc.

Then there is a sharp drop in render time for Test 2 between Tighten Factor 0 and 0.3 (3.28ms down to 1.64ms). The gain comes mostly from the area of flat terrain, which occupies roughly 40% of screen space: with a Tighten Factor of 0.3, the XZ and YZ projections are skipped there.
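For anyone who wants to try it, the projection skipping can be sketched roughly like this. This is not the exact code used in the test: the weight formula (subtracting the tighten factor from the absolute normal, then renormalizing) is an assumption about how the tighten factor works, and all names are illustrative.

```hlsl
// Hypothetical triplanar sampler that skips projections with zero weight.
float3 TriplanarBranched(
    Texture2D Tex, SamplerState TexSampler,
    float3 WorldPos, float3 Normal, float Tighten, float Scale)
{
    // Assumed weight formula: a higher Tighten zeroes out more projections.
    float3 w = max(abs(Normal) - Tighten, 0.0);
    w /= max(w.x + w.y + w.z, 1e-5);

    float2 uvYZ = WorldPos.yz / Scale;
    float2 uvXZ = WorldPos.xz / Scale;
    float2 uvXY = WorldPos.xy / Scale;

    // All derivatives are calculated before any branch.
    float2 dxYZ = ddx(uvYZ), dyYZ = ddy(uvYZ);
    float2 dxXZ = ddx(uvXZ), dyXZ = ddy(uvXZ);
    float2 dxXY = ddx(uvXY), dyXY = ddy(uvXY);

    float3 result = 0;

    [branch]
    if (w.x > 0.0)
    {
        result += w.x * Tex.SampleGrad(TexSampler, uvYZ, dxYZ, dyYZ).rgb;
    }
    [branch]
    if (w.y > 0.0)
    {
        result += w.y * Tex.SampleGrad(TexSampler, uvXZ, dxXZ, dyXZ).rgb;
    }
    [branch]
    if (w.z > 0.0)
    {
        result += w.z * Tex.SampleGrad(TexSampler, uvXY, dxXY, dyXY).rgb;
    }
    return result;
}
```

Under this assumed formula, a tighten factor of 0 leaves all three weights non-zero on almost every pixel, so nothing is skipped and only the branching overhead remains. That would be consistent with Tighten Factor 0 being the slowest entry in Test 2.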

As we add layer branching in test 3, render times are further reduced.

Visually, Tighten Factor values above 0.5 deliver a lot of stretching, so it is unlikely that anything above that will be used.

By introducing dynamic branching in this case, I got a considerable performance increase. Of course, this test is nowhere near conclusive, it does not apply to every case, and the results will vary greatly from material to material. But it clearly shows the benefit of skipping texture samples with dynamic flow control.

I’d kindly ask anyone interested in this feature to upvote these answer hub posts: 1 ([FEATURE REQUEST] Dynamic flow control in materials - Rendering - Unreal Engine Forums) and 2

I hope this gets resolved soon, Keep up the good work.

I hope this gets some development effort. But does anyone have an idea for a syntax that would be usable with nodes?


[branch] if()
{

}

Inside a custom node, this works as expected. Anything under IF cannot contain gradient functions, so SampleLevel or SampleGrad must be used, and derivatives must be calculated before the branch in case with SampleGrad.
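A minimal sketch of the SampleLevel variant, for the case where an explicit mip level is acceptable. The input names (Tex, TexSampler, UV, Mask, MipLevel) are hypothetical custom node inputs, not anything from the networks above:

```hlsl
// Hypothetical custom node body using an explicit mip level.
float4 result = 0;

[branch]
if (Mask > 0.0)
{
    // SampleLevel takes the mip level directly, so no gradient
    // instructions are needed inside the branch.
    result = Tex.SampleLevel(TexSampler, UV, MipLevel);
}
return result;
```

The trade-off is that the mip level has to come from somewhere, so SampleGrad with derivatives calculated up front is usually the closer match to a regular texture sample.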

Also, if you modify the output of the custom node with anything that depends on sampling another texture, branching fails. I do not see an apparent cause for this.

There is still no ticket for the whole issue btw.

Doh, I had the ticket mail sitting in my drafts so apparently I forgot to hit send before going on vacation, my bad :slight_smile: Should be UE-33876

The ticket is now on the tracker, but went into backlog, sadly. Make sure to cast some votes. :rolleyes:
[HR][/HR]
-> ISSUE TRACKER LINK <- (Unreal Engine Issues and Bug Tracker, UE-33876)

I cast my vote! This is pretty cool stuff. I will have to give this a go in the landscape material I’m working on.
There is this and the landscape/static mesh blending (but that of course is another topic :wink: ) which are big issues for me.

Any news on this?

Nothing so far.

Hi everyone,

For the last few days, I have been working on a proof of concept of a branching node. What I can say is that it’s far from easy, as it goes completely against the flow of the material compiler :stuck_out_tongue:

I managed to do something like this:

This sounds good but there are 2 drawbacks:

  • For such a simple example, this takes longer to render. I have a PIX capture from Xbox, but as I have absolutely no idea whether I can post it anywhere other than UDN and the Xbox Dev forum, I won’t take the risk. Roughly, though, the pixel pass has fewer instructions but a generally higher cost per instruction, which makes it take longer when branching. The good thing is that when a bigger chunk of work is branched out (a triplanar sampling, for example), it would probably be beneficial.
  • It fails to compile because DDX/DDY cannot be inside the branch. It also fails when the sampling inside the branches does not use SampleGrad (so a simple sample fails). It’s apparently an API limitation, and it triggers error X4014 (cannot have divergent gradient operations inside flow control). So the idea of replacing the “if node”, which uses abs() and some math to build the condition, with a real “if” and the “branch” tag sounds quite optimistic. (I initially started to tackle this topic in an optimistic mood in fact :P)
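To illustrate the second point, here is a small sketch of the pattern that fails next to the one that compiles. The function and parameter names are made up for the example; the failing line is left as a comment:

```hlsl
// Hypothetical helper showing where gradient operations may live.
float4 SampleInBranch(Texture2D Tex, SamplerState TexSampler, float2 UV, float Condition)
{
    // Gradient instructions stay outside of the branch.
    float2 dx = ddx(UV);
    float2 dy = ddy(UV);

    float4 result = 0;

    [branch]
    if (Condition >= 0.5)
    {
        // A plain Sample relies on implicit gradients and is the case
        // reported above as failing with X4014 inside a forced branch:
        // result = Tex.Sample(TexSampler, UV);

        // SampleGrad with precomputed derivatives is accepted.
        result = Tex.SampleGrad(TexSampler, UV, dx, dy);
    }
    return result;
}
```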

For this second point, I added a dynamic number of inputs called “StopBranch” that exclude code from the branch:


And it works fine now.

This is a proof of concept, and it probably fails hard when you link elements between branches (but why would you do that if you have decided to force a branch?).

What I still plan to do is to experiment with it on the landscape. I also want to compare it against texture arrays there to see what’s best for our project (because, you guessed it, it’s very context-dependent and only profiling can tell whether it’s worth it). It may in fact end up being both (branching and texture arrays).

The hard question: can I make the code public? Unfortunately, I don’t think so. This code is tied to my current (professional) project and is currently very hacky and untested, so a pull request is not even a possibility. Most likely, doing this properly (i.e. as a public release) would need a big commitment, and that’s the point of my post. The material compiler is not tailored to produce code like that, and doing so would need quite a refactor of it.
However, as you’ve seen, we need it, so that naturally adds me to the group of people interested in such a feature :slight_smile: