Dynamic flow control in materials


    ISSUE TRACKER LINK

    Make sure to vote for this feature.




    In my efforts to bring down landscape render times, dynamic shader branching has proven to be insanely useful.
    However, I can't get it to work as expected using material node networks only.

    As an example, this network:



    looks like this in HLSL:

    Code:
     // Now the rest of the inputs
        MaterialFloat Local0 = min(max(Parameters.TangentToWorld[2].b,0.00000000),1.00000000);
        MaterialFloat3 Local1 = (GetWorldPosition(Parameters) / 512.00000000);
        MaterialFloat2 Local2 = DDY(Local1.rg);
        MaterialFloat2 Local3 = DDX(Local1.rg);
        MaterialFloat4 Local4 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local1.rg,Local3,Local2));
        MaterialFloat3 Local5 = (GetWorldPosition(Parameters) / 2048.00000000);
        MaterialFloat2 Local6 = DDY(Local5.rg);
        MaterialFloat2 Local7 = DDX(Local5.rg);
        MaterialFloat4 Local8 = ProcessMaterialColorTextureLookup(Texture2DSampleGrad(Material.Texture2D_0,Material.Texture2D_0Sampler,Local5.rg,Local7,Local6));
        MaterialFloat3 Local9 = ((Local0 >= 0.50000000) ? Local4.rgb : Local8.rgb);
    This is essentially branch flattening: both textures are sampled, and one of the two values is simply discarded.
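To make the cost concrete, here is a minimal CPU-side C++ model of the generated code above. It is not shader code, and `Texel`, `CountingTexture` and `flattenedSelect` are hypothetical names; each "sample" just bumps a counter. Note that the C++ ternary operator itself evaluates only the chosen operand. The waste comes from the translator computing both samples into locals (Local4 and Local8) before the ternary in Local9 picks one.

```cpp
#include <cassert>

// Stand-in for a sampled RGB value.
struct Texel {
    float r, g, b;
};

// Hypothetical texture that counts how many times it is sampled.
struct CountingTexture {
    Texel value;
    int fetches = 0;
    Texel sample() {
        ++fetches;
        return value;
    }
};

// Mirrors the flattened HLSL: Local4 and Local8 are both sampled up front,
// then Local9 selects one of them. The discarded sample is still paid for.
Texel flattenedSelect(float normalZ, CountingTexture& fine, CountingTexture& coarse) {
    assert(normalZ >= 0.0f && normalZ <= 1.0f);  // Local0 is clamped to [0, 1]
    Texel local4 = fine.sample();    // world position / 512
    Texel local8 = coarse.sample();  // world position / 2048
    return (normalZ >= 0.5f) ? local4 : local8;
}
```

Calling `flattenedSelect(0.9f, fine, coarse)` returns the fine texel, yet both counters end up at 1.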

    Now the same node network, but using a custom node:


    Custom node code:
    Code:
    float4 result = 0;
    if (VertexNormalB >= 0.5)
    {
        result = FirstTex.SampleGrad(FirstTexSampler, FirstUVs, FirstDDX, FirstDDY);
    }
    else
    {
        result = SecondTex.SampleGrad(SecondTexSampler, SecondUVs, SecondDDX, SecondDDY);
    }
    return result;
    In this case, the compiled code looks as follows:

    Code:
    // Now the rest of the inputs
        MaterialFloat3 Local0 = (GetWorldPosition(Parameters) / 512.00000000);
        MaterialFloat3 Local1 = (GetWorldPosition(Parameters) / 2048.00000000);
        MaterialFloat2 Local2 = DDX(Local0.rg);
        MaterialFloat2 Local3 = DDY(Local0.rg);
        MaterialFloat2 Local4 = DDX(Local1.rg);
        MaterialFloat2 Local5 = DDY(Local1.rg);
        MaterialFloat Local6 = min(max(Parameters.TangentToWorld[2].b,0.00000000),1.00000000);
        MaterialFloat4 Local7 = CustomExpression0(Parameters,Local0.rg,Local1.rg,Local2,Local3,Local4,Local5,Material.Texture2D_0,Material.Texture2D_0Sampler,Material.Texture2D_0,Material.Texture2D_0Sampler,Local6);
    CustomExpression0 is essentially our custom node code wrapped in a function, and looks like this:
    Code:
    // Uniform material expressions.
    MaterialFloat4 CustomExpression0(FMaterialPixelParameters Parameters,MaterialFloat2 FirstUVs,MaterialFloat2 SecondUVs,MaterialFloat2 FirstDDX,MaterialFloat2 FirstDDY,MaterialFloat2 SecondDDX,MaterialFloat2 SecondDDY,Texture2D FirstTex, SamplerState FirstTexSampler ,Texture2D SecondTex, SamplerState SecondTexSampler ,MaterialFloat VertexNormalB)
    {
    float4 result=0;
    if(VertexNormalB>=0.5)
    
    {
    result=FirstTex.SampleGrad(FirstTexSampler,FirstUVs,FirstDDX,FirstDDY);
    }
    else
    {
    result=SecondTex.SampleGrad(SecondTexSampler,SecondUVs,SecondDDX,SecondDDY);
    }
    return result;
    }
    In the latter case there is proper dynamic branching, so one of the two texture samples is skipped for each pixel. That may be insignificant for a single texture lookup, but when it is used to skip a whole block of code (for example, a distance-blended tri-planar cliff layer in a landscape material), it proves to be a major performance win.
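The same hypothetical counting model, rewritten to mirror CustomExpression0, shows the saving: only the side of the if/else that is taken does any sampling, so a batch of pixels costs one fetch per pixel instead of two. This is a per-pixel idealization (names are again hypothetical); on real GPUs the saving also depends on neighbouring pixels taking the same path.

```cpp
#include <cassert>
#include <vector>

struct Texel {
    float r, g, b;
};

// Hypothetical texture that counts how many times it is sampled.
struct CountingTexture {
    Texel value;
    int fetches = 0;
    Texel sample() {
        ++fetches;
        return value;
    }
};

// Mirrors CustomExpression0: only the side of the if/else that is taken
// performs a sample.
Texel branchedSelect(float normalZ, CountingTexture& fine, CountingTexture& coarse) {
    assert(normalZ >= 0.0f && normalZ <= 1.0f);  // the input is clamped to [0, 1]
    if (normalZ >= 0.5f) {
        return fine.sample();
    }
    return coarse.sample();
}

// Total fetches for a batch of pixels, one clamped normal Z per pixel.
int fetchesForPixels(const std::vector<float>& normalZs) {
    CountingTexture fine{{1.0f, 0.0f, 0.0f}};
    CountingTexture coarse{{0.0f, 1.0f, 0.0f}};
    for (float z : normalZs) {
        branchedSelect(z, fine, coarse);
    }
    return fine.fetches + coarse.fetches;
}
```

For four pixels, `fetchesForPixels({0.9f, 0.2f, 0.7f, 0.5f})` gives 4 fetches, where the flattened version would do 8.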

    My question: is it possible to get dynamic branching using material nodes only? If so, how?
    My thanks for any assistance rendered.
    Last edited by Deathrey; 08-06-2016, 05:26 AM.

    #2
    I'm actually interested in knowing this as well, for similar reasons. The performance gains are potentially pretty large with certain material setups.



      #3
      Hmm, it looks like the compiler may be trying to be clever here. Not sure what you can do about it, but some thoughts:

      Maybe mess with the equals threshold. It is interesting that nothing from that setting made its way into the HLSL, but maybe that only happens with default settings?

      Also, you could try hooking up your two textures as the A and B inputs of a custom node, and put just the IF statement in the custom node rather than moving everything over. I believe it should be the same thing either way (except the instructions won't appear within the custom expression).
      Ryan Brucks
      Principal Technical Artist, Epic Games



        #4
        @RyanB, thanks for answering.

        Quote: "Also you could try hooking up your two textures as A and B inputs in a custom node, and have just the IF statement in the custom node rather than having to move everything over"
        The result is pretty much the same: whatever is done inside the custom node gets branched correctly.

        If I plug the texture sample results into a custom node with an if statement, both textures are still sampled and the result is merely selected inside the custom node.

        Quote: "Maybe mess with the equals threshold. It is interesting that nothing from that setting made its way into the hlsl but maybe that only happens with default settings?"
        Changing the equals threshold makes no difference for me. It still chooses between two sampled values rather than producing a branch.



          #5
          I am entering a ticket request to see if we can add a boolean option to the If node to force the [branch] attribute and avoid using the ternary interpretation.
          Ryan Brucks
          Principal Technical Artist, Epic Games



            #6
            @RyanB
            That would be truly amazing.

            However, how will the compiler decide what goes under the IF and what stays before it?

            In the context of this network:



            The Texture.SampleGrad calls should go into the respective IF and ELSE sections, and the green comment blocks should be computed before the IF statement.


            There probably need to be some sort of marker nodes to specify where the branch ends. And what if you want to calculate several values under the same IF/ELSE, not just one Vector4?

            Additionally, there is an issue with texture sample nodes that are set to use explicit derivatives or an explicit mip level: they do not share samplers between different textures.

            While this improvement to the compiler could potentially be exceptionally good, I imagine it would be a lot of complicated work.

            Alternatively, would it be possible to change the custom node so that, in addition to a float4, it could output a float4x4 that would be broken down into four Vector4 material pins?
            This or a similar change to the custom node would probably cover 99% of the needs of anyone who wants to do more advanced things in a pixel shader, dynamic branching included. And something tells me it should not be very hard to implement.
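As a rough sketch of this proposal (such a multi-output custom node does not exist; all types and names below are hypothetical), here is a C++ model of one IF/ELSE computing four float4 results at once, packed as the rows of a float4x4 and then split into four Vector4 "pins". The constant values stand in for texture samples.

```cpp
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;
using float4x4 = std::array<float4, 4>;  // four rows, one per output pin

// The four Vector4 pins the float4x4 would be broken down into.
struct OutputPins {
    float4 baseColor;
    float4 normal;
    float4 masks;    // e.g. roughness, metallic, AO, height
    float4 weights;  // e.g. per-layer blend weights
};

// One IF/ELSE producing several results, instead of one float4 per branch.
// The counters record which path ran; constants stand in for texture samples.
float4x4 shadeBranch(float steepness, int& cliffRuns, int& groundRuns) {
    assert(steepness >= 0.0f && steepness <= 1.0f);
    float4x4 m{};
    if (steepness >= 0.5f) {
        ++cliffRuns;  // the expensive cliff layer would be sampled here
        m[0] = {0.40f, 0.35f, 0.30f, 1.0f};
        m[1] = {0.0f, 0.0f, 1.0f, 0.0f};
        m[2] = {0.90f, 0.0f, 1.0f, 0.2f};
        m[3] = {0.0f, 1.0f, 0.0f, 0.0f};
    } else {
        ++groundRuns;  // the cheaper ground layer would be sampled here
        m[0] = {0.20f, 0.45f, 0.10f, 1.0f};
        m[1] = {0.0f, 0.0f, 1.0f, 0.0f};
        m[2] = {0.70f, 0.0f, 1.0f, 0.0f};
        m[3] = {1.0f, 0.0f, 0.0f, 0.0f};
    }
    return m;
}

// Splitting the float4x4 into the four pins is just taking its rows.
OutputPins unpackPins(const float4x4& m) {
    return {m[0], m[1], m[2], m[3]};
}
```

The point of the design is that the branch condition is evaluated once and everything expensive for a layer lives inside one branch, rather than needing one branched custom node per output.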

            @Maximum-Dev, mentioning you here because I noticed you were interested in landscape material optimizations. This is one of the features that would bring a major render time reduction for terrain.

            Lastly, I am quite surprised that dynamic flow control in pixel shaders is not receiving more community attention. To the best of my knowledge, it has been used quite extensively in both recent and not-so-recent titles. Its area of application is fairly limited: it works best on things that occupy large portions of the screen, such as landscapes, water surfaces and large cliffs. However, the potential performance gain is unmatched.
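One reason the benefit is tied to large, coherent surfaces: GPUs shade pixels in SIMD groups (warps or wavefronts; 32 lanes is assumed below) that share one instruction stream, so a group pays for every path that any of its pixels takes. A minimal C++ cost model of that, with made-up unit costs and a hypothetical `branchedCost` helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Cost of a dynamic if/else over a row of pixels, assuming SIMD groups of
// `waveSize` lanes. A group executes a path if at least one of its lanes
// takes it; lanes that did not take it are masked off but still wait.
int branchedCost(const std::vector<bool>& takesExpensivePath, std::size_t waveSize,
                 int expensiveCost, int cheapCost) {
    assert(waveSize > 0);
    int total = 0;
    for (std::size_t start = 0; start < takesExpensivePath.size(); start += waveSize) {
        const std::size_t end = std::min(start + waveSize, takesExpensivePath.size());
        bool anyExpensive = false;
        bool anyCheap = false;
        for (std::size_t i = start; i < end; ++i) {
            if (takesExpensivePath[i]) {
                anyExpensive = true;
            } else {
                anyCheap = true;
            }
        }
        if (anyExpensive) total += expensiveCost;
        if (anyCheap) total += cheapCost;
    }
    return total;
}
```

With 64 pixels in two 32-lane groups, a coherent split (first 32 steep, last 32 flat) costs 10 + 1 = 11 units, while an alternating pattern costs 22, the same as evaluating both paths everywhere. The branch instruction's own overhead is ignored here.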
            Last edited by Deathrey; 07-09-2016, 05:49 AM.



              #7
              Hi,

              Thanks for letting me know about it, man!
              I'm not quite sure what I am looking at, though. Is what you are doing basically giving the same results as this, but cheaper in render time? Please correct me if I'm wrong.




              Thanks.
              Artstation
              Join the support channel
              Gumroad Store



                #8
                @Maximum-Dev: Generally, it works by skipping instructions that are not required. For example, say you want a landscape with 8 layers using tri-planar projection. If you try to have all 8 on the same component at the same time, performance will be quite low. With dynamic branching, you can skip sampling the textures of layers that are fully covered by other layers. Another example would be 16 layers with POM on a single component, which is not realistic without multi-pass rendering or dynamic branching.
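A rough per-pixel count of texture fetches for the 8-layer example, as a C++ sketch. It assumes 3 fetches per tri-planar layer (base color only, one per projection) and that a layer is skipped when its painted weight is zero; the names are hypothetical.

```cpp
#include <array>
#include <cassert>

constexpr int kLayerCount = 8;
// Assumption: base color only, one fetch per projection (X, Y and Z).
constexpr int kFetchesPerTriplanarLayer = 3;

// Hypothetical per-pixel layer weights, as painted into the landscape weightmaps.
using LayerWeights = std::array<float, kLayerCount>;

// Flattened: every layer is sampled and the weights only scale the results.
int fetchesFlattened(const LayerWeights& /*weights*/) {
    return kLayerCount * kFetchesPerTriplanarLayer;
}

// Branched: a layer with zero weight is fully covered by the other layers,
// so a [branch] around its sampling code skips all of its fetches.
int fetchesBranched(const LayerWeights& weights) {
    int fetches = 0;
    for (float w : weights) {
        assert(w >= 0.0f);
        if (w > 0.0f) {
            fetches += kFetchesPerTriplanarLayer;
        }
    }
    return fetches;
}
```

For a pixel where only two layers are painted (weights 0.7 and 0.3), the branched version does 6 fetches instead of 24.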



                  #9
                  I see now. I have been working primarily with landscapes in UE4 since 2015 and have always struggled with performance. Even with simple shaders and few textures, the cost per component is usually higher than expected. Skipping instructions where they are not needed should be a top-priority issue to sort out.
                  Artstation
                  Join the support channel
                  Gumroad Store



                    #10
                    Yeah this is a major problem that should be resolved.



                      #11
                      This PDF mentions, on page 33, dynamic branching being used to skip unnecessary projections in tri-planar mapping.

                      I've run a short test in UE4 to see how large the actual performance gain is.
                      • The test scene included a piece of uneven terrain covering the whole screen, one directional light and a skylight; no static lighting.

                      • Tests were carried out on a GTX 770 at 1920x1080. The data below is the BasePass time from the GPU profiler, averaged over 10 captures.

                      • The terrain material consisted of 6 tri-planar layers, base color only, with 2k textures. The layers were painted in to be roughly equally distributed.


                      • The first version was built with standard material editor nodes:


                      Material Editor Nodes (no dynamic branching) - 2.69ms
                      • The second version of the landscape material used a custom node with all 6 layers present together, but with the X, Y and Z tri-planar projections skipped depending on the surface normal, as described in the above-mentioned PDF:


                      Tighten Factor 0 - 3.28ms
                      Tighten Factor 0.3 - 1.64ms
                      Tighten Factor 0.5 - 1.52ms
                      Tighten Factor 0.9 - 1.48ms
                      • The third landscape material was the same as the second, but additionally branched out landscape layers that were fully covered by other layers:

                      Tighten Factor 0 - 1.77ms
                      Tighten Factor 0.3 - 1.50ms
                      Tighten Factor 0.5 - 1.40ms
                      Tighten Factor 0.9 - 1.26ms

                      • The fourth version was the same as the third, but with the [flatten] attribute before the IF statements:


                      Custom Node ([flatten] attribute) - 3.28ms
                      • Lastly, the fifth version was a custom node with no if statements at all:


                      Custom Node (no dynamic branching) - 3.11ms





                      Interpreting the results, the first thing to note is the difference between Tests 1 and 5.
                      I think most of that difference comes from using SampleGrad. In addition, I think the material translator performs optimizations, such as marking the texture size and tighten factor as uniform.

                      Then there is a large drop in render time in Test 2 between a Tighten Factor of 0 and 0.3. The gain is mostly driven by the area of flat terrain, which occupies roughly 40% of the screen: with a Tighten Factor of 0.3, the XZ and YZ projections are skipped there.
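The tighten behaviour can be sketched as follows. This assumes one common tri-planar formulation, w = max(|n| - tighten, 0) per axis followed by normalization; the exact formula in the referenced PDF may differ, and the function names are hypothetical. A projection is skipped (by a [branch] around its sample) when its weight is zero. Since the largest component of a unit normal is at least 1/sqrt(3), about 0.577, larger tighten values can zero all three weights on diagonal slopes, so the sketch falls back to the dominant axis in that case.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Weights for the YZ (x-facing), XZ (y-facing) and XY (z-facing) projections.
using Weights = std::array<float, 3>;

// Tri-planar blend weights with a "tighten" offset (assumed formulation).
Weights triplanarWeights(float nx, float ny, float nz, float tighten) {
    const Weights a = {std::fabs(nx), std::fabs(ny), std::fabs(nz)};
    Weights w{};
    float sum = 0.0f;
    for (std::size_t i = 0; i < 3; ++i) {
        w[i] = std::max(a[i] - tighten, 0.0f);
        sum += w[i];
    }
    if (sum <= 0.0f) {
        // Large tighten values can zero every weight on diagonal normals;
        // fall back to the dominant axis so one projection is still sampled.
        const auto dominant =
            static_cast<std::size_t>(std::max_element(a.begin(), a.end()) - a.begin());
        Weights fallback{};
        fallback[dominant] = 1.0f;
        return fallback;
    }
    for (float& x : w) x /= sum;
    return w;
}

// Number of projections a branched shader would actually sample.
int projectionsSampled(const Weights& w) {
    return static_cast<int>(
        std::count_if(w.begin(), w.end(), [](float x) { return x > 0.0f; }));
}
```

For a nearly flat normal such as (0.1, 0.1, 0.99), a tighten of 0 still samples all three projections, while 0.3 leaves only the XY projection, which matches the drop seen between those two settings.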

                      As we add layer branching in Test 3, render times are reduced further.

                      Visually, Tighten Factor values above 0.5 produce a lot of stretching, so it is unlikely that anything above that will be used.

                      By introducing dynamic branching in this case, I got a considerable performance increase. Of course, this test is far from conclusive, it does not apply to every case, and results will vary greatly from material to material, but it clearly shows the benefit of skipping texture samples with dynamic flow control.
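For reference, the relative changes against the 2.69 ms baseline (Test 1) can be worked out with a trivial helper; this uses only the numbers reported above, and `percentChange` is a hypothetical name.

```cpp
#include <cassert>

// Percentage change in BasePass time relative to a baseline.
// Negative values mean the variant is faster than the baseline.
double percentChange(double baselineMs, double variantMs) {
    assert(baselineMs > 0.0);
    return (variantMs - baselineMs) / baselineMs * 100.0;
}
```

Against 2.69 ms: Test 2 at Tighten Factor 0.3 (1.64 ms) is about -39%, Test 3 at 0.3 (1.50 ms) about -44%, Test 3 at 0.9 (1.26 ms) about -53%, and the [flatten] version (3.28 ms) about +22%.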

                      I'd kindly ask anyone interested in this feature to upvote these AnswerHub posts: 1 and 2
                      Last edited by Deathrey; 07-18-2016, 02:01 PM.



                        #12
                        I hope this gets resolved soon. Keep up the good work.



                          #13
                          I hope this gets some development effort. But does anyone have an idea for a syntax that would be usable with nodes?



                            #14
                            Code:
                            [branch] if (condition)
                            {
                                // code that only some pixels need to run
                            }
                            Inside a custom node, this works as expected. Nothing under the IF can use gradient functions, so SampleLevel or SampleGrad must be used, and with SampleGrad the derivatives must be calculated before the branch.
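Why the derivatives have to be computed before the branch: GPUs typically obtain screen-space derivatives by differencing values between neighbouring pixels of a 2x2 quad, so a pixel's gradient is only defined if its neighbour executed the same code. A minimal C++ model of a coarse ddx over one quad, where a lane masked off by a divergent branch simply has no value (`Quad`, `ddx` and `evaluate` are hypothetical names):

```cpp
#include <array>
#include <cassert>
#include <optional>

// One 2x2 pixel quad: lanes 0,1 = top row, lanes 2,3 = bottom row.
// A lane holds std::nullopt when that pixel did not execute the code that
// produces the value (it was masked off by a divergent branch).
using Quad = std::array<std::optional<float>, 4>;

// Coarse screen-space X derivative for `lane`: right pixel minus left pixel
// of its row. Undefined (nullopt) if either neighbour has no value, which is
// why gradient operations inside a divergent branch are not allowed.
std::optional<float> ddx(const Quad& q, int lane) {
    assert(lane >= 0 && lane < 4);
    const int left = (lane / 2) * 2;
    const int right = left + 1;
    if (!q[left] || !q[right]) return std::nullopt;
    return *q[right] - *q[left];
}

// Evaluates `uv` only for the lanes whose `active` flag is set.
Quad evaluate(const std::array<float, 4>& uv, const std::array<bool, 4>& active) {
    Quad q;  // all lanes start as nullopt
    for (int i = 0; i < 4; ++i) {
        if (active[i]) q[i] = uv[i];
    }
    return q;
}
```

Evaluated before the branch (all four lanes active), every lane gets a valid ddx; evaluated inside a branch that lane 1 skipped, lane 0 has no defined derivative, while the bottom row is unaffected. Passing derivatives computed up front into SampleGrad avoids the problem.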

                            Also, if you modify the output of the custom node with anything that depends on sampling another texture, the branching fails. I do not see an apparent cause for this.

                            There is still no ticket for the whole issue btw.



                              #15
                              Doh, I had the ticket mail sitting in my drafts, so apparently I forgot to hit send before going on vacation. My bad. It should be UE-33876.
                              Ryan Brucks
                              Principal Technical Artist, Epic Games

