fine, I wasn’t aware of the custom node execution order logic.
in this new case I’ve made 256 texture samplers, offsetting the UVs and averaging them just like my previous test. except I use 128 samplers of texture A and 128 samplers of texture B (both 2k textures from shootergame), and the UV offsetting value is even bigger (so the color is just a blur). all of this using purely nodes.
also this time I remembered to close every other editor window as they take up significant performance (which explains the difference even between the simple material across the 2 tests)
our good friend the simple material
256 sampled textures
and this is a portion of how the material looks
let’s go to your branch, just the way you’ve explained the usage
I hook the 256 samples into your branch
biasing the gradient so no pixels visually appear with the 256 samples. performance is bad. not only is it not comparable to the simple material, it’s even worse than just sampling the 256 textures without any of your branching
biasing the gradient to show half the sphere with the 256 samples. things get worse
biasing the gradient to show the full sphere with the 256 samples. things get even worse.
this is the HLSL code:
the function (which matches yours perfectly)
// Uniform material expressions.
MaterialFloat3 CustomExpression0(FMaterialPixelParameters Parameters,MaterialFloat A,MaterialFloat B,MaterialFloat ThroughA,MaterialFloat3 ThroughB)
{
    [branch] if ( A >= B)
    {
        return ThroughA;
    }
    else
    {
        return ThroughB;
    }
}
and the actual material using the function:
[a crapton more sampler operations go above here]
MaterialFloat4 Local1267 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1,Material.Texture2D_1Sampler,Local1266));
MaterialFloat3 Local1268 = (Local1267.rgb / 126.50000000);
MaterialFloat2 Local1269 = (Parameters.TexCoords[0].xy * 1.00000000);
MaterialFloat2 Local1270 = (Local1269 + (MaterialFloat2(0.23450001,0.23450001) * 63.00000000));
MaterialFloat4 Local1271 = ProcessMaterialColorTextureLookup(Texture2DSample(Material.Texture2D_1,Material.Texture2D_1Sampler,Local1270));
MaterialFloat3 Local1272 = (Local1271.rgb / 126.50000000);
MaterialFloat3 Local1273 = (Local1268 + Local1272);
MaterialFloat3 Local1274 = (Local1264 + Local1273);
MaterialFloat3 Local1275 = (Local1255 + Local1274);
MaterialFloat3 Local1276 = (Local1236 + Local1275);
MaterialFloat3 Local1277 = (Local1197 + Local1276);
MaterialFloat3 Local1278 = (Local1118 + Local1277);
MaterialFloat3 Local1279 = (Local959 + Local1278);
MaterialFloat3 Local1280 = (Local640 + Local1279);
MaterialFloat3 Local1281 = CustomExpression0(Parameters,Local1.g,0.50000000,0.00000000,Local1280);
PixelMaterialInputs.EmissiveColor = Material.VectorExpressions[2].rgb;
PixelMaterialInputs.Opacity = 1.00000000;
PixelMaterialInputs.OpacityMask = 1.00000000;
PixelMaterialInputs.BaseColor = Local1281;
PixelMaterialInputs.Metallic = 0.00000000;
PixelMaterialInputs.Specular = 0.50000000;
PixelMaterialInputs.Roughness = 0.50000000;
PixelMaterialInputs.AmbientOcclusion = 1.00000000;
PixelMaterialInputs.Refraction = 0;
PixelMaterialInputs.PixelDepthOffset = 0.00000000;
as you can see, all the complex operations are executed before the branch and then the branch evaluates and returns (exactly the way you use and show it yourself)
back into my version of the branch, I remade the same behavior all inside the custom node nested into the branch
performance isn’t as good as the branch-less version but as explained many times, the branch has a cost
and everything branched out:
first, it’s indeed relevant to avoid the extra custom node going into your version of the branch. with your branching, performance now at least correlates with the number of pixels shown, which suggests that some sort of branching really is going on
however your method is showing to be counter-productive beyond measure. “branching out” all the expensive parts is even more expensive than just processing the expensive parts in a branch-less version, while “branching in” the expensive parts again makes things even worse.
now, for the sake of clearing things up a bit more, I’ve re-created your version of the branch in my custom node
I do the big loop outside of the branch and just return the result based on the branch evaluation.
this in theory should be the equivalent of what you’re doing, with the difference that my loop is actually inside the CustomExpression function while your examples have everything outside the function except the branch itself (not sure how relevant it is, but that’s that)
all pixels “branched out”
all pixels “branched in”
and the generated HLSL code.
the function:
MaterialFloat3 CustomExpression0(FMaterialPixelParameters Parameters,Texture2D TexObj, SamplerState TexObjSampler ,MaterialFloat2 TexUVs,MaterialFloat2 TexDDX,MaterialFloat2 TexDDY,MaterialFloat A,MaterialFloat B,Texture2D TexObj2, SamplerState TexObj2Sampler )
{
    int i;
    int maxIt = 128;
    float4 result = float4(0,0,0,0);
    for (i = 0; i < maxIt; ++i)
    {
        result += TexObj.SampleGrad(TexObjSampler,TexUVs + float2(0.2345,0.2345) * i,TexDDX,TexDDY) / maxIt;
        result += TexObj2.SampleGrad(TexObj2Sampler,TexUVs + float2(0.2345,0.2345) * i,TexDDX,TexDDY) / maxIt;
    }
    [branch] if ( A >= B)
    {
        return float4(0,0,0,0);
    }
    else
    {
        return result;
    }
}
and the actual material using the function:
// Now the rest of the inputs
MaterialFloat2 Local0 = (Parameters.TexCoords[0].xy * 1.00000000);
MaterialFloat2 Local1 = DDX(Local0);
MaterialFloat2 Local2 = DDY(Local0);
MaterialFloat2 Local3 = (Parameters.TexCoords[0].xy * 1.00000000);
MaterialFloat2 Local4 = (Local3 + Material.ScalarExpressions[0].x);
MaterialFloat3 Local5 = CustomExpression0(Parameters,Material.Texture2D_0,Material.Texture2D_0Sampler,Local0,Local1,Local2,Local4.g,0.50000000,Material.Texture2D_1,Material.Texture2D_1Sampler);
PixelMaterialInputs.EmissiveColor = Material.VectorExpressions[2].rgb;
PixelMaterialInputs.Opacity = 1.00000000;
PixelMaterialInputs.OpacityMask = 1.00000000;
PixelMaterialInputs.BaseColor = Local5;
PixelMaterialInputs.Metallic = 0.00000000;
PixelMaterialInputs.Specular = 0.50000000;
PixelMaterialInputs.Roughness = 0.50000000;
PixelMaterialInputs.AmbientOcclusion = 1.00000000;
PixelMaterialInputs.Refraction = 0;
PixelMaterialInputs.PixelDepthOffset = 0.00000000;
in this scenario it’s clear that putting the stuff outside of the branch has zero effect: the complex parts get processed regardless of the branch that follows. this is the scenario described before as “not really branching”
however, things aren’t really much clearer, because this seems to be a different scenario from the other 2 above: not a positive impact (like the full nested branching), and not a weird-negative-positive-negative impact (like your branching in my results)
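for reference, “full nested branching” could be sketched roughly like this. it is a reconstruction under assumptions, not the exact node from the test: the name NestedBranchSample is made up, plain float types stand in for the MaterialFloat typedefs, and the inputs are assumed to match the custom node shown above. the only intended difference from that function is that the sampling loop sits inside the else block instead of in front of the branch.

```hlsl
// Hypothetical sketch: the expensive loop is nested inside the branch,
// so pixels that take the first path never reach the texture fetches.
float3 NestedBranchSample(
    Texture2D TexObj, SamplerState TexObjSampler,
    Texture2D TexObj2, SamplerState TexObj2Sampler,
    float2 TexUVs, float2 TexDDX, float2 TexDDY,
    float A, float B)
{
    [branch] if (A >= B)
    {
        // Cheap path: no texture work at all.
        return float3(0, 0, 0);
    }
    else
    {
        const int maxIt = 128;
        float4 result = float4(0, 0, 0, 0);
        for (int i = 0; i < maxIt; ++i)
        {
            // SampleGrad with derivatives computed outside the branch
            // (assumption: implicit-derivative Sample is generally not
            // accepted inside divergent flow control).
            float2 uv = TexUVs + float2(0.2345, 0.2345) * i;
            result += TexObj.SampleGrad(TexObjSampler, uv, TexDDX, TexDDY) / maxIt;
            result += TexObj2.SampleGrad(TexObj2Sampler, uv, TexDDX, TexDDY) / maxIt;
        }
        return result.rgb;
    }
}
```

whether the GPU actually skips the loop still depends on how the pixels in a group diverge, so this only shows the structure being compared, not a guaranteed speedup.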
you still haven’t shown a working example with a working comparison. and I mean a proper material that branches differently per pixel (all you had was a scalar parameter, which affects all pixels the same).
show a simple material, show something complex with no branching involved, and then show it with your branching. without the 3 cases compared it’s impossible to tell any difference between the different usages. so far all I see is some difference and “my computer is a toaster but trust me, it’s better”