Instruction count is incorrect for multiple multiplications

Andrej730 · November 19, 2023, 1:18pm

I noticed that chaining multiple multiplications in a row adds just 1 instruction to the instruction count, exactly as if there were only 1 multiplication.

In testing, I made sure that different scalar parameters are multiplied, so the compiler shouldn’t be able to optimize the chain away and should actually do N multiplications.

Is it a bug? Is there any explanation of how the instruction count in the shader graph works, how reliable it is, and in which cases? Does the Shader Complexity view mode in the viewport use the same data?

No multiplications, default shader:

1 multiplication (+1 instruction):

~50 multiplications of different parameters:

BananableOffense · November 19, 2023, 4:23pm

“Constant folding is an optimization that Unreal Engine employs under the hood to reduce shader instruction count when necessary. For example, an expression chain of Sin > Mul by parameter > Add to something can and will be collapsed by Unreal Engine into a single instruction, the final add.”
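
To make the quoted idea concrete, here is a minimal sketch of constant folding on a tiny expression tree, written in Python so it can be run standalone. The `Const`, `Input` and `Op` node types and the `fold` function are made up for illustration and are not Unreal's material compiler. The parameter in the quoted chain is modeled as a plain constant, which is an assumption (and exactly the point questioned later in the thread).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float          # value known at compile time


@dataclass(frozen=True)
class Input:
    name: str             # hypothetical per-pixel value, unknown at compile time


@dataclass(frozen=True)
class Op:
    kind: str             # "sin", "mul" or "add"
    args: tuple


FUNCS = {
    "sin": lambda a: math.sin(a),
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}


def fold(node):
    """Replace every subtree built only from constants with a single Const."""
    if not isinstance(node, Op):
        return node
    args = tuple(fold(a) for a in node.args)
    if all(isinstance(a, Const) for a in args):
        return Const(FUNCS[node.kind](*(a.value for a in args)))
    return Op(node.kind, args)


def count_ops(node):
    """Count the operations that would still have to run in the shader."""
    if not isinstance(node, Op):
        return 0
    return 1 + sum(count_ops(a) for a in node.args)


# Sin > Mul > Add chain, with the last Add fed by a per-pixel value.
graph = Op("add", (Op("mul", (Op("sin", (Const(0.5),)), Const(2.0))), Input("uv_x")))
folded = fold(graph)
print(count_ops(graph), "->", count_ops(folded))  # prints "3 -> 1"
```

Only the final add survives in this sketch because everything feeding its first input is known ahead of time.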

glitchered · November 19, 2023, 7:34pm

that makes sense too. all parameters default to 0, so the whole thing gets nuked by the compiler. parameters are usually constant buffer values tho, not in-code constants. if all those parameters were populated with dynamic data it should not fold. @OP you can test that.

BananableOffense · November 19, 2023, 8:15pm

Agreed, if they are dynamic then they can’t be skipped by this specific type of optimization, in which case some other type of optimization may be running if they still don’t count.
Long story short, the instruction count usually isn’t wrong. It just may not be intuitive what the compiler is doing in the background to cut out unnecessary instructions.

Furthermore, OP, keep in mind that instruction count does not equal GPU cycles.
Some nodes, like trig functions or texture samples, may take dozens or even hundreds of GPU cycles but only show a small number of instructions, so always take the count with a grain of salt. The only trustworthy metric is actual performance profiling.

glitchered · November 19, 2023, 8:42pm

on that note. i wish there was a native mad instruction node. the literal fused multiply add instruction. gpus have that and it’s useful for a bunch of things, but the multiplyadd node does something else i don’t need.

4 instructions that could be 3

Andrej730 · November 20, 2023, 6:35am

“Constant folding is an optimization that Unreal Engine employs under the hood to reduce shader instruction count when necessary. For example, an expression chain of Sin > Mul by parameter > Add to something can and will be collapsed by Unreal Engine into a single instruction, the final add.”

From what I’ve tested with shader compilers, 1 multiplication will be at least 1 instruction (if it’s just a scalar), and since those scalar parameters are dynamic (not just constants), the compiler won’t be able to optimize them and reduce them to a single instruction. Even if a parameter is 0 by default, the compiler can’t rely on the default value, since it can be changed at any time to anything.
So either the instruction count is wrong, or there is some magic going on under the hood that I haven’t met yet and can’t reproduce with shader compilers. If anyone has a clue what’s going on, that would be really valuable information: knowing which optimization it actually does, and when it applies, would help in writing more performant shaders.

“Furthermore, OP, keep in mind that instruction count does not equal GPU cycles.”

Yeah, that makes sense. Btw, there is a trick with texture samples: they are considered time-consuming, which is true, but the compiler is smart about it, and while waiting for a texture sample to finish, other shader math that doesn’t depend on the sample can run in parallel. That is different from how it works with other operations like arctan, where nothing else can be done until the current instruction is finished.

“on that note. i wish there was a native mad instruction node. the literal fused multiply add instruction. gpus have that and it’s useful for a bunch of things, but the multiplyadd node does something else i don’t need.”

In theory, the compiler should optimize it to a native mad anyway if it’s present on your GPU, so there shouldn’t be a need for a specific green node for this operation.
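
As a rough illustration of that kind of fusion, here is a toy peephole pass in Python over a made-up instruction list. The tuple format `(dest, op, operands)` and the `fuse_mad` function are hypothetical and only sketch the idea; this is not how any particular HLSL compiler or GPU driver implements it. The sketch assumes every destination is written exactly once.

```python
from collections import Counter


def fuse_mad(instrs):
    """Fuse `t = a * b` followed by `d = t + c` into `d = mad(a, b, c)`.

    A mul is fused only when its result is read exactly once, by an add,
    so no other instruction still needs the intermediate product.
    """
    uses = Counter(src for _, _, srcs in instrs for src in srcs)
    muls = {dest: srcs for dest, op, srcs in instrs if op == "mul"}
    fused = set()
    out = []
    for dest, op, srcs in instrs:
        if op == "add":
            for i, src in enumerate(srcs):
                if src in muls and uses[src] == 1:
                    a, b = muls[src]
                    c = srcs[1 - i]          # the other operand of the add
                    out.append((dest, "mad", (a, b, c)))
                    fused.add(src)
                    break
            else:
                out.append((dest, op, srcs))  # no fusable mul feeds this add
        else:
            out.append((dest, op, srcs))
    # Drop the muls whose only consumer was folded into a mad.
    return [ins for ins in out if not (ins[1] == "mul" and ins[0] in fused)]


program = [
    ("t0", "mul", ("a", "b")),
    ("t1", "add", ("t0", "c")),
    ("t2", "mul", ("t1", "d")),
]
fused = fuse_mad(program)
print(len(program), "->", len(fused))  # prints "3 -> 2"
```

The second mul stays as it is because no add consumes its result.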

But that’s part of another topic I find interesting: do material functions add overhead and prevent compiler optimizations, or does it work exactly as if the material function graph were part of the original graph?
It’s a bit related to how Unreal explicitly states that custom nodes don’t support constant folding (and, I guess, other compiler optimizations), but I don’t get why. It’s the same HLSL code under the hood, attached to the main shader, so what could prevent its optimization?

Andrej730 · November 20, 2023, 10:55am

I think I’ve found the explanation for this by exploring the generated HLSL code.

The difference between 50 multiplications and 0 multiplications is not present in the instruction count because the product is calculated outside the shader (presumably on the CPU, which is very clever if the compiler can actually do that) and passed to the shader as an already calculated value through Material.PreshaderBuffer.

The 1-instruction difference that is present may be caused not by an actual multiplication instruction but by simply connecting anything to the pin, which makes some other parts of the shader active.

Though there is still the question of when the compiler is able to move parts of the code to be calculated in the PreshaderBuffer, and how we can use that to our advantage when writing shaders.

BananableOffense · November 20, 2023, 3:09pm

That would make sense. Parameters that aren’t on a Material Instance Dynamic don’t change at runtime but can differ from instance to instance. Precomputing something like that into a buffer that stores a value per material instance would allow all of that math to be skipped in exchange for a tiny amount of memory.

glitchered · November 20, 2023, 3:41pm

yep. looks like constant buffer input gets folded per shader/material. there’s logically no need to do this parameter code for every pixel. as soon as you plug a texture or any resource that is gpu local in a slot it will unfold from there, tho.
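
The behaviour described in the last few posts can be sketched as a small Python model. It assumes the rule is simply "a subtree that depends only on material parameters is evaluated once on the CPU". The `Param`, `TextureSample` and `Mul` types, the `split` function, and the `preshader_buffer` list are all hypothetical stand-ins, not Unreal's actual preshader implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str      # scalar material parameter, same value for every pixel


@dataclass(frozen=True)
class TextureSample:
    name: str      # GPU-local resource, value differs per pixel


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def is_uniform(node):
    """True when the subtree depends only on material parameters."""
    if isinstance(node, Param):
        return True
    if isinstance(node, Mul):
        return is_uniform(node.left) and is_uniform(node.right)
    return False


def evaluate_uniform(node, params):
    """CPU-side evaluation of a parameter-only subtree."""
    if isinstance(node, Param):
        return params[node.name]
    return evaluate_uniform(node.left, params) * evaluate_uniform(node.right, params)


def split(node, params, preshader_buffer):
    """Return how many multiplications remain in the per-pixel shader.

    Parameter-only products are computed once and appended to
    preshader_buffer; the shader would only read the stored value.
    """
    if is_uniform(node):
        if isinstance(node, Mul):
            preshader_buffer.append(evaluate_uniform(node, params))
        return 0
    if isinstance(node, Mul):
        return (1 + split(node.left, params, preshader_buffer)
                + split(node.right, params, preshader_buffer))
    return 0       # a texture sample is not a multiplication


def chain(nodes):
    """Left-to-right multiply chain: ((n0 * n1) * n2) * ..."""
    result = nodes[0]
    for n in nodes[1:]:
        result = Mul(result, n)
    return result


values = {f"p{i}": 2.0 for i in range(51)}

# 50 multiplications of 51 different parameters: nothing left per pixel.
buffer_a = []
only_params = chain([Param(f"p{i}") for i in range(51)])
print(split(only_params, values, buffer_a), len(buffer_a))  # prints "0 1"

# A texture in the middle: everything multiplied after it runs per pixel.
buffer_b = []
mixed = chain([Param("p0"), Param("p1"), TextureSample("tex"), Param("p2"), Param("p3")])
print(split(mixed, values, buffer_b), buffer_b)  # prints "3 [4.0]"
```

In this model the chain "unfolds" at the texture sample: `p0 * p1` is still precomputed, but the three multiplications from the sample onward stay in the shader. A real compiler might reorder the chain to precompute more, so treat the counts as a property of this sketch only.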
