Optimizing Material

pickersZ · October 7, 2018, 7:55am

Hi, is there a guide on optimizing a material by connecting nodes "correctly", as opposed to other ways that might be less performant?

For example, in pseudocode: if ( AA && BB && CC … )
where, if AA is false, evaluation would exit immediately.

I have tried to optimize my graph based on my understanding of computer logic, only to end up with a higher instruction count. Is there such a thing as connecting nodes in a performant way in a material? Or is it already optimized by Unreal, with no direct control on our side? Thanks in advance.

anonymous_user_fbe2d247 · October 7, 2018, 12:43pm

Can you show a real example that you are working with?

pickersZ · October 8, 2018, 2:16am

Hi Kalle_H, thanks for looking into this.

Here is an example, though not the best one. Both graphs are the same Clouds_MF with the same instruction count. I am trying to determine the best location for the If statement(s) in order to optimize the material graph. The first graph seems the obvious choice, but it puzzles me that the instruction count is the same, or even higher in some cases. The intent is that the graph should stop calculating when the alpha mask is 0, since it is going to return 0 anyway. So what would be the best options for optimizing the graph? Thank you!

Deathrey · October 8, 2018, 4:23am

You would see a change in instruction count only if the alpha mask input is a compile-time constant.

Given this graph, replacing the If material expressions with multiplies and getting rid of the clamp would be good.
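
As a rough illustration of the multiply suggestion, here is a small Python sketch (not material or HLSL code). The names `clouds_if` and `clouds_mul` are hypothetical, and it assumes the alpha mask is exactly 0 or 1. Under that assumption a select and a multiply return the same value, and the multiply needs no comparison. Either way the expensive input has already been computed before the final step, which is why the instruction count does not drop.

```python
def clouds_if(alpha_mask: float, cloud_value: float) -> float:
    """Select form: mimics an If expression that outputs 0 when the mask is 0."""
    return cloud_value if alpha_mask > 0.0 else 0.0


def clouds_mul(alpha_mask: float, cloud_value: float) -> float:
    """Multiply form: no comparison, the mask scales the result directly."""
    return cloud_value * alpha_mask


# Assumption: the mask is binary. In that case the two forms agree.
for mask in (0.0, 1.0):
    for value in (0.0, 0.25, 0.8):
        assert clouds_if(mask, value) == clouds_mul(mask, value)
```

If the mask can take fractional values, the two forms differ: the multiply fades the result, while the select passes it through unchanged for any mask above 0.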

pickersZ · October 8, 2018, 8:41am

Thanks. So that means that whatever I do in the graph, it doesn't matter unless the input is known at compile time (similar to a static bool), because a material doesn't change its instruction count at runtime based on the graph logic.

Deathrey · October 8, 2018, 8:48am

Correct.
What you initially expected from the If material expression is dynamic flow control. It is not achievable using material expressions; you would need to use a Custom node.

pickersZ · October 8, 2018, 9:18am

Thanks for the info. Really appreciate that.

Manoel.Neto · October 8, 2018, 3:48pm

Shaders don’t work like that. That instruction count is the size of the compiled program, not how many instructions are being executed based on the inputs. Also, dynamic flow control should be used with lots of care: it can actually hurt performance if misused.

GPUs contain a large number of “cores” (streaming processors) that execute shaders in parallel batches which are run in lockstep. This means that each shader instruction operates on the values of several pixels at once (32 on NVIDIA GPUs and 64 on AMD ones). When a dynamic branch is reached, if any of the threads in the batch diverge, both sides of the branch will be executed for all threads, with each one simply discarding the results of the untaken branch.
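
To make the lockstep behaviour concrete, here is a toy cost model in Python. It is only a sketch: `batch_cost` and the cost numbers are made up for illustration and do not come from any real GPU. It counts the instructions issued for one batch at an if/else, where a side of the branch is skipped only when no thread in the batch needs it.

```python
def batch_cost(conditions, cost_then, cost_else):
    """Instructions issued for one lockstep batch at an if/else.

    conditions: one bool per thread in the batch.
    A side is skipped only when no thread in the batch takes it.
    """
    cost = 0
    if any(conditions):          # at least one thread takes the "then" side
        cost += cost_then
    if not all(conditions):      # at least one thread takes the "else" side
        cost += cost_else
    return cost


BATCH = 32        # batch width quoted above for NVIDIA GPUs
EXPENSIVE = 100   # hypothetical cost of the cloud calculation
CHEAP = 1         # hypothetical cost of returning 0

uniform_masked = batch_cost([False] * BATCH, EXPENSIVE, CHEAP)
uniform_visible = batch_cost([True] * BATCH, EXPENSIVE, CHEAP)
divergent = batch_cost([True] + [False] * (BATCH - 1), EXPENSIVE, CHEAP)

print(uniform_masked, uniform_visible, divergent)
```

In this model a single visible pixel in an otherwise masked batch costs more than having no branch at all, so a dynamic branch only pays off when whole batches agree on the condition.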

Also, since the values of untaken branches are not kept, you cannot call DDX and DDY instructions inside branches. This means you also cannot rely on automatic mip map selection when sampling textures inside branches since those rely on derivative calculation (GPUs will try their best to guess the correct values, but there may be artifacts). You have to either specify mip level manually or use derivative values calculated outside a branch.
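
A simplified Python sketch of why derivatives depend on neighbouring pixels: it assumes the common scheme where the GPU takes finite differences across a 2x2 pixel quad and picks the mip level from the larger texel footprint. The function names and the exact formula are illustrative assumptions, not the behaviour of a specific GPU.

```python
import math


def quad_derivatives(uv_quad):
    """Finite differences across a 2x2 pixel quad.

    uv_quad[row][col] holds the (u, v) pair computed by each pixel.
    """
    (u00, v00), (u01, v01) = uv_quad[0]
    (u10, v10), _ = uv_quad[1]
    ddx = (u01 - u00, v01 - v00)   # change toward the right-hand neighbour
    ddy = (u10 - u00, v10 - v00)   # change toward the neighbour below
    return ddx, ddy


def mip_level(ddx, ddy, texture_size):
    """Approximate mip selection: log2 of the larger texel footprint."""
    footprint_x = math.hypot(ddx[0], ddx[1]) * texture_size
    footprint_y = math.hypot(ddy[0], ddy[1]) * texture_size
    rho = max(footprint_x, footprint_y)
    if rho <= 1.0:
        return 0.0                 # magnified or 1:1, use the top mip
    return math.log2(rho)


# Hypothetical quad: UVs advance 4 texels per pixel on a 1024-texel texture.
step = 4 / 1024
quad = [[(0.0, 0.0), (step, 0.0)],
        [(0.0, step), (step, step)]]
ddx, ddy = quad_derivatives(quad)
level = mip_level(ddx, ddy, 1024)
```

If one pixel of the quad skipped the branch that computes the UVs, its entry in `uv_quad` would hold no meaningful value and the differences would be garbage. That is the reason for computing the derivatives before the branch and passing them in explicitly, for example through HLSL's `SampleGrad`, or choosing the mip yourself with `SampleLevel`.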

pickersZ · October 8, 2018, 9:55pm

Thanks for the info. Will take note of the branching issue that you have mentioned.

Here is a link on what Manoel.Neto mentioned about derivative calculation inside branches:
http://www.aclockworkberry.com/shader-derivative-functions/

FrankieV · October 9, 2018, 2:01am

What's the application?

Under what condition would you want to decrease the instruction count?

As a thought, you could reduce shader complexity by assigning a simplified material to a LOD material slot, assuming that you wish to decrease the instructions and draw calls based on screen percentage.

pickersZ · October 9, 2018, 9:55am

Hi,

It could be any application.
In the simplest case, it could be a bool where the inputs are either 0 or 1, and the conditions might not be just distance-based.
Thanks, that was new to me. I didn't know about the feature, as I have a single material for all static meshes and the material assignment for LODs is always greyed out. But won't that increase the draw calls due to the extra materials? For example, if I have three LODs, I would need three material slots for the static mesh. I will probably run some simple tests to check the performance.

Deathrey · October 9, 2018, 10:13am

Nope, changing materials for each LOD won’t increase the draw call count, but it still incurs some overhead. Whether you should LOD the shader is very situational. Naturally, the further away the object is, the lower the vertex count of its LOD and the fewer pixels it occupies, so it renders faster anyway.

As for branching, I used to have a fossil thread about it.

pickersZ · October 9, 2018, 11:10am

The test results for the LOD materials didn't fare well. There was a noticeable drop in FPS when using two LOD materials versus one material. Thanks for the link, I will have a read of that.