Performance: Should I use additive over translucent blend mode?

ThiloN1987 · June 8, 2020, 1:25pm

Hello,

is the additive blend mode cheaper than the translucent blend mode?

Thank you,

Thilo

jwatte · June 8, 2020, 8:01pm

In the graphics hardware, there’s no difference.

Some optimizations can be done for additive, such as not having to sort by view order, so depending on the specifics of your effect, such optimizations may or may not matter.
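
To illustrate the sorting point, here is a minimal sketch (plain Python, not engine or GPU code) of why additive layers can be drawn in any order while translucent layers cannot. The function names and the RGB-tuple color representation are assumptions made for this example, and it ignores framebuffer clamping.

```python
# Illustrative sketch only: colors are (r, g, b) tuples of floats, no clamping.

def blend_additive(src, dst):
    """Additive blend: the source is simply added to what is already there."""
    return tuple(s + d for s, d in zip(src, dst))


def blend_translucent(src, opacity, dst):
    """Translucent blend: source and destination are mixed by opacity."""
    return tuple(s * opacity + d * (1.0 - opacity) for s, d in zip(src, dst))


def composite_additive(background, layers):
    """Apply additive layers in the given draw order."""
    color = background
    for src in layers:
        color = blend_additive(src, color)
    return color


def composite_translucent(background, layers):
    """Apply (color, opacity) layers in the given draw order."""
    color = background
    for src, opacity in layers:
        color = blend_translucent(src, opacity, color)
    return color


BLACK = (0.0, 0.0, 0.0)
RED = (1.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 1.0)

# Additive: both draw orders give the same result.
additive_a = composite_additive(BLACK, [RED, BLUE])
additive_b = composite_additive(BLACK, [BLUE, RED])

# Translucent at 50% opacity: the layer drawn last dominates.
translucent_a = composite_translucent(BLACK, [(RED, 0.5), (BLUE, 0.5)])
translucent_b = composite_translucent(BLACK, [(BLUE, 0.5), (RED, 0.5)])
```

Because the additive result does not depend on draw order, a renderer can skip the back-to-front sort for such layers; whether that saves anything measurable depends on the effect.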

ThiloN1987 · June 8, 2020, 11:50pm

Why is there no difference in the graphics hardware? The UE documentation states that with the additive blend mode the result is final color = source color + dest color: one addition. With the translucent blend mode it is source color * opacity + dest color * (1 - opacity): two multiplications, one subtraction and one addition (I'm not sure how to count the operations correctly, but there are certainly more than in additive mode). As I understand it, the translucent blend mode should therefore be more expensive.
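
For reference, the two formulas from the question can be written per color channel as below. This is only a sketch for counting arithmetic operations; the function names are made up for the example, and it says nothing about how blend hardware actually evaluates them. The third function shows that the translucent formula can also be rearranged into a linear interpolation with a single multiplication.

```python
import math


def additive_channel(src, dst):
    """Additive: 1 addition."""
    return src + dst


def translucent_channel(src, opacity, dst):
    """Translucent as written: 2 multiplications, 1 subtraction, 1 addition."""
    return src * opacity + dst * (1.0 - opacity)


def translucent_channel_lerp(src, opacity, dst):
    """Same result rearranged as a lerp: 1 subtraction, 1 multiplication, 1 addition."""
    return dst + (src - dst) * opacity


# Both translucent forms agree (up to floating-point rounding).
direct = translucent_channel(1.0, 0.25, 0.5)
rearranged = translucent_channel_lerp(1.0, 0.25, 0.5)
same_result = math.isclose(direct, rearranged)
```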

presto423 · June 9, 2020, 7:34am

The different blend modes also have different calculations performed in the backend, so the math referenced isn’t the only aspect of how they are calculated and rendered. What’s to worry about? It’s usually a difference of milliseconds, which is not enough to cause lag, hitching, or major graphics problems. If that is a problem, then there’s more learning you need to do about the processing and render pipeline, or other features, to get rendering as fast as possible.

jwatte · June 9, 2020, 6:28pm

A “millisecond” is huge for a frame.

Individual pixels really count in nanoseconds, and because of parallelism in the hardware, it comes down to picoseconds-per-pixel if divided out!
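
A rough back-of-the-envelope version of that arithmetic, as a sketch: the refresh rate, resolutions, and overdraw factor below are assumed example values, not measurements, and the helper name is made up for illustration.

```python
def per_pixel_budget_ns(refresh_hz, width, height, overdraw=1.0):
    """Frame time divided evenly over every shaded pixel, in nanoseconds."""
    frame_ns = 1e9 / refresh_hz
    shaded_pixels = width * height * overdraw
    return frame_ns / shaded_pixels


# Assumed example: a 90 Hz target, as is common for VR.
frame_ms = 1000.0 / 90.0                 # about 11.1 ms per frame
one_ms_fraction = 1.0 / frame_ms         # share of the frame that 1 ms eats

# 1920x1080, every pixel shaded once.
budget_1080p_ns = per_pixel_budget_ns(90, 1920, 1080)

# 3840x2160 with each pixel covered four times (overdraw of 4).
budget_4k_overdraw_ns = per_pixel_budget_ns(90, 3840, 2160, overdraw=4.0)
budget_4k_overdraw_ps = budget_4k_overdraw_ns * 1000.0
```

Under these assumptions a single millisecond is about 9% of the frame, the 1080p budget is a few nanoseconds per pixel, and the 4K case with overdraw lands in the hundreds of picoseconds.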

That being said, graphics hardware generally has special hardware for the blend operations, because they are on the write path, and have to interact tightly with the graphics memory directly, because of the defined ordering semantics of multiple overlapping polygons. This means that the “blend hardware” (which may look different on different architectures, and even on different generations of the same architecture) generally is designed to support all the common blend functions at full throughput. (I’m not a hardware engineer, and I haven’t looked in detail at the very latest hardware generation from each vendor, but this has generally always been the case.)

So, conclusion: Don’t worry about the blend function; worry what it looks like, and worry about upstream operations.

ThiloN1987 · June 9, 2020, 11:36pm

I am a bit confused. You say that I shouldn’t worry about the blend mode, but everyone says that the translucent blend mode is expensive and especially in VR you should avoid it and use the masked mode or DitheredTemporalAA (masked fake-translucency) instead.

jwatte · June 11, 2020, 6:03pm

The original question was about “additive” versus “translucent” blend modes. Both of those modes require reading from the framebuffer, performing math (which is generally fixed-function), and writing the result back to the framebuffer. Further, because the math may be order dependent, the renderer cannot freely reorder triangles in the pipeline. (This was huge on some mobile tile renderers; I don’t know how much it still matters on the latest hardware versions.)

Now you’re suggesting screen-door transparency instead, which is a totally different mode. Screen-door effectively treats the “blended” pixel fragment as either on or off, and writes it or doesn’t write it. Note that this doesn’t require any read-back from the framebuffer to composite the value. However, framebuffer interfaces may work like real cache lines these days, which means that you pay the read-back penalty anyway unless you overwrite the entire “cache line” (which typically is a pixel block of some size — between 2x2 and 4x8 depending on hardware.) And then you have to consider early Z effects as well.
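
The on/off behaviour of screen-door transparency can be sketched with an ordered-dither threshold. This example uses a standard 4x4 Bayer matrix as the pattern; that choice, and the function names, are assumptions for illustration and not a description of how the engine's dithered mode is implemented.

```python
# Standard 4x4 Bayer ordered-dither matrix (values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def screen_door_visible(x, y, opacity):
    """Return True if the fragment at pixel (x, y) is written, False if discarded.

    No framebuffer read is needed: the decision depends only on the
    pixel position and the fragment's own opacity.
    """
    threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
    return opacity > threshold


def coverage(opacity):
    """Fraction of pixels in one 4x4 tile that get written at this opacity."""
    written = sum(
        screen_door_visible(x, y, opacity) for y in range(4) for x in range(4)
    )
    return written / 16.0
```

With this pattern, an opacity of 0.5 writes half of the pixels in each tile and 0.25 writes a quarter, so the surface looks translucent only once the eye (or a temporal filter) averages neighbouring pixels.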

All of these are quite fiddly bits that are highly situational. On some kinds of hardware, for some scenes, temporal or screen door transparency can be a win. On other hardware, or for other scenes, it may do nothing but make your art look bad. If you are not fill rate limited, then it’s unlikely to matter at all – if you spend 300 cycles shading a pixel, that is usually way more than needed to hide the additional latency of blend modes.

It doesn’t help that each new hardware generation does this differently, and changes the rules even within a single vendor. The only way to know for sure whether it’s worth it is to carefully benchmark your particular art and scene on your particular target hardware with your particular circumstances.

presto423 · June 11, 2020, 6:28pm

That’s quite an in-depth amount of information, @jwatte. I think OP wasn’t trying to comprehend the nuances and intricacies of hardware-specific capabilities, but simply trying to see what works in the engine for the situation. The way I see it, if additive simply adds to the current scene color, then it is more limited than the translucent blend mode, which is also subtractive and probably requires more memory or processing because it takes more of the other graphics settings into account, such as shadowing and sort priority, depending on where the translucent objects are in the scene and on their other material settings.

jwatte · June 11, 2020, 10:46pm

I don’t believe that’s true. My argument is that that’s not actually a real difference, at least not hardware-wise.
Use whatever looks best for your scene! If it runs too slow, it’s probably the shaders or the size of your billboards that matter, not the specific blend mode.
Btw, “additive” is generally done as “pre-multiplied,” where you add the color channel directly and the background is attenuated by (1 - alpha). This is a much more flexible blending mode, as it allows you to do pure light, pure smoke, and anything in between.
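
A minimal sketch of the pre-multiplied idea, assuming the usual formulation result = source color + dest color * (1 - alpha), where the source color has already been multiplied by alpha where appropriate. The function name and the RGB-tuple representation are made up for this example.

```python
def blend_premultiplied(src_rgb, alpha, dst_rgb):
    """Pre-multiplied blend: add the source color, attenuate the background by (1 - alpha)."""
    return tuple(s + d * (1.0 - alpha) for s, d in zip(src_rgb, dst_rgb))


background = (0.25, 0.25, 0.25)

# "Pure light": alpha = 0, so the background is untouched and the color is
# simply added, which is exactly the additive blend mode.
glow = (0.5, 0.25, 0.0)
pure_light = blend_premultiplied(glow, 0.0, background)

# "Pure smoke": the color is pre-multiplied by alpha, which reproduces the
# ordinary translucent blend (src * opacity + dst * (1 - opacity)).
smoke_color = (0.5, 0.5, 0.5)
smoke_alpha = 0.5
smoke_premultiplied = tuple(c * smoke_alpha for c in smoke_color)
pure_smoke = blend_premultiplied(smoke_premultiplied, smoke_alpha, (1.0, 1.0, 1.0))

# Anything in between: some emitted light plus partial occlusion of the background.
fire_and_smoke = blend_premultiplied((0.5, 0.25, 0.0), 0.5, background)
```

One blend state therefore covers both cases, and a single particle can mix them by choosing its color and alpha independently.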