jwatte:
A “millisecond” is huge for a frame
Individual pixels are really measured in nanoseconds, and because of the parallelism in the hardware, it works out to picoseconds per pixel if you divide it out!
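Just to put rough numbers on that scale (the 90 Hz refresh rate and 2× 2160×2160 resolution below are assumed for illustration, not figures from this thread):

```cpp
// Back-of-the-envelope per-pixel time budget; the frame rate and pixel count
// are assumed example numbers, just to show the order of magnitude.
#include <cstdio>

int main() {
    const double frame_s          = 1.0 / 90.0;        // ~11.1 ms per frame
    const double pixels_per_frame = 2.0 * 2160 * 2160; // ~9.3 million pixels
    const double ns_per_pixel     = frame_s * 1e9 / pixels_per_frame;
    // ~1.2 ns per pixel if pixels were processed one at a time; with hundreds
    // to thousands of pixels in flight at once, the wall-clock cost divided
    // out per pixel drops into the picosecond range.
    std::printf("Serial budget: %.2f ns per pixel\n", ns_per_pixel);
    return 0;
}
```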
That being said, graphics hardware generally has special-purpose hardware for blend operations: they sit on the write path and have to interact tightly with graphics memory, because of the defined ordering semantics of multiple overlapping polygons. This means the "blend hardware" (which may look different on different architectures, and even on different generations of the same architecture) is generally designed to support all the common blend functions at full throughput. (I'm not a hardware engineer, and I haven't looked in detail at the very latest hardware generation from each vendor, but this has generally always been the case.)
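To make "common blend function" concrete: standard over-style alpha blending is just a handful of enums handed to that fixed-function unit, not shader code you pay for per pixel. A minimal sketch in Vulkan (D3D and Metal have equivalent structures; the specific factors chosen here are just the usual src-alpha / one-minus-src-alpha example):

```cpp
// Sketch: "over" alpha blending expressed as fixed-function blend state.
// The blend function is configuration consumed by dedicated hardware on the
// write path, which is why common modes run at full throughput.
#include <vulkan/vulkan.h>

int main() {
    VkPipelineColorBlendAttachmentState blend{};
    blend.blendEnable         = VK_TRUE;
    blend.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;            // src * alpha
    blend.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;  // dst * (1 - alpha)
    blend.colorBlendOp        = VK_BLEND_OP_ADD;
    blend.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
    blend.dstAlphaBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
    blend.alphaBlendOp        = VK_BLEND_OP_ADD;
    blend.colorWriteMask      = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
                                VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
    return 0;
}
```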
So, conclusion: don't worry about the blend function; worry about what it looks like, and worry about upstream operations.
I am a bit confused. You say I shouldn't worry about the blend mode, but everyone says the translucent blend mode is expensive, and that especially in VR you should avoid it and use the masked mode or DitheredTemporalAA (masked fake-translucency) instead.
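For context on what "masked fake-translucency" means, the core idea boils down to something like the sketch below: a dither pattern turns a fractional opacity into a per-pixel keep-or-discard decision, so the surface can render as masked with no framebuffer blend. The 4×4 Bayer matrix and threshold test are illustrative assumptions, not Unreal's actual implementation; as I understand it, DitheredTemporalAA additionally jitters the pattern over time and relies on temporal AA to smooth the result.

```cpp
// Illustrative sketch (not engine code): ordered dithering converts opacity
// into a binary coverage decision per pixel; in a pixel shader the "false"
// case would be a clip/discard rather than a blend.
#include <cstdio>

// 4x4 Bayer thresholds, normalized to (0, 1).
static const float kBayer4x4[4][4] = {
    { 0.0f/16,  8.0f/16,  2.0f/16, 10.0f/16},
    {12.0f/16,  4.0f/16, 14.0f/16,  6.0f/16},
    { 3.0f/16, 11.0f/16,  1.0f/16,  9.0f/16},
    {15.0f/16,  7.0f/16, 13.0f/16,  5.0f/16},
};

// Returns true if the pixel survives the mask at the given opacity.
bool DitheredCoverage(int x, int y, float opacity) {
    return opacity > kBayer4x4[y & 3][x & 3];
}

int main() {
    const float opacity = 0.5f;  // assumed example value
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x)
            std::printf("%c", DitheredCoverage(x, y, opacity) ? '#' : '.');
        std::printf("\n");  // roughly half the pixels pass at 50% opacity
    }
    return 0;
}
```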