Instanced Stereo Rendering increases GPU time by up to 257%. Why such a huge performance decrease?

When Epic announced Instanced Stereo Rendering (ISR) I was quite impressed, since it sounds awesome and I can use every bit of performance in my VR project. But now that I have actually profiled it to see the difference, I’m not so impressed any more.

I see absolutely no benefit from using Instanced Stereo Rendering. In fact, I see a decrease in performance.

Here is what I tested: run the game in standalone (right-click the .uproject file and select Launch), switch to VR with Alt+Enter, then run “profilegpu”.

Screen Percentage = SP, Base Pass = BP

SP 10, BP 257% slower and the whole scene 44% slower with ISR enabled.
SP 100, BP 19.8% slower and the whole scene 6.2% slower with ISR enabled.

The test scene is an average scene in my game, mostly the screen is filled with cheap opaque and masked materials. A small area is also translucent water, but really just a small area, not more than 5% of the screen.

There are 800,000 triangles drawn with 310 draw calls when ISR is enabled and 360 draw calls when ISR is disabled. It’s all just one actor with a few components, so the individual meshes are quite huge, they are all procedurally generated.

The “whole scene” difference is not very large because I spend a lot of GPU time on dynamic shadows, SSR and heavy post-processing. The difference would be a lot more than 6.2% if post-processing did not take up half of the GPU time. That’s why I also ran the test with a Screen Percentage of 10, since post-processing should be super cheap there. And as you can see, the whole scene is 44% slower with ISR enabled.

So maybe Instanced Stereo Rendering only improves CPU time? Well, no. It’s hard to tell whether it improves render thread time, since the render thread is always capped at 11.1 ms (90 fps) in VR, which makes a real comparison between ISR enabled and disabled impossible. But as far as I can see using stat startfile and stat stopfile, the render thread looks exactly the same whether Instanced Stereo Rendering is enabled or disabled.

With ISR enabled, an empty material looks like this:

621ad86da3.png

With ISR disabled:

e91f9456d9.png

So ISR gives every single material a vertex shader instruction count about 3 times higher, and it makes sense that it’s a lot slower on the GPU as a result. The 257% increase in Base Pass time at low resolution, where only the vertex shader really matters, seems to confirm this.

So what is the point of Instanced Stereo Rendering? Does the scene have to meet some special requirements to benefit from ISR? What are these requirements? How do I make my game benefit from ISR?

I’m seeing similar results.

ISR OFF

SP 10, BP 0.12, Scene 1.71
SP 100, BP 0.24, Scene 3.94

ISR ON

SP 10, BP 0.16, Scene 1.61
SP 100, BP 0.29, Scene 4.33

(attached is a screenshot of the Statistics window for the scene)
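
To put these numbers on the same footing as the percentages in the original post, here is a small sketch that converts the ISR OFF / ISR ON pairs above into relative changes. The timings are copied from the post; the helper name `percent_change` is just for illustration, and the units are assumed to be whatever profilegpu reported (presumably milliseconds).

```python
def percent_change(isr_off: float, isr_on: float) -> float:
    """Relative change of the ISR ON timing versus ISR OFF, in percent.

    Positive means ISR ON is slower, negative means it is faster.
    """
    return (isr_on - isr_off) / isr_off * 100.0


# (ISR OFF, ISR ON) timings exactly as posted above.
timings = {
    ("SP 10", "BP"): (0.12, 0.16),
    ("SP 10", "Scene"): (1.71, 1.61),
    ("SP 100", "BP"): (0.24, 0.29),
    ("SP 100", "Scene"): (3.94, 4.33),
}

changes = {key: percent_change(off, on) for key, (off, on) in timings.items()}

for (sp, metric), change in changes.items():
    print(f"{sp:6} {metric:5} {change:+.1f}%")
```

By this measure the Base Pass is slower with ISR on at both screen percentages, while the whole scene is slightly faster at SP 10 and slower at SP 100.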

Your test setup of only ~300-400 meshes will not really be enough draw calls to benefit from instanced stereo. would have to say for sure but I doubt you will see a benefit until you have around 1000 draw calls. IIRC Bullet Train had between ~750 and 1200 draw calls between the best and worst scenes, and instanced stereo was only saving a lot in the most expensive scenes.

And the vertex shader cost is expected: it has to perform the additional math to move the instances around somewhere. That cost should be a fixed overhead though, not a 3x multiplier. That is why there will be a break-even point that involves higher draw counts. Once the cost of the draw calls themselves gets really high, it becomes worth paying the overhead of the instanced stereo system, since it will scale better with higher numbers.
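
The break-even argument can be sketched with a toy cost model. This is only an illustration, not a measurement: it lumps everything into one frame cost, assumes each draw call costs the same, and assumes ISR halves the number of submitted draws in exchange for a fixed overhead. The function names and the two constants are made up, chosen so the break-even lands near the ~1000 draw calls guessed above.

```python
def frame_cost_ms(draw_calls: int, per_draw_ms: float,
                  isr_enabled: bool, isr_overhead_ms: float) -> float:
    """Toy frame cost: two submissions per draw without ISR, one plus overhead with it."""
    if isr_enabled:
        return draw_calls * per_draw_ms + isr_overhead_ms
    return 2 * draw_calls * per_draw_ms


def break_even_draw_calls(per_draw_ms: float, isr_overhead_ms: float) -> float:
    """Draw count where both modes cost the same: N * d + o == 2 * N * d."""
    return isr_overhead_ms / per_draw_ms


# Hypothetical constants, not taken from any profile.
PER_DRAW_MS = 0.001
ISR_OVERHEAD_MS = 1.0

for n in (350, 1000, 1200):
    off = frame_cost_ms(n, PER_DRAW_MS, False, ISR_OVERHEAD_MS)
    on = frame_cost_ms(n, PER_DRAW_MS, True, ISR_OVERHEAD_MS)
    print(f"{n:5d} draws: ISR off {off:.2f} ms, ISR on {on:.2f} ms")

print("break-even:", break_even_draw_calls(PER_DRAW_MS, ISR_OVERHEAD_MS))
```

Below the break-even count the fixed overhead dominates and ISR loses; above it the saved draw submissions win. Real scenes will not follow a straight line like this, so treat it as a way to think about the trade-off rather than a predictor.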

Thanks for the answer!

It’s actually not 300-400 meshes. I only had fewer than 40 mesh sections on the screen (they are huge), and most of the draw calls seem to be engine stuff. I guess that’s also why I only saw a 50 draw call difference between ISR on and ISR off.

So I will keep ISR disabled as long as I stay below 1000 draw calls, good to know! :cool:

Instanced stereo is a CPU optimization. Its purpose is to reduce the render thread and graphics driver CPU cost when rendering in stereo. Without instanced stereo, we execute the entire render loop twice, once for each eye. This incurs both the cost of the render loop inside of UE4 and the driver overhead of issuing a separate draw call for each eye.

There’s a small amount of additional vertex shader overhead when using instanced stereo. We need to select the appropriate view uniforms based on which eye an instance represents and there’s an additional transform and clip required to ensure we render to the correct half of the frame buffer.
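
To make that extra vertex work concrete, here is a CPU-side sketch of the general instanced-stereo idea, written in Python for readability. It is not the engine’s shader code. The function name, the even/odd instance convention and the side-by-side frame buffer layout are assumptions for illustration.

```python
def instanced_stereo_vertex(eye_clip_positions, instance_id):
    """Emulate the per-vertex extra work for one instance.

    eye_clip_positions: ((x, y, z, w) for the left eye, (x, y, z, w) for the right eye),
    i.e. the clip-space position each eye's view uniforms would produce.
    Returns the remapped clip-space position and a clip distance
    (negative means the vertex is on the wrong half and gets clipped).
    """
    # Assumed convention: even instances draw the left eye, odd ones the right.
    eye = instance_id & 1
    x, y, z, w = eye_clip_positions[eye]

    # Squeeze x into half the width and shift it into that eye's half
    # of a side-by-side frame buffer (left eye: [-w, 0], right eye: [0, w]).
    side = -1.0 if eye == 0 else 1.0
    x_out = 0.5 * x + side * 0.5 * w

    # Clip against the centre line so geometry cannot spill into the other eye.
    clip_distance = side * x_out
    return (x_out, y, z, w), clip_distance


centre = (0.0, 0.0, 0.5, 1.0)  # a vertex in the middle of each eye's view
left_pos, left_clip = instanced_stereo_vertex((centre, centre), instance_id=0)
right_pos, right_clip = instanced_stereo_vertex((centre, centre), instance_id=1)
print(left_pos, left_clip)
print(right_pos, right_clip)
```

The point is that the added cost is a handful of operations per vertex (a select, a multiply-add and a clip distance), which is why it behaves like a roughly fixed overhead rather than something that scales with material complexity.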

Depending on the scene and the GPU hardware being used, you may see a small speedup or slowdown on the GPU with instanced stereo enabled (~3-5%). Some hardware can become bottlenecked in primitive assembly, which ends up costing us a few percent of GPU performance. If clipping isn’t a bottleneck, I tend to see a small increase in performance due to better cache/memory use, since we are drawing the same object twice.

A few notes based on your test. Like RyanB mentioned above, you don’t really have enough draw calls for ISR to make a difference. If you had a lot of other non-rendering CPU heavy work, it might help get the render thread off a CPU core sooner giving you a few more cycles for other things, but don’t expect major gains for simple scenes.

It’s somewhat difficult to profile CPU render thread cost in VR due to queue ahead / vsync. I recommend running your project with -emulatestereo. This will bypass the HMD plugin and just render to a stereo frame buffer, which allows for fine-grained profiling without the HMD compositor getting in the way. Make sure both frame smoothing and vsync are disabled and then use the stat unit console command. The Draw metric is the CPU render thread time per frame. This is what ISR should improve. You should be seeing ~10-20% gains depending on your hardware and scene complexity.

It would be cool if there were an “Instanced Stereo Test” button that would let the engine test your scene in both Stereo and Non-Stereo mode and then display a comparison. It would be even cooler if it automatically enabled or disabled this feature depending on whether it helps or hurts your scene.
