[Gear VR] Draw Call Costs and Performance Testing

rje · March 25, 2016, 7:56pm

I’m starting to investigate using UE4 for GearVR, and as part of that I did some performance testing that I thought I’d share since it may help others.

When I first started I just threw a scene together without really thinking about costs; I just wanted to get something on device so I could start testing. Very quickly I noticed that our Draw time (cpu render thread) was way too high, which would seem to indicate a draw call issue. So I took a step back and started a more rigorous performance test with the following characteristics:

UE4 version: 4.11p8
Device: Galaxy S6, consumer GearVR headset
Packaging set to make shipping builds
ASTC Compression for textures
All objects in scene are static
Postprocessing volume to disable all postprocessing effects
Lightmass importance volume around objects
Lightmaps provide all lighting via 1 static point light
No skysphere or backdrops, plain black background
All actors are basic cube with Basic Shape Material applied

From there, I started with 10 static cubes in the scene, and increased the count in increments of 10 for each subsequent test. I then recorded the average Game and Draw thread times from the info displayed by ‘stat unit’. I’ve excluded the Game time results from the table below because they didn’t vary significantly as the actor count changed (they were in the 2-2.5ms range for all tests).

Results:
| # Actors | Draw (ms) |
|----------|-----------|
| 10       | 2.8       |
| 20       | 3.5       |
| 30       | 4.5       |
| 40       | 5.5       |
| 50       | 6.4       |
| 60       | 7.3       |
| 70       | 8.4       |
| 80       | 8.8       |
| 90       | 9.1       |
| 100      | 9.8       |
| 110      | 10.2      |
| 120      | 11.0      |

Out of curiosity, after I finished this test I used the experimental ‘merge actors’ tool to merge all 120 cubes into a single mesh. Draw time for that was ~1.8ms.
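
For anyone who wants a rough per-actor number out of the table above, here is a minimal sketch that fits a straight line to the measurements with ordinary least squares. It assumes the cost is linear in actor count, which the data only roughly supports (the curve flattens a bit past 70 actors), so treat the result as a ballpark. The names `ACTORS`, `DRAW_MS` and `fit_line` are made up for this example.

```python
# Draw-thread times (ms) copied from the results table, one per actor count.
ACTORS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
DRAW_MS = [2.8, 3.5, 4.5, 5.5, 6.4, 7.3, 8.4, 8.8, 9.1, 9.8, 10.2, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# slope: estimated extra draw time per additional cube (ms)
# intercept: estimated fixed draw-thread overhead with no cubes (ms)
slope, intercept = fit_line(ACTORS, DRAW_MS)
print(f"per-actor cost: {slope:.3f} ms, fixed overhead: {intercept:.2f} ms")
```

On these numbers the fit comes out at somewhere between 0.07 and 0.08 ms per additional cube, on top of a couple of milliseconds of fixed overhead. The merged-mesh result (~1.8ms) sits below that fitted overhead, which is another hint that the straight line is only an approximation.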

So, I had a few takeaways and questions after running this test:

First and foremost, does this look roughly like the expected draw call time required? Are there any changes I should be making to my test that would have a material effect on overall draw call time?

I know the GearVR best practices doc says don’t exceed 100 draw calls in any view, but if 100 draw calls is going to take ~10ms of time I feel like I need to aim much lower. Each game will be different in terms of needs and goals, but if these numbers are representative of performance I feel like we should probably be aiming for closer to 50 draw calls max in a view, so there’s time left for cpu/gpu.
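
To make the budgeting idea concrete, here is a small sketch that looks up the largest measured actor count that fits inside a given Draw thread budget. It only uses the numbers from the table above; the budgets in the loop (5.0, 6.5 and 10.0 ms) are hypothetical examples, and `MEASURED` and `max_actors_within` are names invented for this sketch.

```python
# Measured draw-thread time (ms) per actor count, from the results table.
MEASURED = {
    10: 2.8, 20: 3.5, 30: 4.5, 40: 5.5, 50: 6.4, 60: 7.3,
    70: 8.4, 80: 8.8, 90: 9.1, 100: 9.8, 110: 10.2, 120: 11.0,
}


def max_actors_within(budget_ms, measured=MEASURED):
    """Return the largest measured actor count whose draw time fits the budget.

    Returns None when even the smallest test case exceeds the budget.
    """
    fitting = [count for count, ms in measured.items() if ms <= budget_ms]
    return max(fitting) if fitting else None


# Hypothetical draw-thread budgets, chosen only for illustration.
for budget in (5.0, 6.5, 10.0):
    print(f"{budget:.1f} ms budget -> {max_actors_within(budget)} actors")
```

A budget of about 6.5 ms lands on the 50-actor test case, which lines up with the "closer to 50" target mentioned above. These are single-cube draw calls on one device, so other scenes may land elsewhere.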

Please send me any suggestions/tips/ideas you might have. I’m happy to re-run all the tests and update the thread with new information!

Xenogenik · March 25, 2016, 9:09pm

That looks about right to me, from my (limited) experience. How you go forward with improving that all depends on your scenario. For example if you are creating lots of static meshes that are the same then you might like to look at instancing. This would at least improve the results of your test. Your S6 should support that. My Galaxy Note 3 does with the 4.11 preview.

aussieburger · March 26, 2016, 12:06pm

As Xenogenik said: instancing that is now in 4.11 can really help out!

In our project We Come In Peace… with 4.11 we were able to keep draw calls to around 80, with each wave of UFOs being one draw call (6 in a wave). Another tip to reduce draw calls, at least for menus / UIs, is to use 3D widgets - they are also only 1 draw call.

rje · March 26, 2016, 3:14pm

Thanks for the tips!

So it sounds like these tests are consistent with what you’re seeing in terms of draw call costs? That was mostly what I was looking for out of the test - having some more granular information will allow me (and hopefully others) to budget draw calls appropriately and know roughly what they need to be aiming for.

I’m planning on researching the options for reducing draw calls in a scene next - sounds like I should start with static mesh instancing

anonymous_user_c4ba8ff91 · March 26, 2016, 4:19pm

I have a question, if u don’t mind me asking.
Instancing seems to be viable only inside a blueprint, and there’s no direct way to put instances inside the scene.
In this case, would it make sense to have a single instance inside the blueprint and place copies of the blueprint inside the scene, or would this still create a separate draw call for every occurrence of the blueprint in the scene?

aussieburger · March 26, 2016, 9:46pm

I have not tested a single instance in a blueprint and placing copies of a blueprint around a scene but I’m guessing that results in separate draw calls.
The single blueprint with many instances inside it is the way to go, even though you don’t have as much control over each instance as with a fully fledged actor - so you need to do lots of indexing to find which instance is which - but it is possible to use each instance in a scene how you want it in most cases. Each instance has an update transform node which can be used either in relative or world space. Hope that helps? Side note for collision detection: for some reason it was not working as expected for me when moving the instances around, so I had to spawn a hidden collision box around each instance, which also did not cost any draw calls -> got this idea from an AnswerHub post somewhere.
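
To illustrate the kind of indexing I mean, here is a language-neutral sketch in Python (not engine code). It keeps a stable name for each instance and maps it to the current index. `InstanceRegistry` is a hypothetical helper, not an engine API, and it assumes that removing an instance shifts every later instance down by one index; check how your engine version actually behaves before relying on that.

```python
class InstanceRegistry:
    """Tracks which named object currently lives at which instance index.

    ASSUMPTION: removing an instance shifts all later instances down by one,
    like removing an element from a plain array.
    """

    def __init__(self):
        # Position i holds the name of the object stored at instance index i.
        self._names = []

    def add(self, name):
        """Register a new instance and return the index it was given."""
        if name in self._names:
            raise ValueError(f"duplicate instance name: {name}")
        self._names.append(name)
        return len(self._names) - 1

    def index_of(self, name):
        """Return the current instance index for a name."""
        return self._names.index(name)

    def remove(self, name):
        """Forget a name and return the index that should be removed."""
        index = self._names.index(name)
        del self._names[index]  # later names shift down, matching the assumption
        return index


registry = InstanceRegistry()
for ufo in ("ufo_a", "ufo_b", "ufo_c"):
    registry.add(ufo)

removed_index = registry.remove("ufo_b")  # index to remove from the component
print(removed_index, registry.index_of("ufo_c"))
```

The idea is that gameplay code only ever talks about names, and the registry is the one place that knows the current index to pass to the transform update or removal call.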

anonymous_user_c4ba8ff91 · March 26, 2016, 11:14pm

This is kinda disappointing, to be honest. It’s sure great for placing objects procedurally, but for something other than that it’s a pain.
Although it’s easy enough to create a bunch of instances in the Instanced Static Mesh Component array and move them around by hand, managing those is quite a hassle. Even getting rid of a specific instance requires a series of delete/ctrl+Z attempts, since there’s no obvious way of telling which instance is which element in the array.

rje · March 30, 2016, 5:31pm

Okay - so after a few days no one’s offered up any obvious corrections to my test, so I’m moving on with the assumption that those are the draw call costs we need to work with.

So the next step is to focus on techniques for reducing draw calls in UE4. Here are the things I can think of. Are there any techniques that people feel like I’m missing?

Instanced Static Mesh component
Merge actors tool
Manually merging meshes and atlasing textures
Precomputed visibility to reduce objects drawn

anonymous_user_c4ba8ff91 · March 30, 2016, 7:00pm

Maybe stating the obvious here, but I think the draw costs will also be affected by the complexity of the geometry & materials, too? Maybe first try with some actual assets before moving on to the next level?

rje · March 30, 2016, 7:44pm

I didn’t mention it in my original post but I did iterate over several material/geometry variants to see if they had a major effect on draw calls. Material iterations included varying texture/no texture, normal map/no normal map, metallic/roughness vs fully rough no metallic, and enabling/disabling lightmap directionality. On the mesh side I simply used more complicated meshes.

The big changes in cost there were on the gpu side, and not the draw thread side. The biggest delta I saw on the draw thread side came from toggling the ‘fully rough’ setting in a material; it seemed like I gained around 0.25-1ms on the biggest test cases. But the number was small enough that I’d want to construct a specific test before saying that there’s an actual gain on the draw thread.