Performance on Mali T860 and finding bottlenecks using Unreal Insights

I know that fighting with Mali T860 GPU performance on a 5-year-old Android device is somewhat academic, but it is a good way to learn profiling, and I need to run a little game on this device.

Initially my little game ran at 7.5 fps; I have now reached 15-20 fps. What I have tried so far:

  • Switching to software occlusion and precomputed visibility just moved the bottleneck; no change in fps.
  • Using a custom device profile matching the Mali T8xx, basically reduced to android_low, although this chip is originally categorized as android_mid.
  • Rebuilding the level from scratch and adding content step by step, checking performance after each step.
  • Reducing shader complexity (checked in the ES3.1 preview) by reauthoring some meshes from quads into shapes that better fit the texture (e.g. trees). This removes a lot of overdraw but has almost no effect (1 ms gain).

stat unit shows that I am Draw thread bound:

r.screenpercentage   100%      25%
Frame (ms)           58        22
Game (ms)            17        20
Draw (ms)            58        22
RHIT (ms)            18        19
Mem                  425MB     425MB
Draws                106       106
Prims                24.34k    24.34k

Lowering r.screenpercentage leads to far better values in stat unit (the values above are rough), so, following the docs, this means that I’m bound by something pixel-related:

  • memory bandwidth
  • math limits (ALU)
  • in rare cases, some specific units, e.g. MRT export
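
As a rough sanity check on the stat unit numbers above, one could split the frame time into a part that scales with pixel count and a part that does not. This is only a sketch: it assumes that r.screenpercentage scales each axis (so 25% means 1/16 of the pixels) and that cost is linear in the pixel count, neither of which is confirmed by the measurements themselves. The function name is made up for illustration.

```python
def split_frame_time(ms_full, ms_scaled, screen_percentage):
    """Split frame time into pixel-dependent and fixed parts.

    Assumes a linear model: frame_ms = fixed_ms + pixel_ms * pixel_fraction,
    where pixel_fraction = (screen_percentage / 100) ** 2 because the
    percentage is assumed to apply to both width and height.
    """
    pixel_fraction = (screen_percentage / 100.0) ** 2
    pixel_ms = (ms_full - ms_scaled) / (1.0 - pixel_fraction)
    fixed_ms = ms_full - pixel_ms
    return pixel_ms, fixed_ms


# Rough Draw/Frame values from the stat unit table: 58 ms at 100%, 22 ms at 25%.
pixel_ms, fixed_ms = split_frame_time(58.0, 22.0, 25.0)
print(f"pixel-dependent: {pixel_ms:.1f} ms, fixed: {fixed_ms:.1f} ms")
print(f"fps at 100%: {1000.0 / 58.0:.1f}, fps at 25%: {1000.0 / 22.0:.1f}")
```

Under these assumptions, roughly two thirds of the 58 ms (about 38 ms) would scale with resolution, and the remaining ~20 ms sits close to the Game and RHIT times, which would be consistent with the pixel-bound reading.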

r.Shadow.MaxCSMResolution and r.Shadow.MaxResolution are both set to 512 in my custom device profile matching the Mali T8xx. Lowering them further does not change the stat unit values; it only leads to lighting artifacts (the inside of the mouth of the third-person player character gets lit).

From the docs about r.Shadow.MaxResolution:

"If changing the resolution does not matter much, you are likely bound by vertex processing (vertex shader, or tessellation) cost."

So this now seems to tell me that I am vertex bound. Now I’m getting confused, because that’s different from what changing r.screenpercentage tells me.

Unreal Insights traces show that WaitUntilTasksComplete makes up most of the time, along with a bunch of ProcessThreadUntilIdle calls in the microsecond range…

[Unreal Insights screenshot: Render Thread]

[Unreal Insights screenshot: Game Thread]

Most of the time the CPU is waiting for something… It looks to me like it is waiting for the GPU (or for some unaccounted functions on the CPU?).

Unfortunately:

  • there is no way to get any GPU insights data on the phone
  • I did not find a way to run RenderDoc on the mobile device
  • ARM Mobile Studio crashed my phone so badly that not even powering off was possible any more. (Fortunately, the phone rebooted by itself after several minutes.)

So something is happening, but I apparently have no way to find out what it is…

Note: as I learned in the Materials Master Learning Course at the Epic Developer Community, memory/bandwidth limitations should not cause fps drops, but they can cause hitches/freezes, which I do observe sometimes. Is this still valid?
I definitely need to eliminate the hitches, which can last up to 1 second…
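
To look at the hitches separately from the average fps, one option is to log per-frame times and flag the outliers. A minimal sketch, assuming frame times in milliseconds are already available as a list; the sample values below are invented for illustration (not measured on the device), and the threshold of three times the median is an arbitrary choice.

```python
from statistics import median


def find_hitches(frame_ms, factor=3.0):
    """Return (index, ms) for frames slower than factor * median frame time."""
    threshold = factor * median(frame_ms)
    return [(i, ms) for i, ms in enumerate(frame_ms) if ms > threshold]


# Invented sample: frames around 58 ms with a single freeze of about 1 second.
sample = [58.0, 60.0, 57.0, 1000.0, 59.0, 58.0]
hitches = find_hitches(sample)
print(hitches)
```

Knowing which frames hitch makes it easier to line them up with whatever was loading or spawning at that moment in a trace.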

More options I am now working through by trial and error:

  • reduce UV channels (remove lightmap UVs after switching to dynamic lighting). BTW: static lighting does not behave better; the original was statically lit
  • no tessellation is used that I could try to turn off
  • reduce texture sizes - started with this, does not seem to make any difference
  • converting Blueprint-based actors that only contain a single mesh to pure static meshes (no effect, and should only affect the CPU)
  • remove tick from any actors that do not need it - no effect; ticking would only cause CPU overhead and is not the source of my troubles, but it is cleaner anyway

Any more options for me? At the moment I believe that I will not get above the 15-20 fps I currently have on that device.

Is there any way to find out what is happening during that WaitUntilTasksComplete timeframe? It might be that I’m just at the end of the possible optimizations for this device.

BTW: on a friend's Pixel 4a it just runs at 60 fps - no tuning challenge there :)

Did you resolve your problem? I am encountering the same issue in my VR game: I have a lot of WaitForTasks in the Game thread, RHIThread and RenderThread…

Hi @ReyhKt, I’m sorry for the late reply; I have not been online in the forums for quite some time. Unfortunately, I have not been able to find or get an answer to this.