Android TaskGraph performance fluctuations

Hey team,

We’ve got a pretty large Android improvement to share.

To set some background, we’re building for mobile and using Chaos Physics in async mode (to make use of fixed timestep physics). We’re also using the re-simulation replication mode, because it suits the needs of our title.

Async physics in Unreal Engine doesn’t run on a dedicated thread, but rather on the TaskGraph worker threads.

We’ve seen some really inconsistent performance on Android, and we narrowed it down to poor scheduling decisions within the OS, which can be addressed by integrating with the PerformanceHintManager (PerformanceHintManager | API reference | Android Developers).

The severity of the problem varies between devices, with Pixel devices being the worst case for us.

In our testing we’ve been able to improve our frame rate stability on Android immensely by taking Google’s ADPF plugin for the engine (ADPF Unreal Engine plugin | Android Developers) and modifying it to include the task graph threads in the game thread hint session.

That change drastically improved our frame rate variability by informing the OS of the performance critical nature of those threads.

Our implementation is quite rudimentary: we exposed a getter for the worker thread IDs from LowLevelTasks::FScheduler and included those in the hint session along with the game thread.

TArray<uint32> FScheduler::GetWorkerThreadIds() const
{
    TArray<uint32> WorkerThreadIds;
    for (FThread* WorkerThread : WorkerThreads)
    {
        WorkerThreadIds.Add(WorkerThread->GetThreadId());
    }
    return WorkerThreadIds;
}
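For anyone wanting to try something similar, here is a rough sketch of how those IDs might be handed to a hint session through the NDK. This is not our shipping code: BuildHintThreadList and CreateFrameHintSession are hypothetical helper names, and the sketch assumes the engine thread IDs are the kernel tids the NDK expects. The APerformanceHint_* calls require API level 33, so a real integration would also need a runtime availability check.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

#if defined(__ANDROID__)
#include <android/performance_hint.h>
#endif

// Hypothetical helper: builds the tid list for a hint session from the game
// thread tid plus the worker thread IDs returned by the scheduler getter.
// Non-positive and duplicate IDs are skipped.
std::vector<int32_t> BuildHintThreadList(int32_t GameThreadTid,
                                         const std::vector<uint32_t>& WorkerThreadIds)
{
    std::vector<int32_t> Tids;
    Tids.reserve(WorkerThreadIds.size() + 1);
    Tids.push_back(GameThreadTid);
    for (uint32_t Id : WorkerThreadIds)
    {
        const int32_t Tid = static_cast<int32_t>(Id);
        if (Tid > 0 && std::find(Tids.begin(), Tids.end(), Tid) == Tids.end())
        {
            Tids.push_back(Tid);
        }
    }
    return Tids;
}

#if defined(__ANDROID__)
// Hypothetical helper: creates one session covering all listed threads.
// Returns nullptr if the device does not support performance hints.
APerformanceHintSession* CreateFrameHintSession(const std::vector<int32_t>& Tids,
                                                int64_t TargetWorkDurationNanos)
{
    APerformanceHintManager* Manager = APerformanceHint_getManager();
    if (Manager == nullptr)
    {
        return nullptr;
    }
    return APerformanceHint_createSession(Manager, Tids.data(), Tids.size(),
                                          TargetWorkDurationNanos);
}
#endif
```

The list-building step is kept separate from the NDK call so it can be exercised on a desktop build, where the Android-only part is compiled out.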

You can see the results in the attachment “Performance-comparison.png”.

Some problems with this implementation include:

  • The scheduler has logic for restarting threads, and we’re not really sure yet of the lifetime expectations of those threads. We just grab them on app startup, so if they’re ever re-created, the newly created thread pool won’t be part of our game thread hint session.
  • This approach also treats both the foreground and background threads as performance critical. Background threads probably don’t need to run on high speed cores (which would be better for thermal performance), but TaskGraph will schedule foreground work on background threads, and when that happens we do need the OS to run them on faster cores.
    • It would probably be best on Android to have more explicit foreground and background work scheduling, so that frame critical work doesn’t occur on background threads which aren’t included in the hint session. Alternatively, the scheduler could add threads to the hint session dynamically. We’re not sure what the ideal implementation is.
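To illustrate the "threads added dynamically by the scheduler" idea, here is a minimal sketch of a registry that worker threads could report into when they start or stop. FHintThreadRegistry is a hypothetical class, not an engine type. The owner of the hint session would poll it once per frame and, when membership has changed, re-apply the thread list (on API level 34 and above the NDK offers APerformanceHint_setThreads for this; on older levels the session would have to be re-created).

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical registry of thread IDs that should belong to a hint session.
class FHintThreadRegistry
{
public:
    // Called by a worker thread when it starts (or is re-created).
    void Register(int32_t Tid)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        if (std::find(Tids.begin(), Tids.end(), Tid) == Tids.end())
        {
            Tids.push_back(Tid);
            bDirty = true;
        }
    }

    // Called by a worker thread just before it exits.
    void Unregister(int32_t Tid)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        const auto It = std::find(Tids.begin(), Tids.end(), Tid);
        if (It != Tids.end())
        {
            Tids.erase(It);
            bDirty = true;
        }
    }

    // Returns true and copies the current list if membership changed since
    // the previous call; the caller then updates the hint session.
    bool ConsumeChanges(std::vector<int32_t>& OutTids)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        if (!bDirty)
        {
            return false;
        }
        OutTids = Tids;
        bDirty = false;
        return true;
    }

private:
    std::mutex Mutex;
    std::vector<int32_t> Tids;
    bool bDirty = false;
};
```

This would cover the thread restart case as well, since a re-created worker registers its new ID and the old one is dropped when the previous thread exits.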

Additionally, when using re-simulation replication, we hit a frame rate spiral of doom on Android quite frequently because of the lowered core speeds.

We’ve been able to fix this locally by running our game side code on its own thread(s), with their own performance hint session, and decreasing the target completion time for those threads when a resimulation begins. This has resolved the spiral of doom, but it still incurs a noticeable frame rate drop initially, due to the time it takes for the OS to either ramp clock speeds or change its scheduling behavior to accommodate the performance hint session adjustment.

So, big wall of text out of the way, I’d very much like to see the scheduler, and fixed threads (game / render / RHI) integrate the performance hint manager “out of the box” in engine.

It would also be great if async physics could be accounted for in the implementation. While we have improved these problems for our title, it would be nice if these improvements could be made to the engine (so that everybody can share in better mobile performance from the engine, and so that we’re not maintaining these divergences forever).

Thanks!

[Attachment Removed]

Steps to Reproduce

  • Run expensive logic on Task Threads on Google Pixel Android devices
  • Enable stat raw
  • Frame rate is unstable

Hi Darcy,

From the few conversations we’ve had with the dev team, the efficacy of ADPF appears to be quite case-specific, and in many cases it has little to no effect. However, this is something we’ll continue to monitor, as your suggestion to include workers in the hint session may prove to have better yields than the existing cases.

Best regards.

Hey [Content removed]

I was watching this performance talk from UnrealFest, which popped up on the Unreal Engine channel a few days ago. The scheduling issue the presenter describes in Fortnite at around minute 17 (https://youtu.be/cECce7rtogk?si=Q7WyB3kggLejuIm5&t=1016) is the sort of issue that the PerformanceHintManager (https://developer.android.com/ndk/reference/group/a-performance-hint) intends to resolve, and that’s the important part of ADPF for us. The issue he describes, where critical work was put on low cores, is exactly what was causing us trouble. Integrating the PerformanceHintManager made this occur far less frequently, as you can see in the screenshot of our before/after performance captures (every spike in the top capture is caused by poor work scheduling by the OS).

Our current implementation actually doesn’t use the ADPF plugin. I wrote a small JNI wrapper around the PerformanceHintManager so that we could have control over the performance reporting.

> We’ve been able to fix this locally by running our game side code on its own thread(s), with their own performance hint session, and decreasing the target completion time for those threads when a resimulation begins. This has resolved the spiral of doom, but it still incurs a noticeable frame rate drop initially, due to the time it takes for the OS to either ramp clock speeds or change its scheduling behavior to accommodate the performance hint session adjustment.

I did just notice in looking at the PerformanceHintManager docs that there’s a “notifyWorkloadSpike” function which I should be using (my current approach was just to report a lower duration requirement).
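For clarity, the "report a lower duration requirement" approach amounts to something like the sketch below. FResimTargetPolicy and all of its values are hypothetical and only illustrate the idea. On Android the returned value would be what gets passed to the session as its target (APerformanceHint_updateTargetWorkDuration in the NDK, or the equivalent through a JNI wrapper).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical policy: picks the target work duration to request from the
// hint session, tightening it while a resimulation is in progress so the OS
// is encouraged to raise clocks or move the threads to faster cores.
struct FResimTargetPolicy
{
    int64_t BaseTargetNanos = 16'000'000; // assumed normal per-frame budget
    double ResimScale = 0.5;              // assumed tightening factor
    int64_t MinTargetNanos = 1'000'000;   // floor so the target stays positive

    int64_t TargetFor(bool bResimulating) const
    {
        if (!bResimulating)
        {
            return BaseTargetNanos;
        }
        const int64_t Scaled = static_cast<int64_t>(
            static_cast<double>(BaseTargetNanos) * ResimScale);
        return std::max(Scaled, MinTargetNanos);
    }
};
```

The floor is there because a very small scale factor would otherwise produce a target of zero, which is not a meaningful duration to request.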

Thanks for the additional information Darcy. Have the improvements you’ve observed been specific to any device families or across a wider range of devices?

Best regards.

Hey,

We don’t have a very wide range of device families unfortunately, but we saw similar improvement in high workload scenarios on a couple of generations of Pixel devices and a few generations of Samsung devices.

Samsung devices were seemingly a bit better ‘out of the box’ than the Pixel devices.

I’ll attach some before and after videos of the in-action stat raw graph (I’ve clipped the footage to only the graph for confidentiality purposes). I’ll attach them in 2 different posts because the page won’t let me attach 2 at once.

The general use case won’t be as bad as what we see in our case. We were performing async physics simulations in Blueprint, and the workload of these simulations was high but stable (no major changes in our math / branching operations frame to frame). On iOS our performance variability was small:

  • Min: 0.12ms
  • Avg: 0.18ms
  • Max: 0.96ms

In contrast, Android (Pixel 8a) was:

  • Min: 0.14ms
  • Avg: 0.68ms
  • Max: 13.94ms

The minimum is similar and the average is higher, but that average is skewed massively by far higher maximum times.

Those massive maximums came from poor scheduler decisions, either putting task graph threads on low priority cores (or just clocked down cores) or running less important work ahead of them. Using the PerformanceHint API hasn’t *entirely* resolved this. It’s still not great in the case of sudden workload increases, but as above, I need to try using notifyWorkloadSpike instead of just changing the thread duration requirement.

In a single player scenario with no workload spikes, we went from running at 60 FPS with periodic drops to 5 FPS whenever the scheduler decided to do something weird, to a consistent 60 FPS experience.

[Attachment Removed]

Thanks for all the information Darcy, we will schedule further investigation into pursuing this path for a future UE release.

Best regards.

And the after video

[Attachment Removed]