Very long compile times with ThinLTO on mobile platforms

We’re seeing very high link times with ThinLTO enabled on mobile platforms.

It takes more than 2 hours on Android and more than 1 hour on iOS.

Here are the exact times for Android: (Wall: 9859.11s CPU: 18093.14s)

Without ThinLTO, the same machine links far faster: (Wall: 21.42s CPU: 95.50s)
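
To put the two measurements side by side, here is a small sketch that derives the wall-clock slowdown and the average number of busy cores (CPU time divided by wall time) from the figures quoted above. The function and variable names are illustrative only; nothing here comes from UBT.

```python
def summarize(label, wall_s, cpu_s):
    """Return (wall time in hours, average busy cores) for one link step."""
    hours = wall_s / 3600.0
    # CPU seconds per wall second approximates how many cores were busy on average.
    busy_cores = cpu_s / wall_s
    print(f"{label}: {hours:.2f} h wall, ~{busy_cores:.2f} cores busy on average")
    return hours, busy_cores


# Figures copied from the post above (seconds).
THINLTO_WALL, THINLTO_CPU = 9859.11, 18093.14
BASELINE_WALL, BASELINE_CPU = 21.42, 95.50

thin_hours, thin_cores = summarize("ThinLTO", THINLTO_WALL, THINLTO_CPU)
base_hours, base_cores = summarize("No ThinLTO", BASELINE_WALL, BASELINE_CPU)

slowdown = THINLTO_WALL / BASELINE_WALL
print(f"Wall-clock slowdown: ~{slowdown:.0f}x")
```

With these numbers the ThinLTO link keeps fewer than two cores busy on average despite -AllCores, which would be consistent with a serialized or contended phase, although the ratio alone does not identify the cause.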

Is this expected behavior? If so, I’m curious when you use LTO on Fortnite.

Steps to Reproduce
Run UBT in the Test configuration with -LTCG -ThinLTO -AllCores for the iOS or Android platform.

Hi Jouan,

This is expected, as LTO is a slow and heavy process operating on a large amount of code. Specifying a ThinLTO cache directory may improve performance on subsequent runs; however, Fortnite doesn’t use it, in part due to performance, and uses PGO instead.

Best regards.

Do you use PGO on all platforms or only on mobiles?

LTO is not prohibitively lengthy on consoles and Win64 (about 20 mins for us).

I heard through the grapevine that this is fixed in 5.7 thanks to UBA redirecting heap allocation to its own allocator.

Apparently, the slowdown is caused by heap allocation contention between threads in the Windows heap allocator?

Hi Jouan,

It is my understanding that UBA has always had memory detours on Windows hosts. However, these detours do not exist on POSIX platforms (Mac/Linux).

Best regards.

Also, Stephane, could you please confirm whether Fortnite uses LTO or PGO on iOS and Android?

I wouldn’t expect PGO to significantly change the timing here, as it is still a very expensive whole-program optimization.

Hi Daniele,

PGO is used on both iOS and Android for Fortnite. I do not, however, have metrics on how effective it is.

Best regards.

It’s difficult to give exact figures on full build times with PGO enabled; however, it should be significantly faster than LTO, at least under our internal usage patterns.

Best regards

Hi Jouan,

Unfortunately, there isn’t official documentation; however, here’s a quick step-by-step guide:

Instrumented (PGO Profile) build:

In your TargetRules:

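using UnrealBuildTool;

public class MyGameClientTarget : TargetRules
{
    public MyGameClientTarget(TargetInfo Target) : base(Target)
    {
        Type = TargetType.Client;
        // Instrumented build: the binary records PGO profile data at runtime
        // (Android in this example).
        bPGOProfile = true;
    }
}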
  • Package and install to device.
  • Run the app with a command line that includes where to write profiles:
    -pgoprofileoutput="/sdcard/Android/data/<your.package>/files/PGO"
  • Exercise representative workloads (menus, matchmaking, gameplay loops) to collect meaningful coverage.
  • Ensure a clean shutdown to trigger PGO_WriteFile().

UE will emit multiple numbered .profraw files (PGO_WriteFile increments a counter).

Merge profraw files:

  • Pull the .profraw files from device.
  • Merge:

llvm-profdata merge /path/to/*.profraw -output /path/to/profile.profdata

  • Place the .profdata file where UBT expects it, e.g.:

<Project>/Platforms/Android/Build/PGO/MyGameClient-Android.profdata

  • or set the PGODirectory/PgoFilenamePrefix accordingly.
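
If the merge step needs to be scripted, a minimal sketch could look like the following. It assumes the .profraw files have already been pulled into a local directory and that an llvm-profdata matching the toolchain used for the instrumented build is on PATH. The helper names build_merge_command and merge_profiles are hypothetical and not part of UBT; the -o flag is llvm-profdata's short form of its output option.

```python
import shutil
import subprocess
from pathlib import Path


def build_merge_command(profraw_dir, output_file, tool="llvm-profdata"):
    """Build the llvm-profdata argument list for every .profraw file in a directory."""
    files = sorted(str(p) for p in Path(profraw_dir).glob("*.profraw"))
    if not files:
        raise FileNotFoundError(f"no .profraw files found in {profraw_dir}")
    # Equivalent to: llvm-profdata merge <files...> -o <output_file>
    return [tool, "merge", *files, "-o", str(output_file)]


def merge_profiles(profraw_dir, output_file, tool="llvm-profdata"):
    """Run the merge and return the path of the resulting .profdata file."""
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} was not found on PATH")
    subprocess.run(build_merge_command(profraw_dir, output_file, tool), check=True)
    return Path(output_file)
```

For example, merge_profiles("pulled_profiles", "MyGameClient-Android.profdata") would produce a file that can then be copied to the PGO directory mentioned above; the local directory name here is only a placeholder.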

Optimized (PGO Optimize) build:

  • In TargetRules:
bPGOOptimize = true;
bPGOProfile = false; // now using data rather than generating it
  • UBT will add -fprofile-use. Compilation and linking will then weigh hot vs. cold paths using the profile data.

Let us know if you encounter any issues.

Best regards.

We’re compiling Android on Windows and it still takes more than 2 hours. We’re not using UBA though.

Are you saying that UBA would help here even on 5.6?

Can you share the link duration for your mobile PGO builds?

I’m trying to understand if it’s faster than LTO.

Do you have documentation covering how to make a build instrumented for PGO and how to build with PGO once the profile is generated?

I couldn’t find anything besides a couple of argument mentions in this doc: https://dev.epicgames.com/documentation/en-us/unreal-engine/build-configuration-for-unreal-engine