Dec 6, 2020.Knowledge
What is generally considered too much for mobile devices in terms of texel density and draw calls, and what are the best techniques for optimizing “high detail” content for slower devices?
- We tend to start with high detail, and remove it later, by turning things up rather than down
- Where possible, it is beneficial to run the same test consistently (e.g. replays), or an equivalent repeatable setup
- This is an area where Apple devices tend to be more consistent with testing
- There are platform-specific tools for each architecture (e.g. Mali/Snapdragon etc.), but they aren’t great for long replays
- We are hoping to get the real desktop renderer into mobile eventually, but it requires very high-end devices currently
- For 4.26 we have an experimental renderer for higher end devices (Android Vulkan only due to driver limitations)
What are the optional high-cost (performance-wise) features that most developers disable to help get the most out of the engine on a broad range of devices?
- Post-processing tends to be heavy
- Shadow quality
- Later on: AO/Screenspace effects for higher end devices
- We try not to have different shaders for lower versions, to cut back on package size
Currently we can only take advantage of instanced models in a limited way because they must be grouped in the editor. Is there a way to create instances at runtime?
- Some of the code is editor-only, but it could be made to work at runtime with some code changes on your side.
Which asset optimizations should we prioritize? LODs? Draw Calls?
- This really depends on your target hardware, but certainly draw calls can become a major hit in performance. Target 2-3k draw calls per frame on previous-gen consoles.
Any engine settings to trade in memory for a framerate increase?
- Cached shadow maps. May or may not be applicable depending on how often lights change
- If using distance field ambient occlusion, you can force a max size for the atlas to avoid hitches when it resizes
Best advice for performance on Switch?
- Manage expectations, be aggressive on resolution targets and keep a close eye on memory usage
- Use scalability features to scale your graphics to the device
- We use the forward renderer on Switch instead of deferred, saving us 1-1.5 ms at the cost of some features
Any suggestions for systems to measure performance regularly?
- Everything we use in Fortnite is public. LLM (the Low-Level Memory Tracker) assigns categories to allocations based on callstack; it is vital to use it to track memory usage
- We have a low overhead CSV stat writing system, which can dump data to be ingested by tools. You’ll need your own custom tooling for your game to run test scenarios and gather that data
- Gauntlet for automation; it can run builds on different platforms. We use it overnight to compile, cook, run smoke tests, and send out charts
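As an illustration of the kind of custom tooling mentioned above, here is a minimal sketch that ingests a CSV capture and summarizes frame times. The column names (`FrameTime`, `DrawCalls`), the sample data, and the helper names are all hypothetical; adapt them to whatever your own captures actually contain.

```python
import csv
import io
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_frame_times(csv_text, column="FrameTime"):
    """Summarize one numeric column (milliseconds) from CSV text.

    The column name is an assumption; use whatever your capture emits.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = [float(row[column]) for row in reader if row.get(column)]
    if not samples:
        raise ValueError(f"no samples found in column {column!r}")
    return {
        "frames": len(samples),
        "mean_ms": statistics.fmean(samples),
        "p95_ms": percentile(samples, 95),
        "max_ms": max(samples),
    }


# Hypothetical capture: five frames, one of them a hitch.
SAMPLE_CSV = """FrameTime,DrawCalls
10.0,1800
12.0,1900
14.0,2100
16.0,2300
50.0,2900
"""

if __name__ == "__main__":
    print(summarize_frame_times(SAMPLE_CSV))
```

Tracking a high percentile and the maximum alongside the mean helps surface hitches that an average alone would hide. A nightly automation run could feed each capture through a script like this and chart the results over time.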
Are all the loading performance changes in 4.25?
- Yes, however there’s additional work done in the 4.25Plus branch.
How are performance improvements going with load times, streaming, actor spawning…?
- We haven’t focused on streaming specifically, however a number of changes will impact streaming performance.
- Our focus has been on raw throughput for loading assets in general.
- The new loader eliminates the bookkeeping overhead and throughput limitations of the EDL (Event Driven Loader)
- New Loader has “IO Dispatcher” which offers a more direct IO interface
- About 85% faster in Fortnite on some platforms. From 30s to 4.5s
- There are actually many other elements in those 4.5 seconds (physics, spawning), not just raw serialization, which accounts for only about 1-2 seconds
- Summary: It’s significantly improved, but specific gains depend on content; e.g. Infiltrator demo receives bigger gains than Fortnite.
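To make the quoted figures concrete, the arithmetic can be sketched as below. The function names are illustrative only. Note that “85% faster” is read here as an 85% reduction in load time, which is the reading that matches the 30 s and 4.5 s figures.

```python
def load_time_summary(before_s, after_s):
    """Return (percent reduction in load time, speedup factor)."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("load times must be positive")
    reduction_pct = (before_s - after_s) / before_s * 100
    speedup = before_s / after_s
    return reduction_pct, speedup


def share_of_total(part_s, total_s):
    """Percentage of the total load time spent in one stage."""
    return part_s / total_s * 100


if __name__ == "__main__":
    reduction, speedup = load_time_summary(30.0, 4.5)
    print(f"reduction: {reduction:.0f}%, speedup: {speedup:.1f}x")
    low, high = share_of_total(1.0, 4.5), share_of_total(2.0, 4.5)
    print(f"serialization share: {low:.0f}% to {high:.0f}%")
```

Going from 30 s to 4.5 s is an 85% reduction, or roughly a 6.7x speedup. Raw serialization at 1-2 seconds is then about 22% to 44% of the remaining load time, which is why further gains depend on the other stages and on content.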
Do load time performance improvements affect the previous gen?
- Yes. Actual results will depend on whether you become IO bound by the hardware.
What are the optimum settings in performance for handling VR and non VR players?
- This is a broad question, however we have VR content recommendations here: Virtual Reality Best Practices | Unreal Engine Documentation
- The scalability settings can be used to tune content for VR vs non-VR at run-time. It would also be possible to manage the minimum LOD for “hero” meshes based on view. If you are also targeting PCs it might be possible to use the per-platform LOD feature (available for static and skinned meshes).
- You may also want to set the Texture groups on a per-platform basis. The mobile performance guide is shared here: Performance Guidelines for Mobile Devices | Unreal Engine Documentation
How do you determine what to focus on for optimizations?
- There are many diagnostic tools that can tell you where your performance hits are coming from.
- Stat Unit: this console command shows how much time is spent in the 3 main areas: Game, Draw (render thread) and GPU. The GPU is the usual bottleneck, followed by Draw.
- Unreal Insights profiling is a deeper way to analyze the performance of your project
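As a rough sketch of how the three Stat Unit readings might be triaged, the snippet below picks the largest of the Game, Draw and GPU times and compares it to a frame budget. It assumes the slowest of the three bounds the frame, which is a simplification. The function name and the sample timings are hypothetical.

```python
def find_bottleneck(game_ms, draw_ms, gpu_ms, target_fps=30):
    """Return (slowest area, milliseconds over the frame budget).

    A negative second value means the slowest area still fits the budget.
    Assumes the slowest of the three areas bounds the frame time.
    """
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    timings = {"Game": game_ms, "Draw": draw_ms, "GPU": gpu_ms}
    slowest = max(timings, key=timings.get)
    budget_ms = 1000.0 / target_fps
    return slowest, timings[slowest] - budget_ms


if __name__ == "__main__":
    # Hypothetical readings taken from the on-screen Stat Unit display.
    area, over_ms = find_bottleneck(game_ms=8.0, draw_ms=11.0, gpu_ms=38.0)
    print(f"focus on {area}: {over_ms:+.1f} ms relative to budget")
```

Once the slowest area is known, a deeper tool such as Unreal Insights can be pointed at that area rather than at the whole frame.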
How do you test / set up your work for low end PCs?
- The ideal approach is to set up one or more test rigs that meet your specs for the low end.
What’s the best way to determine budgets for polys, textures, etc?
- It’s best to first understand your target devices. Based on their spec you can develop strategies around tailoring to avoid bottlenecks.