Large Performance Regression in UE5 (CPU Performance)

Some extra findings for today.

I checked how UE4 vs UE5 compare on DX11, and UE5 actually outperforms UE4 (without the translucency 0 command). With this command UE4 shows the same performance as UE5. So it looks like the fps differential is related to how the DX12 RHI was changed with respect to UE4. Also worth pointing out that DX11 outperforms DX12, at least in my barebones scene, by quite a lot (700 vs 1000 fps). I tried Vulkan as well but it's worse.
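At frame rates this high, fps differences are easier to judge as frame times. A quick sketch of the conversion, assuming the 700 fps figure is DX12 and the 1000 fps figure is DX11 (my reading of the sentence above; the function name is just for illustration):

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps

# Assumption: 700 fps is the DX12 result and 1000 fps is the DX11 result.
dx12_ms = frame_time_ms(700)
dx11_ms = frame_time_ms(1000)
gap_ms = dx12_ms - dx11_ms

print(f"DX12: {dx12_ms:.2f} ms, DX11: {dx11_ms:.2f} ms, gap: {gap_ms:.2f} ms")
```

So the 300 fps gap is well under half a millisecond per frame in an empty scene, which is a fixed overhead worth knowing about but says little on its own about a loaded scene.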


I think I can pretty much wrap up this topic with my latest findings.

There is a set of things that are inherently slower in UE5 compared to UE4:

Material System

UE5 updated its rendering code to use SM6 as a base, moving to more complex PSOs, partly to support Lumen and Nanite. Even when using SM5 you can see that shader instruction counts are higher for bare-bones materials than in UE4. This is the most important performance regression, as it increases memory usage and GPU time. Sadly, there is no way to fully bypass this, even when using the legacy renderer for mobile devices.

Animation Blueprints and Skeletal Meshes

AnimBlueprints in UE5 are slower because of extra processing logic for state machines, event graph evaluation, warping, root motion and blending. This can be bypassed by not using the AnimBP/Anim class.

Also, the Skeletal Mesh Component became more complex, with more advanced bone handling. This is also a very important performance regression, as Skeletal Meshes are the most CPU-consuming component.

Despite these setbacks, UE5 does have an advantage for the AnimBP class: the multithreading framework (I have gotten used to C++ implementations of it), which helps with complex tick logic such as Motion Matching updates. Granted, it’s possible to implement custom multithreading for the AnimClass in UE4, but likely with a slight overhead from writing back to the game thread Update.

UI Widgets

UI Widgets in UE5 have more processing overhead due to higher-fidelity rendering, which was done specifically to improve text and vector scaling (it applies to widget components in general), and more advanced checks for widget transformations because of more complex hierarchy logic.

This one is more relevant than it might seem, since Widgets were already pretty costly in UE4. It has a noticeable impact on consoles such as PS4/Switch.

One could get around it by writing custom UI code and using textures directly.

Chaos

Even compared with UE4 Chaos, UE5 Chaos is less performant because it has extra ticks and some accuracy improvements. Of course, PhysX was far more performant and featured a Fast Path for mobile platforms. Thankfully, at least there is Havok for Unreal Engine (paid), and also some custom versions of UE5 using updated versions of PhysX.

Niagara

This one is not a full downgrade, because the Niagara VM (Vector) did get a speed-up. However, as previously mentioned here, UE5 Niagara has heavier tick systems and does more operations to handle the emitters. Compared to Cascade, Niagara is often less performant on older systems with weaker GPUs such as Switch, phones or even 8th gen consoles.

Conclusion

If the goal is to ship a game for phones (even iPhone 16) or consoles such as Switch, Switch 2, PS4/Xbone, there are great reasons to stick with UE4, since performance on these devices can make or break a release.


I want to quickly summarize a topic that I ignored to a degree: how Chaos affects general traces and collision handling.

I set up a simple test scenario looking at the cost of the CMC (its computation of traces and sweeps) on 4.27 PhysX and 4.27 Chaos, before comparing against UE5.

Results are as follows, per CharacterMovementComponent when moving:

PhysX 4.27      0.065 ms
Chaos 4.27      0.09 ms
UE 5.4/5.5      0.10 ms

While these numbers may look similar, that isn’t really the case: PhysX 4.27 takes about 28% less time than its Chaos counterpart (put the other way, Chaos 4.27 takes about 38% longer per CMC). This says a lot about the general performance regression we get from the physics, not in simulation, but in general traces, queries, sweeps, etc.

Against 5.4/5.5, 4.27 PhysX takes 35% less time (5.4/5.5 takes about 54% longer). The small extra overhead on top of 4.27 Chaos should be coming from the larger unit values (so there is an answer to the LargeWorldUnit performance question). At the same time, it doesn’t look like there has been a meaningful performance upgrade to Chaos since the UE4 days for that type of logic.
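The percentages depend on which number is used as the baseline, so here is a small sketch that computes both directions from the three timings above. The 100-character crowd at the end is a hypothetical figure just to show how the per-CMC cost scales, not something I measured:

```python
# Per-CMC cost in milliseconds, taken from the measurements above.
costs_ms = {"PhysX 4.27": 0.065, "Chaos 4.27": 0.09, "UE 5.4/5.5": 0.10}
baseline_ms = costs_ms["PhysX 4.27"]

def pct_longer(cost_ms: float, base_ms: float) -> float:
    """How much longer cost_ms is, as a percentage of the baseline."""
    return (cost_ms / base_ms - 1.0) * 100.0

def pct_less(base_ms: float, cost_ms: float) -> float:
    """How much less time the baseline takes, as a percentage of cost_ms."""
    return (1.0 - base_ms / cost_ms) * 100.0

CHARACTERS = 100  # hypothetical crowd size, for illustration only

for name, cost_ms in costs_ms.items():
    print(
        f"{name}: {pct_longer(cost_ms, baseline_ms):.0f}% longer than PhysX, "
        f"PhysX takes {pct_less(baseline_ms, cost_ms):.0f}% less, "
        f"{cost_ms * CHARACTERS:.1f} ms for {CHARACTERS} moving characters"
    )
```

With 100 moving characters that is roughly 6.5 ms vs 9.0 ms vs 10.0 ms of movement cost, assuming the cost scales linearly, which is a simplification.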


After all of these findings I find myself more confident in using Unreal 4 for some projects. In fact, at one point I was working on an online shooter title on 4.27 with RTXGI (for some reason RTXGI was broken in UE5 and had performance issues).

I think UE4 is a viable and sensible option, and it should be given appropriate consideration when making a decision, as it’s truly different from UE5, not just an outdated version of something newer.


Amazing Legwork!

Just wanted to confirm your findings for a sample of 350 frames with Niagara.
Here are some other suspect items.

You don’t happen to have your solution for the PhysicsFieldComponent and Camera tick stuff, do you?
Trying to track down that camera one.

350 frames of gameplay

Timer                                               Total Time  Self Time   Average Time  Occurrences
UE4.27
FNiagaraWorldManager::Tick                          1.370 ms    1.351 ms    0.001 ms      2185
UE5.3
FNiagaraWorldManager::Tick                          4.146 ms    4.124 ms    0.002 ms      2352
NiagaraPumpBatcher                                  0.221 ms    0.216 ms    0.001 ms      350
FUpdateSpeedTreeWindCommand                         0.286 ms    0.286 ms    0.001 ms      350
UMovieSceneEntitySystemLinker::LinkRelevantSystems  2.102 ms    0.016 ms    0.014 ms      152
FMediaPlayerFacade::TickFetch                       0.442 ms    0.289 ms    0.000 ms      1400
UWorld::UpdateLevelStreaming                        0.233 ms    0.233 ms    0.001 ms      350
FWebBrowserSingleton::Tick                          2.196 ms    0.175 ms    0.006 ms      350
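To put the two FNiagaraWorldManager::Tick rows side by side, here is a rough sketch that works out the ratio and the per-frame cost from the totals above. It assumes both captures cover the same 350 frames of comparable gameplay, which the numbers alone cannot confirm:

```python
FRAMES = 350  # length of both captures, per the figures above

# Total time (ms) and call count for FNiagaraWorldManager::Tick.
ue427 = {"total_ms": 1.370, "calls": 2185}
ue53 = {"total_ms": 4.146, "calls": 2352}

# Ratio of total time spent in the tick across the whole capture.
total_ratio = ue53["total_ms"] / ue427["total_ms"]

# Average cost per frame and per call, in milliseconds.
per_frame_427 = ue427["total_ms"] / FRAMES
per_frame_53 = ue53["total_ms"] / FRAMES
per_call_ratio = (ue53["total_ms"] / ue53["calls"]) / (ue427["total_ms"] / ue427["calls"])

print(f"Total time ratio UE5.3 / UE4.27: {total_ratio:.2f}x")
print(f"Per frame: {per_frame_427:.4f} ms vs {per_frame_53:.4f} ms")
print(f"Per call ratio: {per_call_ratio:.2f}x")
```

The tick is about three times more expensive in total, though in absolute terms it is still around a hundredth of a millisecond per frame in this capture.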


Nice findings! The camera one is very simple: just go to the camera class and deactivate its default component tick (it will still work). Although I don’t quite remember what the tick was being used for.

Whenever the 5.6 branch is published on GitHub I will make a build and start toying with those things (should be a few days from now).

Only my 5700X3D with RX 6800 can keep up with recent UE5 games like STALKER 2 & Silent Hill 2.

Been screaming these findings from the rooftops forever now! Even today I run DX11 SM5 SWRT Lumen instead of DX12 to hit Steam Deck with GI in an HLOD workflow vs Nanite. It’s a good tradeoff but can’t touch the perf I get from UE4.


I have been closely following pushes to Main. 5.6 has been delayed a bit (I was expecting the Beta to release around GDC time as usual). x.com/theredpix has been posting several pushes related to non-trivial optimizations. Still waiting for the 5.6 branch to be made public on GitHub so I can compile it and test it myself.


One of them
