Greetings! Here is the story. I’m developing a game that’s going to release on Switch/PS4, so the team has been working on improving the game’s performance as much as possible. For this post I’m going to focus mainly on CPU performance.
We moved from UE 4.25 to 5.3 three months ago, and for the past few weeks we have been working on 5.4. To start, the optimizations we have done in these three months are massive. All the logic was moved to C++ (with the sole exception of one plugin, Fluid Ninja Live), with lots of branchless code. I optimized engine settings, activated PSO precaching, optimized collisions, removed physics as much as possible, started using widget invalidation, optimized all Skeletal Meshes in the game (which resulted in big performance gains), moved all animation logic to C++, and made several other optimizations. We also deactivated many unused engine plugins and moved to 5.4 (12/13/23) using the clang compiler, which outperformed MSVC consistently across different scenarios and projects.
5.4 msvc
5.4 clang
In short, we have been very thorough in optimizing the game, especially for the CPU. As for game logic, the only way to get more performance at this point would be to move to ECS, but that is not why I’m making this post.
Comparing 4.25 to 5.3 Directly
The reason I’m making this post is that when we moved from 4.25 to 5.3 the game suffered a huge performance regression. See here (Development builds):
4.25 (also tested 4.27 which was somewhat slower but still way faster than 5.3/5.4)
5.3 (just after moving the project from 4.25)
As you can see, world tick / game is 2x faster on 4.25!
And no, there’s no error in these results. We have been very careful to keep the same settings (I’m aware project settings change when changing UE version) and verified the actor count. OF COURSE Lumen/Nanite is not on, both run on DX12, and yes, I did test Shipping, Development, etc. These are real results, don’t doubt our findings. I’ve simplified the presentation of the differences, but they absolutely remain accurate to reality. We know what we are doing.
In an attempt to investigate this further, I made 4.27 projects and then made a 5.3 copy of each project. I did this with graphics-intensive scenes and also with barebones projects. UE4 was outperforming UE5 as long as virtual shadow maps or other UE5-specific features were not being used (UE5 can outperform UE4 in several cases when using VSM, for example). My last attempt at figuring this out was to simply make an empty scene and start adding stuff to see when I could spot a deviation.
To my surprise, there was no point at which UE5 started being slower than UE4, because even in an empty scene UE4 was twice as fast as UE5. And the gap kept escalating with pretty much anything added (on world tick/game ms)!
My initial speculation was that either skeletal meshes, Niagara or particle systems were inherently more expensive on UE5. But I was absolutely surprised, after testing many scenes, to find that UE 4.25 remained 1.6x to 2x faster than UE5, no matter whether it was an empty scene or a scene from my game.
Also, RAM and VRAM usage is higher on UE5.
4.25
5.3
Well, now I will show how much better the current version of the game is running on 5.4 clang.
We got world tick down from 7.04ms to 4.29ms, which is very significant, and I’m happy we managed to get it this fast. But the 4.25 version is still faster in every significant aspect, with world tick at 3.42ms and running at 180fps while 5.4 runs at 120fps. A huge 60fps difference!
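To put the fps figures and the world tick figures on the same scale, here is a minimal sketch that converts frame rates to per-frame milliseconds and computes the ratios between the numbers quoted above. The helper names (`FrameTimeMs`, `Ratio`) are made up for illustration and are not engine API; only the constants come from this post.

```cpp
#include <cassert>

// Frame budget in milliseconds for a given frame rate.
inline double FrameTimeMs(double Fps) {
    assert(Fps > 0.0);
    return 1000.0 / Fps;
}

// How many times larger A is than B.
inline double Ratio(double A, double B) {
    assert(B > 0.0);
    return A / B;
}

// World tick figures quoted in the post (milliseconds).
const double TickUe54BeforeMs = 7.04; // 5.4 before the optimizations
const double TickUe54AfterMs  = 4.29; // 5.4 clang after the optimizations
const double TickUe425Ms      = 3.42; // 4.25

// Frame rates quoted in the post.
const double FpsUe425 = 180.0;
const double FpsUe54  = 120.0;

// Gap between the two builds, expressed as frame time instead of fps.
inline double FrameGapMs() {
    return FrameTimeMs(FpsUe54) - FrameTimeMs(FpsUe425);
}

// Gap between the two builds in world tick alone.
inline double TickGapMs() {
    return TickUe54AfterMs - TickUe425Ms;
}
```

With these numbers, 180fps is about 5.56ms per frame and 120fps is about 8.33ms, so the frame gap is roughly 2.78ms while the world tick gap is 0.87ms. That suggests world tick is only part of the difference, though frame time depends on the slowest thread, so the two gaps are not directly additive.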
--------------------------------------------------
Of course, if we applied the same optimizations from our 5.4 build to 4.25, the game would be even more performant. I estimate it would make 4.25 run at 1.9ms - 2.1ms. This performance difference is so big that we are considering backporting the project to 4.25 so the game performs well on Switch and PS4.
We want to avoid this, of course, so I’m trying to figure out what’s making UE5 perform so much worse than UE4 and how to gain some performance back.
Here is a list of the suspects harming performance:
-Game units being doubles in UE5:
I would like to know if this could be part of the reason. I wouldn’t think it would make THAT much of a difference, but transform/render data and component transforms could be much more expensive because of this.
-Render thread somehow being much more expensive in UE5:
My data shows this clearly: the render thread is heavier on UE5. I wonder why this could be. Maybe it’s because Nanite/Lumen stuff is part of the RHI even though deactivated. Maybe UE5’s shader PSOs are more numerous than UE4’s. Honestly I don’t have much of a clue here; only someone experienced with the RHI could know what’s going on.
-Chaos:
The project is pretty much only using Chaos for its collision queries. But when profiling I did find that collision cost was higher on UE5 (Chaos). We know Chaos is less performant than PhysX, so it could be the base cost of Chaos (even though the project is not even using ragdolls and has almost no actors with gravity). Also, Chaos has a lot of optimizations only available through the ISPC compiler; however, I don’t even know if it’s properly supported in 5.4. See.
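On the first suspect (doubles), here is a minimal sketch of why wider components could matter even when the math itself is cheap: the same transform data takes twice the memory, so any loop that streams over many transforms touches twice as many bytes. `TransformSketch` is a hypothetical layout for illustration and is NOT the engine's actual transform type, which may pad or align its members differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transform layout: rotation quaternion, translation, scale.
// Assumption for illustration only, not the engine's real definition.
template <typename T>
struct TransformSketch {
    T Rotation[4];
    T Translation[3];
    T Scale[3];
};

// Bytes touched when streaming once over Count transforms.
template <typename T>
std::size_t BytesForTransforms(std::size_t Count) {
    return Count * sizeof(TransformSketch<T>);
}

// A representative per-element pass: accumulate the X translation.
// The arithmetic is trivial, so memory traffic dominates for large arrays.
template <typename T>
T SumTranslationX(const std::vector<TransformSketch<T>>& Items) {
    T Sum = T(0);
    for (const TransformSketch<T>& Item : Items) {
        Sum += Item.Translation[0];
    }
    return Sum;
}
```

In this sketch `sizeof(TransformSketch<float>)` is 40 bytes and `sizeof(TransformSketch<double>)` is 80 bytes on typical platforms, so 1000 transforms go from 40000 to 80000 bytes. Whether that translates into a measurable cost in the engine is exactly the open question; this only shows where the extra cost would come from.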
Here are the results from an empty black scene:
4.27 (it was actually stable at 0.10ms but the screenshot slowed it down to 0.12ms)
5.3 (it was actually stable at 0.15ms but the screenshot slowed it down to 0.16ms)
The only actor in this scene is the basic Game Mode stuff. The stat numbers almost didn’t change at all; they were just modified slightly by the screenshot. As I said before, even from this point we have a good representation of how performance compares: UE4 is 1.5x-2x faster than UE5 (with 4.25 being faster than 4.27).
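Since taking the screenshot itself nudged the numbers, one way to compare the two engines without that noise is to take the median of several samples instead of a single reading. Below is a minimal sketch; `MedianMs` is a made-up helper, and the sample values in the usage note are hypothetical readings shaped like the ones described above (stable value plus one spike), not logged data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of frame-time samples in milliseconds.
// Takes the vector by value because std::nth_element reorders it.
inline double MedianMs(std::vector<double> Samples) {
    assert(!Samples.empty());
    const std::size_t Mid = Samples.size() / 2;
    // After this call, Samples[Mid] holds the value it would have if sorted,
    // and everything before Mid is less than or equal to it.
    std::nth_element(Samples.begin(), Samples.begin() + Mid, Samples.end());
    const double Upper = Samples[Mid];
    if (Samples.size() % 2 == 1) {
        return Upper;
    }
    // Even count: average the two middle values.
    const double Lower = *std::max_element(Samples.begin(), Samples.begin() + Mid);
    return 0.5 * (Lower + Upper);
}
```

For example, `MedianMs({0.10, 0.10, 0.12, 0.10, 0.10})` gives 0.10 and `MedianMs({0.15, 0.15, 0.16, 0.15})` gives 0.15, so a one-off spike from the screenshot does not move the result, and the ratio between the two stays at 1.5x.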
To wrap up this post, I just want to say that it would be great if someone from Epic could provide insight on these findings. As for our project, we really need every ounce of performance for our game to run at a solid FPS target and, if possible, to have a 60fps mode on Switch/PS4.
The figures I’ve shown here are already making a difference in how the game runs on said consoles, so if we can get a tip on how to improve engine performance by making some modifications to the engine, that would be extremely helpful and enlightening. For example, I know that a decent amount of performance can be gained by modifying how component transforms update so that recalculations are not done on every component in line.
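To make the transform update idea concrete, here is a minimal sketch of one common way to avoid recalculating every component in a chain: cache the world-space value and recompute it lazily, only when something above it actually changed. This is a generic dirty-flag pattern under my own assumptions, not the engine's component code; `NodeSketch` and all its members are hypothetical, and the "transform" is reduced to a single offset to keep it short.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scene node with a 1D offset standing in for a full transform.
struct NodeSketch {
    NodeSketch* Parent = nullptr;
    std::vector<NodeSketch*> Children;
    double LocalOffset = 0.0;
    double CachedWorldOffset = 0.0;
    bool bDirty = true;     // true when CachedWorldOffset is stale
    int RecomputeCount = 0; // instrumentation for the sketch only

    void Attach(NodeSketch& Child) {
        Child.Parent = this;
        Children.push_back(&Child);
        Child.MarkDirty();
    }

    void SetLocalOffset(double NewOffset) {
        if (NewOffset == LocalOffset) {
            return; // nothing changed, keep the cache
        }
        LocalOffset = NewOffset;
        MarkDirty();
    }

    // Invariant: if a node is dirty, its whole subtree is already dirty,
    // so propagation can stop early.
    void MarkDirty() {
        if (bDirty) {
            return;
        }
        bDirty = true;
        for (NodeSketch* Child : Children) {
            Child->MarkDirty();
        }
    }

    // Recomputes only when stale; otherwise returns the cached value.
    double GetWorldOffset() {
        if (bDirty) {
            const double ParentWorld = Parent ? Parent->GetWorldOffset() : 0.0;
            CachedWorldOffset = ParentWorld + LocalOffset;
            bDirty = false;
            ++RecomputeCount;
        }
        return CachedWorldOffset;
    }
};
```

In a Root -> Child -> Leaf chain, changing only Child's offset makes Child and Leaf recompute on the next query while Root keeps its cached value, and querying Leaf repeatedly without changes recomputes nothing. How much of this the engine already does internally is something I can't confirm.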
Just tested 5.0, and CPU metrics report pretty much the same as 5.3; the only difference is higher RAM usage in 5.3.