Hi @lab-one,

Are you able to reproduce this issue in a project that you could share with us for further investigation?

Otherwise, could you provide some more information regarding the level where you are seeing the performance regression?

  • enable `stat SceneRendering` and provide a screenshot
  • is the level using Instanced Static Meshes?
    • can you provide a rough estimate of the number of instances?
    • do the ISMs use Nanite or regular meshes?
  • does the level contain a lot of Static Meshes using World Position Offset, and do those meshes have “Evaluate World Position Offset in Ray Tracing” enabled? Can you provide a rough estimate of the number of such meshes?
  • does the level contain a lot of Skeletal Meshes? Can you provide a rough estimate of the number of such meshes?
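For anyone gathering this information, the stats can be toggled from the in-editor console (these are the standard UE stat commands; I'm listing only the common ones here):

```
stat SceneRendering
stat Unit
stat GPU
```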



Hi Tiago,

I was not expecting my comment to summon someone from Epic itself. Thank you so much for your swift reaction, and I apologize for any unnecessary gray hairs this caused you, because it certainly gave me some.

It was a bit unfair of me to say the issue comes from the engine; the 5.4 update just made it more prominent.

You can see `stat SceneRendering` in my original comment, and it shows that the time is going into GatherRayTracingWorldInstances.
So it all boils down to the “Visible in Ray Tracing” checkbox on foliage.

In total I have 560k foliage instances in my scene, and as soon as the Visible in Ray Tracing checkbox is enabled on foliage, I can see that function's cost explode, which is natural for 560k instances.

It’s just that in 5.3 that function stayed under my target budget of ~22 ms, so I was not paying attention to it; in 5.4 it no longer does, and the drop gets more noticeable the more foliage I have in my scene.

My workaround, obviously, is to disable ray tracing on the foliage, because it is unnecessary there.

However, I see that even if I set the Cull Distance for the foliage extremely low, the function still eats performance. I don’t know much about the internals, but maybe that information will help you as well.
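For reference, a sketch of the workaround as console variables (these CVar names exist in stock UE5, but treat the mode values as an assumption and check the CVar help text for your engine version):

```ini
[SystemSettings]
; Exclude instanced static meshes (including foliage) from the ray tracing scene
r.RayTracing.Geometry.InstancedStaticMeshes=0
; Alternatively, keep them but cull ray tracing instances aggressively
; (see each CVar's help text for the exact meaning of the culling modes)
r.RayTracing.Culling=3
r.RayTracing.Culling.Radius=10000
```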

Extra unrelated side note: I am using the Dynamic Grass Plugin https://www.unrealengine.com/marketplace/en-US/product/dynamic-grass-system, and that plugin doesn’t respect the Visible in Ray Tracing checkbox, which is what made my issue noticeable in the first place.

Same thing for me: in 5.3 I was getting 70 FPS, now I’m getting around 30. Setting Visible in Ray Tracing to false on foliage solves the issue.

I’m also using a fair number of Instanced Static Meshes with Nanite.

Is it normal that ray tracing is still evaluated even if Use Hardware Ray Tracing When Available is set to false in the project settings? I get the same performance whether it’s false or true.

Deleting the whole foliage does not fix the issue. Also, the telemetry plugin does not fix anything on my machine.

I was able to get almost the same performance as in 5.3 by doing the following:

`UnrealEditor.exe path_to\ProjectName.uproject -run=DerivedDataCache -fill` to fix the incredible stuttering every few milliseconds. It seems the automatic caching feature is bugged and/or no longer generates the required files after an upgrade.

`r.Nanite 0` to get about double the FPS. In 5.3 the same scene runs much better (and there, even with `r.Nanite 1`, there’s no difference). As far as I’m aware, none of the meshes uses Nanite. Nanite is fine as long as it stays out of my way, but even disabling Nanite on every mesh in the map does not fix the issue, nor does removing the foliage. The only difference is 5.3 (fine) versus 5.4 (huge performance loss). If Nanite makes performance dramatically worse, it’s a big no; it’s better to do a proper retopo, which you might need for collisions anyway.

I’m not a computer expert, so take my comments with a grain of salt. I have been plagued by the FPS drops in UE 5.4, so I tried to work on the problem, if only so I could use UE again.

It seemed to me that there are two FPS problems rolled into one. First, the FPS is capped at 60, which means fewer frames to play with. Second, frame time is being spent by UE5 creating shaders on the fly.

The first problem may not be of UE’s making and may have a simple solution in Windows 11. Microsoft is “greening” Windows 11: under “System > Power & Battery > Energy Recommendations” there is an option to “Lower your refresh rate to 60 Hz to conserve energy”. This may be unrelated and may already be turned off for you, but it seemed no coincidence that I was also getting 60 FPS on my screen, so I looked for a way to turn off the 60 Hz cap. I found an option under “System > Display > Advanced display” to “Choose a refresh rate”; it offered rates from “60.03 Hz” to “144.03 Hz”, so I chose the latter. Back in UE, the FPS counter read around 120. This may be a coincidence, but give it a go and see what happens.

Now to the main problem, and for those wanting the conclusion up front: the FPS drops may occur because frame time is now being spent creating shaders on the fly while playing in editor (PIE). This started with a look at my output log. Here is an extract:

Creating RTPSO with 18 shaders (0 cached, 18 new) took 166.35 ms. Compile time 43.76 ms, link time 122.57 ms.

LogD3D12RHI: Creating RTPSO with 17 shaders (0 cached, 17 new) took 47.95 ms. Compile time 37.30 ms, link time 10.63 ms.

Creating RTPSO with 18 shaders (10 cached, 8 new) took 47.75 ms. Compile time 39.12 ms, link time 8.61 ms.
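Out of curiosity, log lines in this format can be tallied to see how much total time goes into on-the-fly RTPSO compilation. A minimal sketch (the `summarize_rtpso` helper and the regex are my own, written against the exact format of the extract above):

```python
import re

# Matches lines like:
#   "Creating RTPSO with 18 shaders (0 cached, 18 new) took 166.35 ms."
RTPSO_RE = re.compile(
    r"Creating RTPSO with (\d+) shaders \((\d+) cached, (\d+) new\) "
    r"took ([\d.]+) ms"
)

def summarize_rtpso(log_lines):
    """Return (total_ms, total_new_shaders) across all RTPSO creations."""
    total_ms = 0.0
    total_new = 0
    for line in log_lines:
        m = RTPSO_RE.search(line)
        if m:
            total_new += int(m.group(3))   # count of newly compiled shaders
            total_ms += float(m.group(4))  # creation time for this RTPSO
    return total_ms, total_new

log = [
    "Creating RTPSO with 18 shaders (0 cached, 18 new) took 166.35 ms.",
    "LogD3D12RHI: Creating RTPSO with 17 shaders (0 cached, 17 new) took 47.95 ms.",
    "Creating RTPSO with 18 shaders (10 cached, 8 new) took 47.75 ms.",
]
total_ms, new_shaders = summarize_rtpso(log)
print(f"{new_shaders} new shaders compiled in {total_ms:.2f} ms")
```

Run over a full output log, this makes it easy to see whether the hitches line up with RTPSO creation.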

This told me that the shaders were being created around the time of the “drops” shown by the FPS counter on screen. A theory, but I needed to delve deeper to see whether “RTPSO” had any relevance. With some Google help I found out that it stands for “ray tracing pipeline state object”: the object DX12 builds on the fly from the shaders a frame needs. I don’t claim to understand all the computer mumble, but in “Ray Tracing Gems: High-Quality and Real-Time Rendering with DXR and Other APIs” (© 2019 NVIDIA) the authors say that:

“Fortunately, both DirectX Raytracing and all other NVIDIA ray tracing APIs enabled by RTX expose the ability to compile in parallel multiple ray tracing shaders to machine code”.

The author then explains what this means in practical terms:

“This parallel compilation can take advantage of the multiple cores of today’s CPUs. In the
experimental UE4 implementation, we used this functionality by simply scheduling
separate tasks, each of which compiled a single ray tracing shader or hit group into
what DXR calls a collection. Every frame, if no other shader compilation tasks were
already executing, we checked if any new shaders were needed and not available.
If any such shaders were found, we started a new batch of shader compilation
tasks. Every frame we also checked if any prior set of shader compilation tasks
had completed. If so, we created a new RTPSO, replacing the previous one. At any
time, a single RTPSO is used for all DispatchRays() invocations in a frame. Any
old RTPSOs replaced by a new RTPSO are scheduled for deferred deletion when no
longer used by any in-flight frames. Objects for which a required shader was not yet
available in the current RTPSO were removed (skipped) when building the TLAS.”

The shaders are then referenced through the Shader Binding Table, which is held in memory and eventually cached in my project’s “DerivedDataCache” folder. The table is then used to bind parameters to the shaders:

“This table is a memory buffer made up of multiple records, each containing an opaque shader identifier and some shader parameters that are equivalent to what DirectX 12 calls a root table (for a Graphics Pipeline Draw or Compute Pipeline Dispatch). Since this first experimental implementation was designed to update shader parameters for every object in the scene at every
frame, the Shader Binding Table management was simple, mimicking that of a
command buffer. The Shader Binding Table was allocated as an N-buffered linear
memory buffer, with size dictated by the number of objects in the scene. At every
frame we simply started writing the entire Shader Binding Table from scratch in
a GPU-visible CPU-writable buffer (in an upload heap).”

The author concluded that the RTPSO approach is here to stay, but is somewhat expensive in some practical applications:

“The recent introduction of dedicated hardware for ray tracing acceleration and the
addition of ray tracing support in graphics APIs encouraged us to be innovative
and experiment with a new way of hybrid rendering, combining rasterization and
ray tracing. We went through the engineering practice of integrating ray tracing
in Unreal Engine 4, a commercial-grade game engine. We invented innovative
reconstruction filters for rendering stochastic effects such as glossy reflections, soft
shadows, ambient occlusion, and diffuse indirect illumination with as few as a single
path per pixel, making these expensive effects more practical for use in real time.
We have successfully used hybrid rendering to create two cinematic-quality demos.”

In the end I was much wiser, but it meant that high FPS may be something for the history books, at least until PC hardware catches up. It is clear that UE uses ray tracing for a number of lighting features and that it is here to stay. The better question is whether UE can solve the “frame time spent compiling shaders on the fly” problem. The UE options around frame smoothing, etc. have not worked that well for me so far, particularly if Microsoft and NVIDIA are also capping refresh rates.

Same for me. Lost around 5–10% of FPS, 100 → 90. Not that I can’t optimize it back to 100, but hey, I will have to optimize anyway once all the game mechanics are in place. If UE eats 10% of performance with each update, there is no way I’ll stay within my performance budget.
So it’s not a question of “how to optimize” but rather “why does UE 5.4 perform significantly worse than 5.3 with the same vanilla Nanite+Lumen scene?”
I am kind of scared now.
P.S. Though I can confirm the FPS/timings graph looks smoother in 5.4.

I think TSR became more expensive. Do you have it enabled? Have you compared 5.3 vs 5.4 at 100% resolution with TAA, instead of adaptive resolution with TSR?
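For an apples-to-apples comparison, something like this in DefaultEngine.ini (or the same CVars typed into the console) should pin both versions to TAA at native resolution; the CVar names are from stock UE5, so treat this as a sketch and verify against your engine version:

```ini
[SystemSettings]
; 2 = TAA, 4 = TSR (UE5 defaults to TSR)
r.AntiAliasingMethod=2
; Render at 100% of display resolution
r.ScreenPercentage=100
; 0 = disable dynamic resolution
r.DynamicRes.OperationMode=0
```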


I do use TSR; that was the only way to get reasonable performance with Lumen. I’ll try TAA now, thanks.


Nope. With TAA it’s about the same: 120 FPS → 100 FPS.


I finally just switched to DX11 and noticed it is very performant. Epic has more optimization work to do on DX12, IMHO.
There’s no Nanite or Virtual Shadow Maps in DX11, but the Lumen features work great and TAA does a decent job. It might be worth checking out DX11 with SM5. UE 5.4 also performed about 5% better than 5.3 in DX11 packaged builds.
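For anyone wanting to try this, the RHI can be switched per project in DefaultEngine.ini (or temporarily with the `-dx11` launch argument); this is the stock UE5 setting name, but verify it against your engine version:

```ini
[/Script/WindowsTargetPlatform.WindowsTargetSettings]
; Fall back to D3D11 / SM5 (no Nanite or Virtual Shadow Maps)
DefaultGraphicsRHI=DefaultGraphicsRHI_DX11
```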


This might be an answer if UE 5.3 didn’t run much better. I don’t remember a consistent decrease like this in any of the UE4 releases. What happens in 5.5?

For me it’s all the same with DX11: 5.3 is WAY faster than 5.4. I guess there must be some catch in my project, but it’s just a bunch of Instanced Static Meshes on screen plus Voxel terrain, that’s it. Quite basic things to work with; there’s not much you can tune when you just have a static mesh with a plain material.

I also went back to 5.3.2, because 5.4 is just painful to use. I cannot understand how it got this slow. It’s not worth it.

CSM performance on movable Nanite meshes is abnormal in 5.4.

It normalizes when you switch shadows from CSM to VSM for movable Nanite meshes, but CSM normally performs better than VSM, and 5.3.2 performs better overall.

I made a small test between UE 5.0 and UE 5.4 with the same settings in the third-person template project. This is the result.

In UE 5.4, just playing without moving the character, I get 76 FPS. Turning off Lumen for GI and reflections brings UE 5.4 up to 160 FPS, but that is still bad compared to UE 5.0, which reaches 240+ FPS. After this I decided to roll my city builder project back to the older engine version. (Ryzen 9 5900X + RTX 3060, at 1080p)

You are my hero. Thank you very much!

5.0 has broken frame times, which leads to micro stutters. Better to use 5.1–5.3.


I’ve never liked UE5 since it first came out, with its pitch-black corporate look, laggy interface, completely redesigned UI/UX, and abhorrent editor bugs. Four iterations in, I am GLAD I never shifted my projects to 5. Still loving 4.26. Can’t say the same for my clients’ projects; they seem to think that ignoring the tried and tested and picking the bleeding edge of technology is where a serious project should begin.