Hi all,
We’ve been debugging slow level initialization and tracked it down to UpdateSurfaceCachePrimitives when r.Lumen.HardwareRayTracing=1. When the level initializes, a very large set of primitives/cards needs to be updated, and the cost of this update causes long waits on the render thread. Frame time spikes to > 500ms for a sustained period while the card initialization list is drained over multiple frames. In our case this makes the game unplayable until initialization is complete (5-10 seconds). Subsequent incremental updates while navigating the world are within acceptable timings.
All timings were taken with Superluminal attached, which changes the absolute values somewhat. They are meant for insight rather than as the true cost of the function in the Test build.
Digging into the code with Superluminal, a few things stood out:
The root cause seems to be the use of reserve within a tight loop. When the number of tasks is large, this can lead to a significant number of allocations and memory copies, which massively inflates the time needed to execute this function whenever more than a small delta update is needed (e.g. enabling the feature or loading into a level that needs full initialization).
We resolved this by calculating the total space the array needs and allocating it once, before entering the tight loop.
This brought our worst-case time for this function from ~330ms to ~23ms on my developer PC.
23ms was still not good enough for a workable frame time during card initialization, so the next target was the sorting of the mesh cards:
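For anyone wanting to see the shape of the pre-sizing change, here is a minimal sketch in standard C++. It is not the engine code: std::vector stands in for the engine array, and FTaskResult, FMergeResult, MergeReserveInLoop and MergePreSized are hypothetical names made up for illustration. The first function grows the destination inside the loop, the second sums the sizes up front and reserves once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one task's output; not an engine type.
struct FTaskResult {
    std::vector<int> MeshCardsToAdd;
};

// Merged output plus the number of capacity changes seen inside the merge loop.
struct FMergeResult {
    std::vector<int> Merged;
    std::size_t GrowthsInLoop = 0;
};

// Problem pattern: reserve inside the loop. Every call that raises the
// capacity reallocates and copies everything gathered so far.
FMergeResult MergeReserveInLoop(const std::vector<FTaskResult>& Tasks)
{
    FMergeResult Result;
    for (const FTaskResult& Task : Tasks) {
        const std::size_t Before = Result.Merged.capacity();
        Result.Merged.reserve(Result.Merged.size() + Task.MeshCardsToAdd.size());
        Result.Merged.insert(Result.Merged.end(),
                             Task.MeshCardsToAdd.begin(), Task.MeshCardsToAdd.end());
        if (Result.Merged.capacity() != Before) {
            ++Result.GrowthsInLoop;
        }
    }
    return Result;
}

// Fix pattern: one cheap pass to total the sizes, a single reserve,
// then a loop that only appends and never reallocates.
FMergeResult MergePreSized(const std::vector<FTaskResult>& Tasks)
{
    std::size_t Total = 0;
    for (const FTaskResult& Task : Tasks) {
        Total += Task.MeshCardsToAdd.size();
    }

    FMergeResult Result;
    Result.Merged.reserve(Total);
    for (const FTaskResult& Task : Tasks) {
        const std::size_t Before = Result.Merged.capacity();
        Result.Merged.insert(Result.Merged.end(),
                             Task.MeshCardsToAdd.begin(), Task.MeshCardsToAdd.end());
        if (Result.Merged.capacity() != Before) {
            ++Result.GrowthsInLoop;
        }
    }
    assert(Result.Merged.size() == Total);
    return Result;
}
```

Both functions produce the same merged array. The difference is that the pre-sized version reports zero capacity changes inside the loop, while how often the in-loop version reallocates depends on the container's growth policy and on the number of tasks.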
When the mesh cards are sorted, the full list returned by the tasks is sorted, even though there is a maximum number that can be updated per frame, as defined by LumenScene::GetMaxMeshCardsToAddPerFrame(). So a massive list can end up being sorted when only a subset of it matters to the calculations that follow.
To resolve this (in a naive way, to be sure) we do a partial sort on the array to order only the number of elements needed, which improves the sorting time.
With this done, we have reduced the initialization ticks for this function in our scene from ~330ms to ~8ms.
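The partial sort idea can be sketched in standard C++ as follows. Again this is an illustration rather than the engine change: FPendingMeshCards and SortClosestFirst are hypothetical names, the distance-based sort key is an assumption, and MaxToAdd stands in for the value returned by LumenScene::GetMaxMeshCardsToAddPerFrame(). std::partial_sort orders only the first MaxToAdd elements and leaves the rest in unspecified order, costing roughly N log K comparisons instead of N log N.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pending mesh cards add request; not an engine type.
struct FPendingMeshCards {
    int PrimitiveGroupIndex;
    float DistanceSquared; // assumed sort key: smaller means higher priority
};

// Orders only the first MaxToAdd entries (closest first). Entries past that
// point are left in unspecified order, since they are not consumed this frame.
void SortClosestFirst(std::vector<FPendingMeshCards>& Pending, std::size_t MaxToAdd)
{
    // Clamp so a budget larger than the list degrades to a full sort.
    const std::size_t Count = std::min(MaxToAdd, Pending.size());
    const auto Middle = Pending.begin() + static_cast<std::ptrdiff_t>(Count);
    const auto CloserFirst = [](const FPendingMeshCards& A, const FPendingMeshCards& B) {
        return A.DistanceSquared < B.DistanceSquared;
    };

    std::partial_sort(Pending.begin(), Middle, Pending.end(), CloserFirst);
    assert(std::is_sorted(Pending.begin(), Middle, CloserFirst));
}
```

The caller would then consume only the first MaxToAdd entries for the current frame. Whether the leftover entries are kept for the next frame or regenerated is outside the scope of this sketch.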
For completeness, the remaining big chunk of time is spent in FLumenSceneData::AddMeshCardsFromBuildData, specifically in the AddSpan functions on the TSparseSpanArray instances. We have not investigated how to make this faster at this stage.
We’ve also looked at the latest code available via Perforce and seen that this function has had a decent number of changes and improvements that could help with some of the issues we have experienced. We are looking forward to testing these changes when we can move to a newer version of Unreal. That said, there still seem to be some reallocations happening in hot loops that may get hit hard on first initialization or level load.
Thanks for your time.
-Tim C