Frontloading PSO Precaching Requests and More Info on Dynamic Shader Loading

We’re following the recommended approach for handling PSO precaching, similar to what Fortnite does. Specifically, we track PipelineStateCache::NumActivePrecacheRequests() and hold the loading screen until all requests are completed.

Our Current Flow:

  • We start in a main menu level (a World Composition level).
  • From here, players choose a game mode:
    • Singleplayer
    • Multiplayer
    • Load Game, etc.
  • Once a choice is made, we load the corresponding main level with the appropriate session settings.

This process triggers a second batch of PSO precaching requests, which makes sense since we’re transitioning to a new level with new materials. However, I have some questions about optimizing this for a better user experience.
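
As a rough illustration of the gating described above, here is a minimal, engine-free sketch. `LoadingScreenGate`, `PendingCountFn`, and `quietFramesRequired` are hypothetical names, not engine API; in a real project the injected function would presumably wrap `PipelineStateCache::NumActivePrecacheRequests()`. Requiring several consecutive zero readings is an assumption of this sketch, meant to cover components that finish loading late and enqueue further requests.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a query such as
// PipelineStateCache::NumActivePrecacheRequests(). It is injected so the
// sketch compiles and runs without the engine.
using PendingCountFn = std::function<std::uint32_t()>;

// Decides when a loading screen may be dismissed.
class LoadingScreenGate {
public:
    explicit LoadingScreenGate(PendingCountFn pending, int quietFramesRequired = 3)
        : pending_(std::move(pending)), quietFramesRequired_(quietFramesRequired) {
        assert(pending_ && quietFramesRequired_ > 0);
    }

    // Call once per frame after the level itself has finished loading.
    // Returns true once the pending count has read zero for
    // quietFramesRequired consecutive calls.
    bool Tick() {
        if (pending_() == 0) {
            ++quietFrames_;
        } else {
            quietFrames_ = 0;  // new requests arrived, so start counting again
        }
        return quietFrames_ >= quietFramesRequired_;
    }

private:
    PendingCountFn pending_;
    int quietFramesRequired_;
    int quietFrames_ = 0;
};
```

In a game, `Tick()` would be called once per frame by whatever owns the loading screen, and the screen dismissed the first time it returns true.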

Questions:

Can PSO precaching be batched across multiple levels?

  • Ideally, I’d like to request precaching for both the main menu and the main level at the same time, similar to games that show a single “Compiling Shaders” screen.
  • I’d prefer to avoid restructuring assets or packing main-level content into the menu level.
  • In Fortnite, I don’t recall seeing two separate loading bars—does FN handle this differently?

Dynamic Shader Loading in 5.5

  • I tested enabling Dynamic Shader Loading, and it produces the expected flow, but we send less than half the usual PSO requests and hit many more new PSOs. This aligns with the feature’s expected behavior, given its name and fallback strategies.
  • I also enabled DrawnComponentBoostStrategy and haven’t noticed default materials popping in.
  • Is this setup similar to what FN uses, and are we leveraging these parameters correctly?

Reusing the PSO Cache Between Processes

  • We’re running multiplayer automated tests, and I’ve noticed that when a second client joins after the first, it still fires all the same PSO precache requests that the first client already processed.
  • However, when I run with r.ShaderPipelineCache.Open=1 it seems to skip this process for the second client. I wanted to check if this is the recommended approach, or if there is a more proper way you’d advise handling this.

Any insight into best practices for this would be greatly appreciated!

Hi there,

Just to note, there are two PSO systems in Unreal. The older bundled PSO system and the newer precaching system. These systems can complement each other.

I also cannot speak to how Fortnite handles it, but I can provide some strategies that may be useful.

In the case of multiple levels, it may be useful to do a bundled collection of the PSOs in the menu and begin precompilation early so that they may finish by the time loading is complete. That way, the precaching system only engages fully for PSOs that the bundled collection missed. In practice, precache might still kick off new PSO jobs, as it is more conservative about which PSOs are needed. For example, the precache might compile a PSO for a custom depth pass even if a given object never uses it, while the bundled system only considers PSOs it logged during a collection run.

Depending on how static your levels are, bundled PSO collection may be a good fit. It allows more work to be shifted upfront during loading and menu stages because it knows about PSOs from the future.

The precache system primarily fires on PostLoad events of components that need PSOs. In non-editor builds, PostLoad should fire on the CDO when it is loaded. So, it should be possible to asynchronously load some of the assets from the next level before beginning to load the level itself. Depending on how early this is feasible, this could also allow some of the time to be hidden behind menu animations or general user interaction.

The following case, [Unreal Developer Network [Content removed] covers using the two systems together. There are a few things of note, and some of those are expanded on here: [Feed [Content removed]

In terms of reusing a cache, this isn’t possible with PSO precaching alone. I believe that running r.ShaderPipelineCache.Open is engaging the bundled PSO system. I’m not sure it’s actually skipping the step, as it hasn’t been fed back in. What I suspect is happening is that the PSOs are being recorded as bundled and end up not being reflected in NumActivePrecacheRequests(). They should still show up in NumPrecompilesRemaining(), which includes bundled and precache PSO counts.

In general, all PSO requests will still need to fire at present, but each task should be very fast as it should be cached by the driver. Many things could invalidate some or all of the driver cache, so Unreal currently just checks them all for now.
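
If the goal is a single progress bar, the combined count described above could drive it. The following is only a sketch under assumptions: `PrecompileProgress` is a hypothetical helper, and the `remaining` value is assumed to be sampled each frame from a combined counter such as `NumPrecompilesRemaining()`. Since the total amount of work is not known up front, the sketch uses the highest count seen so far as the denominator, which means the bar can move backwards if a new batch of requests arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: turns a "requests remaining" counter into a
// 0..1 progress value by remembering the largest count observed.
class PrecompileProgress {
public:
    // remaining: the current combined count of outstanding PSO compiles.
    // Returns 1.0 when nothing has ever been pending or nothing remains.
    float Update(std::uint32_t remaining) {
        peak_ = std::max(peak_, remaining);
        if (peak_ == 0) {
            return 1.0f;
        }
        return 1.0f - static_cast<float>(remaining) / static_cast<float>(peak_);
    }

private:
    std::uint32_t peak_ = 0;  // highest remaining count seen so far
};
```

One instance would be kept alive across the menu and the level load so that both batches feed the same bar.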

Finally, in terms of Dynamic Shader Loading, this isn’t a feature I’m familiar with, so I’ll hand this to somebody who can both speak specifically to Fortnite’s usage of these systems and Dynamic Shader Loading.

I hope that helps.

Best regards,

Chris

Just bumping this post again, since FN either does something that looks like what we’re after, or the precaching compiles are just hidden behind title-less loading bars.

Hi,

Sorry for the late reply - I was on holiday for a bit.

We don’t do anything special in FN. During startup, all global compute and specific global graphics PSOs are compiled together with all the PSOs required for the menu. During level loading, all the other PSOs are compiled. Nothing is level-specific; everything works from component creation, so it also happens during async level loading, for example. After a component is created, all the required PSOs are requested for compilation during OnPostLoad. It needs to know a few things from the component, but most importantly it needs to know which material (instance) and vertex factory combination is needed. On subsequent runs it will request compilation of the same set of PSOs, but it will be a lot faster because they are cached by the driver. If something changes, such as the drivers or the set of assets loaded, then PSO compilation will trigger again.

Can you give a bit more info on what you mean by dynamic shader loading? Are you talking about EPSOPrecacheMode::PreloadShader? This is used on consoles to make sure the shaders are decompressed and ready for rendering. It ignores all the other PSO-related state because that’s not important on consoles. It mostly helps with micro-hitches on previous-gen consoles.

Every client and process will request its own set of PSOs that it needs. Normally they should all be cached by the driver, so if they are the same they should be compiled/retrieved fast enough. I guess if the user cache is used then all previously requested PSOs will be compiled during the next boot as well, but I don’t know all the details about the shader pipeline cache. Can you give a bit more detail on the steps you are taking here, and perhaps share a few Insights traces so we can better see what’s going on?

Cheers,

Kenzo

Hey Chris,

Thanks for taking the time to write up an answer to the questions. Using a bundled approach was the first idea that came to mind as well. We already use bundles in conjunction with precaching, and our game is open world, though the levels are still in flux (the initial spawn point remains mostly unchanged). Normally, our bundles ignore precached PSOs. I did a test run where the bundles also tracked precached PSOs and created a second bundle specifically for this. Unfortunately, precaching still fired the same number of requests between each level load.

We’re already hiding the requests behind loading screens.

Preloading some of the assets was another idea I had as well. The problem I saw was that components placed in a level can have different material combinations from their default objects, so asynchronously loading some assets from the next level would likely require managing a list of component-to-material combinations. That felt a little backwards, because levels already handle this, right?

To quickly test this, I manually added a chunk of the starting area to the main menu and placed it behind some rocks. This did shift the work, but identifying each precached component making a request and moving it over was tedious. Additionally, this didn’t address HLOD components, which wouldn’t translate well to the world composition-based main menu level.

This is where I’m particularly interested in how FN is handling this.

Oh, my apologies, I didn’t see that was a link. I’ll give this a go.

r.ShaderPipelineCache.PreOptimizeEnabled

Edit: Unfortunately the behaviour didn’t change from the above

No worries.

One thing you could try is walking the asset registry for instances of objects in levels and getting the material overrides from there. We’ve done that to boost bundled PSO counts on a version too old to have precaching support. Trying to get material overrides for non-override objects in a blueprint does expand the search space beyond what’s reasonable, though. In this case, you may be able to load the instance directly as you can get soft references from the registry, but I haven’t tried using it that way.
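
To make the idea concrete, here is an engine-free sketch of the step that would follow the registry walk: reducing the per-instance material overrides to a unique list that could then be loaded ahead of the level. `PlacedInstance` and `CollectOverrideMaterials` are hypothetical names, and the walk over the asset registry that would populate the input is deliberately not shown, since it depends on the project and engine version.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one object placed in a level, as it might be
// gathered from an asset registry walk.
struct PlacedInstance {
    std::string componentClass;
    // Soft object paths of overridden materials; empty means the instance
    // uses the defaults from its class.
    std::vector<std::string> materialOverrides;
};

// Returns the unique material paths across all instances, sorted, so each
// one is only requested once when preloading.
std::vector<std::string> CollectOverrideMaterials(
    const std::vector<PlacedInstance>& instances) {
    std::set<std::string> unique;
    for (const PlacedInstance& inst : instances) {
        unique.insert(inst.materialOverrides.begin(),
                      inst.materialOverrides.end());
    }
    return std::vector<std::string>(unique.begin(), unique.end());
}
```

The resulting list only covers overrides; materials that come from class defaults would still be picked up when the owning assets themselves are loaded.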

If you are okay with the increased memory use, there is a way to cut down on PSO request duplication between the two systems: set D3D12.PSOPrecache.KeepLowLevel to 1.

Somebody who can speak to Fortnite’s implementation should respond in due course.

Best regards,

Chris