Measuring extra memory usage caused by enabling r.PSOPrecache.KeepInMemoryUntilUsed

Hi there,

We recently enabled `r.PSOPrecache.KeepInMemoryUntilUsed` after discovering in Insights that many precached PSOs were still causing 40ms to 100ms hitches. I found https://www.unrealengine.com/en-US/tech-blog/game-engines-and-shader-stuttering-unreal-engines-solution-to-the-problem as well as https://forums.unrealengine.com/t/pso-caching-performances/2699228, which both touch on the issue we are experiencing with the Nvidia driver-level cache.

I tested the `r.PSOPrecache.KeepInMemoryUntilUsed` cvar suggested in that linked thread, and it helped a lot. In our case, we have been getting many more hitches from precached compute PSOs than from graphics PSOs, so I tested setting `r.PSOPrecache.KeepInMemoryComputeMaxNum` to a high number like 10k and was able to eliminate most of the hitching coming from compute PSOs, which was great. The linked thread mentioned that a more automatic solution, one that doesn’t require enforcing a max count of PSOs kept in memory, will come in 5.8, but we are locked to 5.6 at the moment.

As the next step, we would like to measure the extra memory cost incurred by keeping those precached PSOs in memory, so we can make a more informed decision about what numbers to use for `KeepInMemoryComputeMaxNum` and `KeepInMemoryGraphicsMaxNum`. I tried looking at LLM tags including :Total, :Shaders and :PSO in Insights, but I’m not sure whether these tags include the extra memory cost on the driver side. The numbers are also noisy in an A/B test, since we currently don’t have the content/tool setup (like an automated flythrough) to make sure two test sessions go through the exact same rendering sequence. Do you have any suggestions for how to track down the memory cost of enabling `r.PSOPrecache.KeepInMemoryUntilUsed`? Thanks in advance!

Best regards,

Min


Hi there,

You are right, we have overhauled PSO precaching quite a bit in 5.8. Some of those changes are fairly backwards-compatible, so you can cherry-pick them, which other studios have done before. For example, if you want the better stat tracking that is coming in 5.8, you could cherry-pick CL 50531576 into your own build. That change includes two new counters to track the memory used by precached graphics and compute PSOs, as well as the number of in-flight precached PSO compilation jobs. Unfortunately, the change is quite chunky, but it could be worth adding to your build. That said, the LLM stats won’t account for the driver’s memory footprint.

If you figure adding in that change is too much effort, you will likely have to make do with the LLM stats you have already found. The problem with those stats right now is that they don’t account for compute and graphics PSOs separately, so you will not get a fully accurate picture of the amount of memory used. On top of that, your test runs aren’t consistent, so the A/B numbers will stay noisy until you fix that. It would be worth setting up consistent runs using an automated flythrough and then varying the number of PSOs kept in memory between runs.
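
As a rough illustration of how such an A/B comparison could be reduced to a single number once you have repeatable runs, here is a minimal Python sketch. It assumes you have exported per-run samples (in MB) of one LLM tag for several runs with the cvar off and on. The function names and all the numbers are hypothetical placeholders, not real measurements, and the result still cannot see driver-side memory.

```python
from statistics import median


def summarize_runs(runs):
    """Return the median of the per-run peak values (MB).

    `runs` is a list of runs; each run is a list of memory samples in MB
    taken from one LLM tag over the course of a flythrough.
    """
    peaks = [max(samples) for samples in runs]
    return median(peaks)


def estimate_delta_mb(baseline_runs, test_runs):
    """Estimate the extra memory (MB) in the test runs relative to baseline."""
    return summarize_runs(test_runs) - summarize_runs(baseline_runs)


# Placeholder samples for illustration only (assumed values, in MB).
baseline = [
    [410.0, 432.5, 428.0],
    [405.0, 430.0, 441.5],
    [412.0, 436.0, 433.0],
]
keep_in_memory = [
    [415.0, 470.5, 466.0],
    [420.0, 468.0, 475.5],
    [418.0, 472.0, 469.0],
]

delta = estimate_delta_mb(baseline, keep_in_memory)
print(f"Estimated extra memory: {delta:.1f} MB")  # prints 36.0 MB for these placeholders
```

Taking the median of per-run peaks rather than a single run makes the estimate less sensitive to one outlier session. Repeating this for a few different max-count settings would show how the cost scales.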

Cheers,

Tim


Sounds good! If anything else does come up, feel free to reach out again.


Thanks for the suggestion! I’ll start by taking a look at the CL you mentioned and see whether there are parts of it we can use if taking the whole thing is too much effort.
