PSO precaching serialized due to NV driver lock

Hi!

I’ve been experimenting with PSO precaching and have been trying to precache as much as possible during the loading screen.

While doing so I’ve noticed that the PSOPrecompilePool jobs (many of which execute concurrently) are blocked by a lock in the NV driver when they call RHICreateGraphicsPipelineState. Seemingly all jobs end up waiting on each other, so the actual PSO creation becomes serialized.

I am running DX12, SM6 with the latest NV game ready driver (581.42).

I would assume this is something that you are already aware of, but are there any plans or talks with NVIDIA from your end regarding this issue?

Best regards,

Anders Pistol

Neon Giant AB

Hello,

This does sound similar to an issue we’re investigating: a PSO compilation hang in 580.88 and 581.42 due to a lock in the driver. We don’t have a workaround yet, but if you have a repro, can you try a different driver version?

Hi Alex!

I tried 577.0 as well and the same issue is present in that driver.

So, while creating a PSO (CreatePipelineStateFromStream), the NV driver spends most of its time allocating and freeing memory through RtlAllocateHeap and RtlFreeHeap, which use a critical section lock to ensure the RTL heap’s consistency.

This lock of course stalls everything all the way up to the PSOPrecompilePool jobs.

There are 95 PSOPrecompilePool jobs running and all of them are hammering the same critical section.

[Image Removed]
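
To illustrate the contention pattern described above, here is a minimal toy sketch in standard C++. It is not driver or engine code: `SharedHeap`, `AllocateAndFree` and `PeakConcurrencyInHeap` are hypothetical names, and a `std::mutex` stands in for the heap's critical section. It only shows that when every worker funnels through one lock, at most one worker is ever inside the locked region, regardless of how many workers are running.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a heap guarded by a single critical section.
// All names are hypothetical; this does not model the real driver.
struct SharedHeap {
    std::mutex lock;             // plays the role of the heap's critical section
    std::atomic<int> inside{0};  // workers currently inside the locked region
    std::atomic<int> peak{0};    // highest value of `inside` ever observed

    void AllocateAndFree() {
        std::lock_guard<std::mutex> guard(lock);
        const int now = ++inside;
        // Record a new peak if this is the most workers seen inside at once.
        int prev = peak.load();
        while (now > prev && !peak.compare_exchange_weak(prev, now)) {}
        --inside;
    }
};

// Runs `numWorkers` threads that each hit the heap `itersPerWorker` times,
// then returns the peak number of workers that were inside it at once.
int PeakConcurrencyInHeap(int numWorkers, int itersPerWorker) {
    assert(numWorkers > 0 && itersPerWorker > 0);
    SharedHeap heap;
    std::vector<std::thread> workers;
    for (int i = 0; i < numWorkers; ++i) {
        workers.emplace_back([&heap, itersPerWorker] {
            for (int j = 0; j < itersPerWorker; ++j) {
                heap.AllocateAndFree();
            }
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return heap.peak.load();
}
```

With 95 workers the peak is still 1, which is why adding more jobs does not add throughput for the portion of the work spent under the lock.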

Thanks for the additional info. This does sound like it might be similar to what we are seeing, but for now it’s looking like a driver issue. Have you been able to work around it by lowering the number of PSO compiling threads?

Thanks for the reply!

Running with “r.pso.PrecompileThreadPoolSizeMax=4” does indeed change the behaviour: it no longer blocks all of the PSOPrecompilePool jobs, and they all seem to run concurrently in the driver. As a result, the overall performance is almost the same as running without a pool size cap.

Still, the situation is far from ideal, since it takes considerable time to precompile all the PSOs.

We’re still awaiting a potential driver fix for this. In our case, though, we weren’t able to easily reproduce the stall with -dpcvars="r.pso.PrecompileThreadPoolPercentOfHardwareThreads=0,r.pso.PrecompileThreadPoolSize=16" -clearPSODriverCache, so you may be able to increase the pool size a bit while we wait for a fix.