Hi Tim,
We are basically using the default settings (r.PSOPrecache.Components=1, r.PSOPrecache.ProxyCreationWhenPSOReady=1, r.PSOPrecache.ProxyCreationDelayStrategy=0), except for r.PSOPrecache.KeepInMemoryUntilUsed. Unfortunately, it's difficult to provide a self-contained repro, since the issue really only manifests with our content and the large number of PSOs in a real project.
Yes, it's a hitch as reported by CheckAndUpdateHitchCountStat (you can also see it in Unreal Insights): RHICreateComputePipelineState takes a very long time, over 100ms, inside FCompilePipelineStateTask. This is not a precache, though; it's the JIT path via FRHICommandListBase::AddDispatchPrerequisite, so it causes a long RenderThread pause. This shouldn't happen, because with r.PSOPrecache.ProxyCreationDelayStrategy the object should not be drawn until the PSO is ready, and in fact the PSO state is EPSOPrecacheResult::Complete, as seen via the marker this code emits to Insights.
I think Epic has already reproduced this, since it seems to be the reason the r.PSOPrecache.KeepInMemoryUntilUsed cvar was added. Epic's comment on it is this:
TEXT("If enabled and if the underlying GPU vendor is NVIDIA, precached PSOs will be kept in memory instead of being deleted immediately after creation, and will only be deleted once they are actually used for rendering.\n")
TEXT("This can speed up the re-creation of precached PSOs for NVIDIA drivers and avoid small hitches, at the cost of memory.\n")
TEXT("It's recommended to set r.PSOPrecache.KeepInMemoryGraphicsMaxNum and r.PSOPrecache.KeepInMemoryComputeMaxNum to a non-zero value to ensure the number of in-memory PSOs is bounded."),
which seems to be exactly the situation we are hitting: a PSO is precached via the engine's PSO precaching system but is not used for rendering immediately, so the NVIDIA driver does not actually cache it. This violates the assumption the PSO precaching code makes that simply calling CreatePipelineState and then freeing the PSO is enough to warm up the PSO and cause the driver to cache it. As a result, a future CreatePipelineState call, which the engine expects to be quick since the status is set to EPSOPrecacheResult::Complete, actually takes a long time and causes a bad hitch.
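For clarity, here is a tiny standalone sketch of the driver behavior we believe the cvar comment describes (plain C++, a toy model only; FakeDriver and its members are made-up names, not driver or engine API): the driver only remembers a PSO once it has actually been used, so create-then-free during precaching warms nothing up.

    // Toy model only: our guess at the driver behavior, based on the cvar comment above.
    #include <cstdint>
    #include <cstdio>
    #include <unordered_set>

    struct FakeDriver
    {
        std::unordered_set<uint64_t> CachedPSOs; // populated only when a PSO is actually used

        // Returns true if compiling this PSO would be slow (a hitch if done just-in-time).
        bool CompilePSOWouldBeSlow(uint64_t Hash) const { return CachedPSOs.count(Hash) == 0; }

        // The driver only caches the PSO once it is bound for a real draw/dispatch.
        void UsePSO(uint64_t Hash) { CachedPSOs.insert(Hash); }

        // Destroying a never-used PSO leaves no trace in the driver cache.
        void ReleasePSO(uint64_t /*Hash*/) {}
    };

    int main()
    {
        FakeDriver Driver;
        const uint64_t Pso = 42;

        // Engine precache path: create the PSO, then immediately free it without using it.
        Driver.CompilePSOWouldBeSlow(Pso); // slow, but asynchronous, so no hitch
        Driver.ReleasePSO(Pso);            // nothing was cached by the driver

        // Much later the PSO is needed for a real dispatch: the engine thinks it is warm
        // (EPSOPrecacheResult::Complete), but the driver has to compile it all over again.
        std::printf("JIT compile slow: %s\n", Driver.CompilePSOWouldBeSlow(Pso) ? "yes" : "no"); // "yes"

        // With r.PSOPrecache.KeepInMemoryUntilUsed the precached PSO object is kept alive
        // and handed out on first use, so no second compile happens at all.
        return 0;
    }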
Setting r.PSOPrecache.KeepInMemoryUntilUsed does solve this issue, as Epic's comments indicate, so we are definitely hitting this same driver behavior. The problem is that unless we store an unlimited number of not-yet-used PSOs by setting the limits to 0, which the code advises against (and which also means many PSOs stay in memory forever), not-yet-used PSOs are eventually dropped by the code in TPrecachePipelineCacheBase::TryAddNewState when there is no more room in InMemoryPSOIndices:
    // Evict the oldest PSO if we're at maximum capacity.
    if (InMemoryPSOIndices.Num() == MaxInMemoryPSOs)
    {
        uint32 PSOIndex = InMemoryPSOIndices.First();
        InMemoryPSOIndices.PopFirst();
        // Enqueue the corresponding PSO for cleanup.
        PrecachedPSOsToCleanup.Add(PrecachedPSOInitializers[PSOIndex]);
    }
However, this causes another issue: these PSOs' states are left at EPSOPrecacheResult::Complete, so the engine will never retry precaching them if they are requested again. That means the same hitch will occur later if the PSO is eventually used.
Practically this can occur if, for example, an object in one map precaches a PSO but is never drawn (the object is culled, or it is in an area of the map the player doesn't visit). Later, that PSO is evicted due to MaxInMemoryPSOs. The player then travels to another map that uses the same material. The component calls PrecachePSOs(), but nothing is actually precached: the state inside the precaching system persists for the entire process, the PSO is still marked EPSOPrecacheResult::Complete, and the engine thinks it was precached already. It is not, and a hitch occurs.
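To illustrate the bookkeeping problem in isolation, here is a small standalone sketch (plain C++, a toy model, not the actual engine code; the names and states are simplified) of how the Complete state outlives the evicted PSO and turns the second PrecachePSOs() call into a no-op:

    // Toy model of the precache bookkeeping only; heavily simplified, not the engine implementation.
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <unordered_map>

    enum class EState { Unknown, Complete };

    struct ToyPrecacheCache
    {
        std::unordered_map<uint64_t, EState> States; // persists for the whole process
        std::deque<uint64_t> InMemoryPSOs;           // kept-alive, not-yet-used PSOs
        size_t MaxInMemoryPSOs = 2;

        // What a component's PrecachePSOs() boils down to: do nothing if already Complete.
        void Precache(uint64_t Hash)
        {
            if (States[Hash] == EState::Complete)
            {
                return; // engine thinks the driver already has it, so nothing is recompiled
            }
            States[Hash] = EState::Complete; // async compile kicked off and finished
            InMemoryPSOs.push_back(Hash);
            if (InMemoryPSOs.size() > MaxInMemoryPSOs)
            {
                // The oldest not-yet-used PSO object is freed (and the driver forgets it),
                // but its Complete state is left behind.
                InMemoryPSOs.pop_front();
            }
        }

        // First real use of the PSO: a hitch if the live object is gone while the state
        // still (wrongly) claims the PSO is warm.
        bool DrawWouldHitch(uint64_t Hash) const
        {
            const bool bStillInMemory =
                std::find(InMemoryPSOs.begin(), InMemoryPSOs.end(), Hash) != InMemoryPSOs.end();
            return !bStillInMemory && States.at(Hash) == EState::Complete;
        }
    };

    int main()
    {
        ToyPrecacheCache Cache;

        Cache.Precache(1); // map A precaches PSO 1, the object is never drawn
        Cache.Precache(2); // other precaches...
        Cache.Precache(3); // ...push PSO 1 out of InMemoryPSOs

        Cache.Precache(1); // map B, same material: no-op because the state is still Complete
        std::printf("hitch when PSO 1 is finally drawn: %s\n",
                    Cache.DrawWouldHitch(1) ? "yes" : "no"); // prints "yes"
        return 0;
    }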
One possible fix would be to have TPrecachePipelineCacheBase::ProcessDelayedCleanup() do something like this:
    if (ShouldKeepPrecachedPSOsInMemory())
    {
        DEC_DWORD_STAT(STAT_InMemoryPrecachedPSOCount);

        // New code begin
        if (!EnumHasAnyFlags(FindResult->ReadPSOPrecacheState(), EPSOPrecacheStateMask::UsedForRendering))
        {
            // If we are freeing this one, mark it uncached so we will try again later
            PrecachedPSOInitializerData.Remove(InitializerHash);
        }
        // New code end
    }
However, this also requires additional handling in FMaterialPSORequestManager::MarkCompilationComplete to deal with FMaterialPSOPrecache requests that now contain stale PSOs.
We were wondering whether Epic has looked into this, since this edge case seems to be a bug in the original code. Specifically:
- Does this fix seem correct? With this change, a second component calling PrecachePSOs() will at least restart the PSO precache process for PSOs discarded due to the MaxInMemoryPSOs limit, since their state is reset to EPSOPrecacheResult::Unknown. That prevents the hitch because the new component will wait to create its scene proxy while the PSO precaching system performs another precache attempt.
- How should we handle a transition from EPSOPrecacheResult::Complete back to another state such as EPSOPrecacheResult::Unknown? This does not currently happen in the PSO precaching code, so there is still a path where it hitches: existing components would have already created their scene proxies while the PSO state was EPSOPrecacheResult::Complete. The code currently treats Complete as a terminal state for the rest of the process lifetime, but with this driver behavior it is not, and PSOs can become "uncached" if they are not used soon enough. (The sketch after these questions tries to illustrate both points.)
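For reference, here is a toy sketch of what the proposed change does (again plain C++, simplified, not the engine code): evicting a PSO that was never used for rendering also forgets its precache state, so the second PrecachePSOs() genuinely retries, while a component whose proxy was already created against a Complete state still slips through.

    // Toy sketch of the proposed behavior; not engine code, names simplified as above.
    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>

    enum class EState { Unknown, Complete };

    static std::unordered_map<uint64_t, EState> GStates;

    // PrecachePSOs() path: only kicks off a compile if the state is not already Complete.
    static bool Precache(uint64_t Hash)
    {
        if (GStates[Hash] == EState::Complete)
        {
            return false;                 // no-op, nothing is recompiled
        }
        GStates[Hash] = EState::Complete; // async compile kicked off and finished
        return true;
    }

    // Proposed change: when an in-memory PSO is evicted without ever having been used for
    // rendering, forget its state so the next Precache() call actually retries.
    static void EvictNotYetUsed(uint64_t Hash, bool bUsedForRendering)
    {
        if (!bUsedForRendering)
        {
            GStates.erase(Hash); // back to Unknown for future requests
        }
    }

    int main()
    {
        Precache(1);               // map A precaches PSO 1 but never draws it
        EvictNotYetUsed(1, false); // dropped due to MaxInMemoryPSOs; the driver forgets it too

        // Question 1: a second component calling PrecachePSOs() now really retries, and
        // ProxyCreationDelayStrategy can hold its proxy until the PSO is ready again.
        std::printf("second precache retries: %s\n", Precache(1) ? "yes" : "no"); // "yes"

        // Question 2 (still open): a component whose proxy was already created while the
        // state was Complete never calls Precache() again, so it can still hitch the first
        // time the evicted PSO is drawn; there is currently no Complete -> Unknown
        // notification that would make it wait again.
        return 0;
    }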
Let me know if I can provide any more information. I see CL 43263054 improves this slightly by making the MaxInMemoryPSOs handling smarter, so the limit is less likely to be reached; unfortunately, the problem can still occur if it is.