This is more of a general question about PSO precaching: would you recommend that a title ship with PSO precaching only?
We see several PSO precaching misses reported through r.PSOPrecache.Validation=2, which we are fixing one by one. However, some PSOs are still being created outside the code paths covered by precaching validation, and at first sight I don't see how we can avoid using a bundled PSO cache to fill in the gaps.
To give you an example, the game starts with an introduction sequence in which the camera flies through the environment. There are several camera cuts, and I suspect that each cut applies a different post-processing configuration, which dynamically triggers different renderer code paths.
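The effect described above can be sketched as a toy model: each post-processing configuration selects a shader permutation, and the first time a permutation is seen at runtime its PSO has to be created on the fly. The settings (`bloom`, `dof`, `motion_blur`) and the one-PSO-per-configuration simplification are hypothetical; this is not engine code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostProcessConfig:
    # Hypothetical post-processing toggles that together select a shader permutation.
    bloom: bool
    dof: bool
    motion_blur: bool


def count_runtime_creations(camera_cuts, precached=frozenset()):
    """Count PSOs created on the fly for a sequence of camera cuts.

    Each distinct config not already in `precached` costs one on-the-fly
    creation the first time it is encountered; later uses hit the cache.
    """
    created = set()
    for config in camera_cuts:
        if config not in precached and config not in created:
            created.add(config)
    return len(created)
```

With cuts A, B, A, C and nothing precached, the model reports three on-the-fly creations; precaching A alone brings that down to two, and the repeated A never costs anything after its first use.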
Some examples of PSOs created on the fly during the sequence that are not reported by r.PSOPrecache.Validation=2 (as named in the callstacks I saw):
What does your performance look like during those introduction sequences? Have you traced noticeable hitches back to PSO compilation? When are you looking to ship your title? I am checking whether we plan to support PSO precaching for our post-processing passes. Otherwise, your two options are the ones you already described: either use a bundled cache to cover the missed PSOs, or implement the precaching mechanism for the affected passes.
Right, that is certainly not ideal. Have you tried setting the r.PSOPrecache.GlobalShaders=2 cvar, which will precache all global shaders at startup? The caveat is a long startup time the first time the game is launched, since the engine will need to compile all global PSOs. If that takes too long, it might be possible to tweak the code path enabled by that cvar so that it only includes the post-processing shaders, although I assume that, with the deadline, you want to make as few code changes as possible. I still wanted to give you those options, so let me know what you think.
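Restricting startup precaching to the post-processing shaders amounts to filtering the global shader list before precaching it, which bounds the first-launch cost. A minimal sketch, assuming each global shader can be tagged with a category; the `GlobalShader` type, the `category` field, the shader names, and the compile times are all hypothetical and not part of the engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalShader:
    name: str
    category: str       # hypothetical tag, e.g. "PostProcess" or "Lighting"
    compile_ms: float   # hypothetical cold-cache PSO compile time


def select_for_startup_precache(shaders, categories):
    """Keep only the shaders whose category should be precached at startup."""
    return [s for s in shaders if s.category in categories]


def startup_cost_ms(shaders):
    """Serial upper bound on startup compile time; async compilation would overlap these."""
    return sum(s.compile_ms for s in shaders)
```

Passing `{"PostProcess"}` as the category set keeps the first-launch cost proportional to the post-processing subset rather than to every global shader, at the price of leaving the other global PSOs to be created lazily.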
Thanks for creating the PR! I have informed the team. They will review the PR as soon as possible and get back to you with any feedback they might have. Please let me know if you have any further questions.
Implementing precaching for the missing cases will likely take a while. The other consequence of fixing precaching misses is that it greatly increases the number of permutations created asynchronously.
To give you an example, fixing these two adds the following number of PSOs to create during global precaching initialization:
By contrast, fewer than 20 PSOs were actually being missed and created on the fly. We will try to combine the bundled PSO cache with precaching for now (we are closing in on a ship deadline).
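The asymmetry between the two approaches can be sketched with a toy count: a precacher has to request every permutation a pass could use, since it cannot know ahead of time which ones will be hit, while a bundled cache only replays PSOs that were recorded during play. The permutation dimensions and observed keys below are hypothetical; they are not the real counts from this thread.

```python
from itertools import product


def enumerate_precache(dimensions):
    """Every key a precacher would request: the cross product of all dimension values.

    Keys are tuples of values in the dict's insertion order.
    """
    return set(product(*dimensions.values()))


def remaining_misses(needed, precached, bundled):
    """PSOs still created on the fly after combining precaching and a bundled cache."""
    return set(needed) - set(precached) - set(bundled)
```

With four hypothetical dimensions of sizes 3, 2, 3 and 2, the precacher requests 36 PSOs even if play only ever touches two of them, whereas a bundled cache recorded from that play covers the same two misses with two entries.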
Performance-wise, it's usually ~100-300 ms per first-encountered PSO (though it really depends on the hardware), so the first-time experience is pretty bad (emulated with -clearpsodrivercache). This is on a dev machine with an NVIDIA graphics card and 16/32 cores.
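To put those numbers together, a quick back-of-the-envelope calculation, assuming every miss compiles on the critical path and using 20 as an upper bound on the number of misses. The 60 fps target is an assumption for illustration, not a figure from this thread.

```python
def stall_range_s(missed_psos, min_ms, max_ms):
    """Cumulative first-encounter stall in seconds, if each miss blocks rendering."""
    return missed_psos * min_ms / 1000.0, missed_psos * max_ms / 1000.0


def frames_spanned(hitch_ms, target_fps):
    """Number of frame intervals a single hitch covers at the given frame rate."""
    return hitch_ms * target_fps / 1000.0
```

Under those assumptions, 20 misses at 100 to 300 ms each add up to roughly 2 to 6 seconds of stalls on a cold driver cache, and a single 100 ms hitch already spans 6 frames at 60 fps (18 frames at 300 ms).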
Getting back to this: I updated the PSO counts in my previous reply, as I was looking at the wrong info.
Regarding `r.PSOPrecache.GlobalShaders=2`, I just gave it a try, but it currently crashes: `ShaderCompiler.cpp::PrecacheComputePipelineStatesForGlobalShaders()` does not seem to handle non-compute global shaders.
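One plausible reason a compute-only path cannot simply accept every global shader: a graphics PSO is not defined by a single shader, since it also needs the other stages, the vertex input layout, render target formats, and blend/depth state, whereas a compute PSO essentially needs only the compute shader. A minimal sketch of a guard, written as a standalone model rather than engine code; the `Frequency` enum, `ShaderRef` type, and shader names are hypothetical and do not mirror the actual contents of `ShaderCompiler.cpp`.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Frequency(Enum):
    # Hypothetical stand-in for a shader stage tag.
    VERTEX = auto()
    PIXEL = auto()
    COMPUTE = auto()


@dataclass(frozen=True)
class ShaderRef:
    name: str
    frequency: Frequency


def partition_global_shaders(shaders):
    """Split global shaders into compute ones, which can become compute PSOs on
    their own, and graphics ones, which are skipped instead of being fed to a
    compute-only precache path."""
    compute, skipped = [], []
    for shader in shaders:
        if shader.frequency is Frequency.COMPUTE:
            compute.append(shader)
        else:
            skipped.append(shader)
    return compute, skipped
```

Skipping is the conservative choice here: precaching the graphics global shaders would need the full pipeline state from somewhere else, which is why they are left out rather than forced through the compute path.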
We are still evaluating whether to ship with PSO precaching only and work around the issues we have, or to use PSO precaching plus a bundled cache. In the meantime, the issue can be closed.