Mobile Vulkan PSO precaching crashes on some devices

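Our game has PSO precaching enabled on mobile, with r.PSOPrecache.GlobalShaders=1 also turned on. On Vulkan, PSO compilation is multithreaded by default, and we are currently seeing two kinds of crashes: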

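1. An intermittent crash while compiling a material PSO. Every crash so far has involved the same material, but we found nothing abnormal in the material itself, so we have largely ruled out the shader code as the cause. Our initial suspicion is that it is caused by multithreaded PSO creation.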

Affected device: ROG 7 Pro (Snapdragon 8 Gen 2, Adreno™ 740), Android 13, driver version OpenGL ES 3.2 V[Content removed] 1665047908) (Date:10/06/22)

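Crash log and callstack:

[Image Removed]

2. Because this is a Shipping build, we cannot yet identify the specific shader. Earlier, to fix a PSO compilation crash that reproduced consistently, we applied a change based on [Content removed]; this intermittent crash only appeared after that change. We cannot yet confirm whether the two are related.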

Affected device: vivo X200 (Dimensity 9400, Mali-G925-Immortalis MC12), Android 15, driver version 1.3.278|OpenGL ES 3.2 v1.r49p1-03bet0.9a303f2b7f6a1b089a8ec9196cd0a8d4

Crash callstack:

[Image Removed]

Are these crashes related to multithreaded compilation? Is there any way to fix them other than disabling multithreaded compilation?

Steps to reproduce

Intermittent; no reliable repro steps.

Hi,

There are a few CVars I would like you to set; please check whether they help:

r.pso.CreateOnRHIThread=1

r.Vulkan.EnablePipelineLRUCache=1

r.Vulkan.PipelineLRUCacheEvictBinary=1

r.Vulkan.PSOLRUEvictAfterUnusedFrames=1000

r.Vulkan.ReleaseShaderModuleWhenEvictingPSO=1
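
For testing across a device farm, it may be easier to bake the five CVars above into a device profile than to type them on the console. The sketch below only renders them in the `+CVars=` form used by `DefaultDeviceProfiles.ini`; the `Android` profile name and the helper `render_device_profile` are illustrative assumptions, and whether each of these CVars takes effect when set from a device profile (rather than at startup from another config file) should be verified on the engine version in use.

```python
# CVar names and values are copied from the reply above.
CVARS = {
    "r.pso.CreateOnRHIThread": 1,
    "r.Vulkan.EnablePipelineLRUCache": 1,
    "r.Vulkan.PipelineLRUCacheEvictBinary": 1,
    "r.Vulkan.PSOLRUEvictAfterUnusedFrames": 1000,
    "r.Vulkan.ReleaseShaderModuleWhenEvictingPSO": 1,
}


def render_device_profile(profile_name, cvars):
    """Render a device profile section (hypothetical helper, not an engine tool)."""
    lines = [f"[{profile_name} DeviceProfile]"]
    lines += [f"+CVars={name}={value}" for name, value in cvars.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Android" is an assumed profile name; substitute the project's own profile.
    print(render_device_profile("Android", CVARS))
```

Pasting the printed section into the project's device profile config keeps the test settings in one place and makes them easy to revert.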

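We tested these CVars. They appear to make Precache fall back to single-threaded compilation, and in that mode the crash does not occur. However, single-threaded compilation takes too long, so we cannot use it in a release build.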

We would also like to ask whether Epic has run into similar problems in Fortnite, and what PSO strategy Fortnite uses (PSO Precache and PSO gathering).

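In addition, we are seeing a large number of PSO misses in our game. Even after merging some of the latest changes from ue5-main, we still cannot avoid misses entirely. What is the plan for this area? Will there be a mature solution in the near future?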

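Finally, PSO compilation on Vulkan is currently very slow, so PSO Precache during level loading takes a very long time and players are left waiting. This is also why we do not want to disable multithreaded compilation. Is there any way to optimize this?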

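Fortnite on mobile should be using Precache only, without the PSO Pipeline Cache File. Fortnite uses only Dynamic Lighting with the Deferred renderer, and at this point the vast majority of cases are predicted.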

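"We tested these CVars. They appear to make Precache fall back to single-threaded compilation"

Did you set only r.pso.CreateOnRHIThread=1? In theory, Precache should not become single-threaded: precached PSOs should still be compiled on multiple threads. Only a PSO that misses the precache is compiled on the RHI thread.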

Which PSOs are you seeing miss? Could you post the log so I can check what is going on and whether we have a fix planned?

After enabling the CVars above, we found that RemoteCompileServices was also enabled and could not compile PSOs correctly. It prints the following log (excerpt only):

10:13:17.314|0|0.000|0.000|E|[Tid=6111] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:17.322|0|0.000|0.000|E|[Tid=6116] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: === Handled ensure: ===
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: 
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: Ensure condition failed: LocalPipelineCache != nullptr  [File:./Runtime/VulkanRHI/Private/VulkanPipeline.cpp] [Line: 1452] 
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: 
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: Stack: 
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62EEB1E4 libUnreal.so(0x00000000190051E4)![Unknown]()  []
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62E3FF54 libUnreal.so(0x0000000018F59F54)!VkResult FVulkanPipelineCacheChunk::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, FVulkanPipelineCacheChunk::EPSOCacheFindResult, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)  []
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62E2CFF0 libUnreal.so(0x0000000018F46FF0)!VkResult FVulkanChunkedPipelineCacheManagerImpl::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)  []
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62E2CE58 libUnreal.so(0x0000000018F46E58)!VkResult FVulkanChunkedPipelineCacheManager::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)  []
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62EA5A44 libUnreal.so(0x0000000018FBFA44)!FVulkanPipelineStateCacheManager::CreateVKPipeline(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, VkGraphicsPipelineCreateInfo const&, FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType)  []
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62EA5398 libUnreal.so(0x0000000018FBF398)!FVulkanPipelineStateCacheManager::CreateGfxPipelineFromEntry(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType)  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62EA918C libUnreal.so(0x0000000018FC318C)!FVulkanPipelineStateCacheManager::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F62EAA96C libUnreal.so(0x0000000018FC496C)!FVulkanDynamicRHI::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F584EE904 libUnreal.so(0x000000000E608904)!FCompilePipelineStateTask::CompilePSO(FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType const*)  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F584F0BA8 libUnreal.so(0x000000000E60ABA8)![Unknown]()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F584F0DE8 libUnreal.so(0x000000000E60ADE8)!FPSOPrecacheAsyncTask::DoTaskWork()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F567B07D8 libUnreal.so(0x000000000C8CA7D8)!FAsyncTaskBase::DoWork()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F567B2484 libUnreal.so(0x000000000C8CC484)!FAsyncTaskBase::DoThreadedWork()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F5686DF98 libUnreal.so(0x000000000C987F98)!FQueuedThread::Run()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F56864D54 libUnreal.so(0x000000000C97ED54)!FRunnableThreadPThread::Run()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x0000006F5666E000 libUnreal.so(0x000000000C788000)!FRunnableThreadPThread::_ThreadProc(void*)  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x00000074FCF0C854 libc.so(0x00000000000B8854)![Unknown]()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: [Callstack] 0x00000074FCEFECB8 libc.so(0x00000000000AACB8)![Unknown]()  []
10:13:17.374|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: 
10:13:17.374|0|0.000|0.000|E|[Tid=6110] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:18.316|0|0.000|0.000|E|[Tid=6107] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:18.322|0|0.000|0.000|E|[Tid=6109] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:18.323|0|0.000|0.000|L|[Tid=6108] LogVulkanRHI|Stopping Remote Compile Services
10:13:18.326|0|0.000|0.000|E|[Tid=6108] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed, error count has passed threshold. Future compiles will be in-process.).

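The PSO that fails is not always the same one and seems to depend on ordering, but it is always a material PSO. Is this behavior expected?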
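
To compare runs when the failing PSO changes, it can help to summarize the log mechanically rather than by eye. The sketch below is a minimal example that assumes the pipe-delimited line layout shown in the excerpt above (`time|..|..|..|level|[Tid=N] category|message`); the function name `summarize` and the returned keys are hypothetical, and other log configurations may need a different pattern.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above; this layout is an assumption.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\|[^|]*\|[^|]*\|[^|]*\|(?P<level>\w)\|"
    r"\[Tid=(?P<tid>\d+)\] (?P<category>[^|]+)\|(?P<message>.*)$"
)

SAMPLE = """\
10:13:17.314|0|0.000|0.000|E|[Tid=6111] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed.).
10:13:17.373|0|0.000|0.000|E|[Tid=6110] Engine|LogOutputDevice: Ensure condition failed: LocalPipelineCache != nullptr  [File:./Runtime/VulkanRHI/Private/VulkanPipeline.cpp] [Line: 1452]
10:13:18.326|0|0.000|0.000|E|[Tid=6108] LogVulkanRHI|Android RemoteCompileServices Failed to create graphics pipeline (Remote PSO compiler failed, error count has passed threshold. Future compiles will be in-process.).
"""


def summarize(lines):
    """Count pipeline-creation failures per thread and collect ensure conditions."""
    failures_by_tid = Counter()
    ensures = []
    fallback_time = None  # first time the log reports the in-process fallback
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        message = match["message"]
        if "Failed to create graphics pipeline" in message:
            failures_by_tid[int(match["tid"])] += 1
            if "Future compiles will be in-process" in message and fallback_time is None:
                fallback_time = match["time"]
        elif "Ensure condition failed:" in message:
            ensures.append(message.split("Ensure condition failed:", 1)[1].strip())
    return {
        "failures_by_tid": failures_by_tid,
        "ensures": ensures,
        "fallback_time": fallback_time,
    }


if __name__ == "__main__":
    summary = summarize(SAMPLE.splitlines())
    print(dict(summary["failures_by_tid"]), summary["fallback_time"])
    for condition in summary["ensures"]:
        print("ensure:", condition)
```

Running the same summary over logs from several sessions shows whether failures cluster on particular worker threads and how quickly the remote compiler is abandoned, which is easier to attach to a support reply than the raw log.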

Hi,

Sorry, did I already mention setting r.Vulkan.AllowSynchronization2=0? If not, please give that a try.