We enabled PSOPrecache on Android Vulkan and set two CVars:
- r.pso.PrecompileThreadPoolPercentOfHardwareThreads=50
- r.PSOPrecache.GlobalShaders=1
Across a range of Android devices, the following two crashes occur with fairly high probability.
Crash 1 call stack:
#00 pc 000000000005bdc0 /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) [arm64-v8a]
#01 pc 0000000009310228 libUnreal.so FAndroidErrorOutputDevice::Serialize(char16_t const*, ELogVerbosity::Type, FName const&) (.\Runtime/Core/Private/Android/AndroidErrorOutputDevice.cpp:52) [arm64-v8a]
#02 pc 000000000fd6a728 libUnreal.so FOutputDevice::LogfImpl(char16_t const*, …) (.\Runtime/Core/Private/Misc/OutputDevice.cpp:81) [arm64-v8a]
#03 pc 000000000f611f48 libUnreal.so FDebug::CheckVerifyFailedImpl2(char const*, char const*, int, char16_t const*, …) (Runtime\Core\Public\Misc/OutputDevice.h:246) [arm64-v8a]
#04 pc 0000000016f29618 libUnreal.so FVulkanPipelineStateCacheManager::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&) (.\Runtime/VulkanRHI/Private/VulkanPipeline.cpp:2123 [Inline: TRefCountPtr]) (Other infos:FRHIResource::AddRef() const Runtime\Core\Public\Templates/RefCounting.h:299FRHIResource::FAtomicFlags::AddRef(std::__ndk1::memory_order) Runtime\RHI\Public/RHIResources.h:68FRHIResource::FAtomicFlags::AddRef(std::__ndk1::memory_order) Runtime\RHI\Public/RHIResources.h:136) [arm64-v8a]
#05 pc 0000000016f2c204 libUnreal.so FVulkanDynamicRHI::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&) (.\Runtime/VulkanRHI/Private/VulkanPipeline.cpp:2242) [arm64-v8a]
#06 pc 0000000010004794 libUnreal.so FCompilePipelineStateTask::CompilePSO(FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType const*) (Runtime\RHI\Public/DynamicRHI.h:1127) [arm64-v8a]
#07 pc 0000000010003e4c libUnreal.so TGraphTask<FCompilePipelineStateTask>::ExecuteTask() (Runtime\Core\Public\Async/TaskGraphInterfaces.h:639 [Inline: FCompilePipelineStateTask::DoTask(ENamedThreads::Type, TRefCountPtr<FBaseGraphTask> const&)]) (Other infos:FCompilePipelineStateTask::DoTask(ENamedThreads::Type, TRefCountPtr<FBaseGraphTask> const&) .\Runtime/RHI/Private/PipelineStateCache.cpp:3189) [arm64-v8a]
#08 pc 000000000f97c69c libUnreal.so UE::Tasks::Private::FTaskBase::TryExecuteTask() (Runtime\Core\Public\Tasks/TaskPrivate.h:509) [arm64-v8a]
#09 pc 000000000f97b8e8 libUnreal.so _ZN13LowLevelTasks13TTaskDelegateIFPNS_5FTaskEbELj48EE17TTaskDelegateImplIZNS1_4InitIZN2UE5Tasks7Private9FTaskBase4InitEPKDsNS_13ETaskPriorityENS8_21EExtendedTaskPriorityENS8_10ETaskFlagsEEUlvE_EEvSC_SD_OT_NS_10ETaskFlagsEEUlbE_Lb0EE11CallAndMoveERS4_Pvjb (Runtime\Core\Public\Async\Fundamental/Task.h:500 [Inline: operator()]) (Other infos:operator() Runtime\Core\Public\Tasks/TaskPrivate.h:188) [arm64-v8a]
#10 pc 0000000009334dd4 libUnreal.so LowLevelTasks::FScheduler::ExecuteTask(LowLevelTasks::FTask*) (Runtime\Core\Public\Async\Fundamental/TaskDelegate.h:308) [arm64-v8a]
#11 pc 0000000009336354 libUnreal.so _ZN13LowLevelTasks10FScheduler18TryExecuteTaskFromINS_7Private19TLocalQueueRegistryILj1024ELj1024EE11TLocalQueueEXadL [arm64-v8a]
#12 pc 0000000009335fa8 libUnreal.so LowLevelTasks::FScheduler::WorkerLoop(LowLevelTasks::Private::FWaitEvent*, LowLevelTasks::Private::TLocalQueueRegistry<(unsigned int)1024, (unsigned int)1024>::TLocalQueue*, unsigned int, bool) (.\Runtime/Core/Private/Async/Fundamental/Scheduler.cpp:513) [arm64-v8a]
#13 pc 0000000009336c64 libUnreal.so LowLevelTasks::FScheduler::WorkerMain(LowLevelTasks::Private::FWaitEvent*, LowLevelTasks::Private::TLocalQueueRegistry<(unsigned int)1024, (unsigned int)1024>::TLocalQueue*, unsigned int, bool) (.\Runtime/Core/Private/Async/Fundamental/Scheduler.cpp:571) [arm64-v8a]
#14 pc 00000000095ae814 libUnreal.so FThreadImpl::Run() (.\Runtime/Core/Private/HAL/Thread.cpp:66 [Inline: UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<true>, void()>::operator()() const]) (Other infos:UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<true>, void()>::operator()() const Runtime\Core\Public\Templates/Function.h:470) [arm64-v8a]
#15 pc 000000000954af48 libUnreal.so FRunnableThreadPThread::Run() (.\Runtime/Core/Private/HAL/PThreadRunnableThread.cpp:25) [arm64-v8a]
#16 pc 0000000009354da4 libUnreal.so FRunnableThreadPThread::_ThreadProc(void*) (.\Runtime/Core/Private/HAL/PThreadRunnableThread.h:187) [arm64-v8a]
#17 pc 00000000000c0b88 /apex/com.android.runtime/lib64/bionic/libc.so [arm64-v8a]
#18 pc 000000000005d5f8 /apex/com.android.runtime/lib64/bionic/libc.so [arm64-v8a]
java:
[Failed to get Java stack]
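From reading the engine source, one possible cause is that a PSO in GraphicsPSOLockedMap is not removed immediately when its reference count reaches 0; it is only marked as being deleted. RHICreateGraphicsPipelineState can still find that entry in GraphicsPSOLockedMap while creating a PSO, which leads to two cases:
- By the time the function returns, the object has already been deleted and has DeletingBit set, so converting it to FGraphicsPipelineStateRHIRef triggers the crash;
- When the function returns, the object is in the middle of Deleting() and DeletingBit is set only after the function returns, so the returned object becomes a dangling pointer.
The correct behavior should be that once a PSO is marked as being deleted, it can no longer be found in GraphicsPSOLockedMap and cannot affect the creation of a new PSO.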
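To make the expected behavior concrete, here is a minimal sketch of a cache lookup that refuses to hand out an entry whose reference count has already dropped to 0. This is not engine code: `FakePSO`, `FakePSOCache`, `TryAddRef`, `FindAndAddRef` and `RemoveIfSame` are hypothetical names used only for illustration, and the real FRHIResource and GraphicsPSOLockedMap logic may differ.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a ref-counted PSO; not the engine's FRHIResource.
struct FakePSO {
    std::atomic<int32_t> RefCount{1};

    // Take a reference only if the object is still alive (count > 0).
    bool TryAddRef() {
        int32_t Cur = RefCount.load(std::memory_order_relaxed);
        while (Cur > 0) {
            if (RefCount.compare_exchange_weak(Cur, Cur + 1,
                    std::memory_order_acquire, std::memory_order_relaxed)) {
                return true;
            }
            // On failure Cur holds the latest value; the loop re-checks it.
        }
        return false; // Count already hit 0: the object is being deleted.
    }

    // Returns true when this call dropped the last reference.
    bool Release() {
        return RefCount.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// Hypothetical cache standing in for a locked PSO map.
class FakePSOCache {
public:
    // Returns an entry that already holds a new reference,
    // or nullptr if the key is missing or the entry is dying.
    FakePSO* FindAndAddRef(uint64_t Key) {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = Map.find(Key);
        if (It == Map.end()) {
            return nullptr;
        }
        FakePSO* PSO = It->second;
        if (!PSO->TryAddRef()) {
            Map.erase(It); // Drop the stale entry so the caller creates a fresh PSO.
            return nullptr;
        }
        return PSO;
    }

    void Add(uint64_t Key, FakePSO* PSO) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Map[Key] = PSO;
    }

    // Called by whoever dropped the last reference, before freeing the object.
    // Only erases the entry if it still points at that same object.
    void RemoveIfSame(uint64_t Key, FakePSO* PSO) {
        std::lock_guard<std::mutex> Lock(Mutex);
        auto It = Map.find(Key);
        if (It != Map.end() && It->second == PSO) {
            Map.erase(It);
        }
    }

private:
    std::mutex Mutex;
    std::unordered_map<uint64_t, FakePSO*> Map;
};
```

In this sketch the owner must call `RemoveIfSame` (which takes the lock) before freeing the object, so a lookup holding the lock never touches freed memory, and a dying entry is never returned to a caller.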
Crash 2 call stack:
Scudo ERROR: invalid chunk state when deallocating address 0x2000075f7980910
#19 pc 000000000063b230 /vendor/lib64/libllvm-qgl.so (CreateQGLCProgram(QGPUCompiler::CompileData*)+48) [arm64-v8a]
#20 pc 0000000000db9f00 /vendor/lib64/libllvm-qgl.so [arm64-v8a]
#21 pc 0000000000db9b78 /vendor/lib64/libllvm-qgl.so [arm64-v8a]
#22 pc 0000000000040d4c /vendor/lib64/libllvm-glnext.so [arm64-v8a]
#23 pc 00000000001aec9c /vendor/lib64/hw/vulkan.adreno.so [arm64-v8a]
#24 pc 00000000001aad8c /vendor/lib64/hw/vulkan.adreno.so [arm64-v8a]
#25 pc 00000000001a9190 /vendor/lib64/hw/vulkan.adreno.so (qglinternal::vkCreateGraphicsPipelines(VkDevice_T*, VkPipelineCache_T*, unsigned int, VkGraphicsPipelineCreateInfo const*, VkAllocationCallbacks const*, VkPipeline_T**)+7008) [arm64-v8a]
#26 pc 00000000198df1dc /data/app/~~9jDzNJpuYBYxa3L7XQLGjQ==/com.tencent.mf.nf-szbeKqDFipD2HrdT4Hgbvg==/lib/arm64/libUnreal.so (FastDecimalFormat::Internal::FDecimalNumberSignParser::~FDecimalNumberSignParser()+148) [arm64-v8a]
#27 pc 00000000197e82e0 /data/app/~~9jDzNJpuYBYxa3L7XQLGjQ==/com.tencent.mf.nf-szbeKqDFipD2HrdT4Hgbvg==/lib/arm64/libUnreal.so (VkResult FVulkanPipelineCacheChunk::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, FVulkanPipelineCacheChunk::EPSOCacheFindResult, TUniqueFunction<VkResult(FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)+1196) [arm64-v8a]
#28 pc 00000000197d537c /data/app/~~9jDzNJpuYBYxa3L7XQLGjQ==/com.tencent.mf.nf-szbeKqDFipD2HrdT4Hgbvg==/lib/arm64/libUnreal.so (VkResult FVulkanChunkedPipelineCacheManagerImpl::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult(FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)+316) [arm64-v8a]
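This crash reproduces reliably on first launch. Judging by the error type, the Scudo allocator is reporting that the same memory was freed more than once, most likely because UE releases a PSO object multiple times. Since PSO management in the Vulkan layer is multithreaded, we suspect a threading issue. Either reducing the thread pool size or setting r.Vulkan.ForcePSOSingleThreaded=1 makes the problem stop reproducing.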
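For context on why the workaround points at a threading problem, the sketch below shows what serializing creation calls looks like in general. It is only an illustration under our own assumptions: `FSerializedCreator` and its `bSerialize` flag are hypothetical, and we have not confirmed that r.Vulkan.ForcePSOSingleThreaded is implemented this way in the engine.

```cpp
#include <functional>
#include <mutex>

// Hypothetical wrapper: optionally forces creation calls to run one at a time.
class FSerializedCreator {
public:
    explicit FSerializedCreator(bool bInSerialize) : bSerialize(bInSerialize) {}

    // Runs CreateFn and returns its result. When serialization is enabled,
    // the mutex is held for the whole call, so calls never overlap.
    int Create(const std::function<int()>& CreateFn) {
        std::unique_lock<std::mutex> Lock(Mutex, std::defer_lock);
        if (bSerialize) {
            Lock.lock();
        }
        return CreateFn(); // Lock (if taken) is released when it goes out of scope.
    }

private:
    bool bSerialize;
    std::mutex Mutex;
};
```

If a crash disappears once calls are serialized like this, that is consistent with a race between concurrent creation or release paths, although it does not prove where the race is.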
Is this a known issue? If a fix is needed, how would you recommend changing the code?