We’re seeing a fairly infrequent crash in our Google Play crash reports that seems to occur in the chunked PSO precache. I turned on precaching for Android in 5.4 using the information in this question: [Content removed]. I included the Mali callstack because it seemed to have more useful driver-level frames than the others, but I’ve also seen the crash with Adreno and PowerVR drivers.
Turning on precaching was an opportunistic thing as it was a noticeable improvement even though 5.4 doesn’t have full PSO precaching coverage. However, those crash reports coming in are pointing at something more. Is there a known crash in 5.4 that looks like this?
Steps to Reproduce
Enable PSO precaching for 5.4 by adding this to AndroidEngine.ini:
[ConsoleVariables]
r.Vulkan.AllowPSOPrecaching=1
r.Vulkan.UseChunkedPSOCache=1
Hi Camille,
I might be missing something, but I can’t seem to find the Mali callstack you mention?
JN
Bonjour Jean-Noé,
That’s weird! There was a callstack section when I was writing the question. Here you go:
#00 pc 0x0000000001dd0d0c /vendor/lib64/egl/libGLES_mali.so (bool llvm::DenseMapBase<llvm::DenseMap<llvm::BranchProbabilityInfo::BasicBlockCallbackVH, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseSetPair<llvm::BranchProbabilityInfo::BasicBlockCallbackVH>>, llvm::BranchProbabilityInfo::BasicBlockCallbackVH, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseSetPair<llvm::BranchProbabilityInfo::BasicBlockCallbackVH>>::LookupBucketFor<llvm::BranchProbabilityInfo::BasicBlockCallbackVH>(llvm::BranchProbabilityInfo::BasicBlockCallbackVH const&, llvm::detail::DenseSetPair<llvm::BranchProbabilityInfo::BasicBlockCallbackVH> const*&) const)
#01 pc 0x0000000001dd0c5c /vendor/lib64/egl/libGLES_mali.so (llvm::DenseMapBase<llvm::DenseMap<llvm::BranchProbabilityInfo::BasicBlockCallbackVH, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseSetPair<llvm::BranchProbabilityInfo::BasicBlockCallbackVH>>, llvm::BranchProbabilityInfo::BasicBlockCallbackVH, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseSetPair<llvm::BranchProbabilityInfo::BasicBlockCallbackVH>>::erase(llvm::BranchProbabilityInfo::BasicBlockCallbackVH const&)+44)
#02 pc 0x00000000021eba80 /vendor/lib64/egl/libGLES_mali.so (llvm::ValueHandleBase::ValueIsDeleted(llvm::Value*)+536)
#03 pc 0x00000000021ebf10 /vendor/lib64/egl/libGLES_mali.so (llvm::Value::~Value()+40)
#04 pc 0x0000000002150290 /vendor/lib64/egl/libGLES_mali.so (llvm::BasicBlock::eraseFromParent()+96)
#05 pc 0x00000000021a7d08 /vendor/lib64/egl/libGLES_mali.so (llvm::Function::dropAllReferences()+96)
#06 pc 0x00000000021e40a8 /vendor/lib64/egl/libGLES_mali.so (llvm::Module::dropAllReferences()+40)
#07 pc 0x00000000021e3e08 /vendor/lib64/egl/libGLES_mali.so (llvm::Module::~Module()+40)
#08 pc 0x00000000009e8610 /vendor/lib64/egl/libGLES_mali.so (cmpbep_destroy_llvm_context+32)
#09 pc 0x0000000000accc10 /vendor/lib64/egl/libGLES_mali.so (cmpbep_destroy_shader_context+16)
#10 pc 0x0000000000a05108 /vendor/lib64/egl/libGLES_mali.so (cmpbe_v2_compile_multiple_shaders+5000)
#11 pc 0x00000000016bcb50 /vendor/lib64/egl/libGLES_mali.so (gfx::compiler::compile_shaders(gfx::shader_set const&, gfx::shader_set&, hal::shader_language, gfx::shader_state const&, gfx::pipeline_cache*, gfx::mem_allocator&)+6768)
#12 pc 0x00000000006ef638 /vendor/lib64/egl/libGLES_mali.so (vulkan::graphics_pipeline::init(vulkan::device*, VkGraphicsPipelineCreateInfo const&, gfx::host_mem_allocator const&, gfx::host_mem_allocator&)+5384)
#13 pc 0x00000000006ee0b0 /vendor/lib64/egl/libGLES_mali.so (vkCreateGraphicsPipelines+472)
#14 pc 0x0000000007f19528 FVulkanPipelineStateCacheManager::CreateVKPipeline(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, VkGraphicsPipelineCreateInfo const&, bool)::$_15::operator()(FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&) const [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:0]
#15 pc 0x0000000007eae63c UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<true>, VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>::operator()(FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&) const [C:/ws/Prod-AndroidClient/Engine/Source/Runtime/Core/Public/Templates/Function.h:555]
#16 pc 0x0000000007ea22c4 VkResult FVulkanChunkedPipelineCacheManagerImpl::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:932]
#17 pc 0x0000000007ea2170 VkResult FVulkanChunkedPipelineCacheManager::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:1237]
#18 pc 0x0000000007ee9084 FVulkanPipelineStateCacheManager::CreateVKPipeline(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, VkGraphicsPipelineCreateInfo const&, bool) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:1454]
#19 pc 0x0000000007ee8d50 FVulkanPipelineStateCacheManager::CreateGfxPipelineFromEntry(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, bool) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:1380]
#20 pc 0x0000000007eeb448 FVulkanPipelineStateCacheManager::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:2143]
#21 pc 0x00000000021e20b8 RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&) [C:/ws/Prod-AndroidClient/Engine/Source/Runtime/RHI/Public/DynamicRHI.h:1121]
#22 pc 0x00000000021e29dc UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<true>, void ()>::operator()() const [C:/ws/Prod-AndroidClient/Engine/Source/Runtime/Core/Public/Templates/Function.h:555]
#23 pc 0x000000000180da2c FAsyncTaskBase::DoWork() [C:/ws/Prod-AndroidClient/Engine/Source/Runtime/Core/Public/Async/AsyncWork.h:288]
#24 pc 0x000000000182fa90 FQueuedThread::Run() [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/Core/Private/HAL/ThreadingBase.cpp:1385]
#25 pc 0x0000000001828c4c FRunnableThreadPThread::Run() [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/Core/Private/HAL/PThreadRunnableThread.cpp:25]
#26 pc 0x00000000017a5ef8 FRunnableThreadPThread::_ThreadProc(void*) [C:/ws/Prod-AndroidClient/Engine/Source/./Runtime/Core/Private/HAL/PThreadRunnableThread.h:187]
#27 pc 0x000000000006b1f0 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+196)
#28 pc 0x000000000005e1b4 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)
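For anyone comparing reports like this across GPU vendors, here is a minimal Python sketch that splits a pasted backtrace into individual frames and picks out the first frame that is not inside a driver or system library. It assumes every frame starts with `#NN pc 0x<address>`, as in the callstack above, and that driver and system frames begin with a path under `/vendor/`, `/apex/`, or `/system/`. The function names `split_frames` and `first_engine_frame` are hypothetical helpers, not part of any Unreal or Android tooling.

```python
import re

# Assumption: each frame begins with "#NN pc 0x<hex address>", as in the stacks in this thread.
FRAME_START = re.compile(r"#(\d{2,})\s+pc\s+(0x[0-9a-fA-F]+)\s+")


def split_frames(raw):
    """Split a backtrace (possibly flattened onto one line) into (index, pc, description) tuples."""
    matches = list(FRAME_START.finditer(raw))
    frames = []
    for i, match in enumerate(matches):
        # A frame's description runs until the next frame marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        description = " ".join(raw[match.end():end].split())
        frames.append((int(match.group(1)), match.group(2), description))
    return frames


def first_engine_frame(frames, system_prefixes=("/vendor/", "/apex/", "/system/")):
    """Return the first frame whose description does not start with a driver/system path.

    The prefixes are an assumption about how such frames are printed; adjust as needed.
    Returns None when every frame is a driver/system frame.
    """
    for frame in frames:
        if not frame[2].startswith(system_prefixes):
            return frame
    return None
```

Run against the Mali stack above, this should land on frame #14, the lambda inside `FVulkanPipelineStateCacheManager::CreateVKPipeline`, which makes it easier to check whether the Adreno and PowerVR reports enter the driver from the same engine call.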
Thanks! The stack doesn’t look familiar to me…
Maybe pipeline caching is now creating a pipeline that wasn’t being used before, and the driver crashes while trying to build it.
So you’re saying the crash happens on a variety of devices (any GPU type), with both old and recent drivers? Do you have any logs you could share?
I’ll check with our internal mobile teams to see if they have any ideas.
JN
Also, just to be sure, you don’t have a repro for this locally that you can run with validation enabled?
The Adreno version is slightly different, happening through an assert on return of the API. Not sure which assert because Google Play console doesn’t provide that much detail, and I’m battling an issue with our crash reporter not collecting all crash reports. For the same reason, I don’t currently have logs for this, but I’ll get our QA to try and hammer on this with validation layers enabled.
The Adreno callstack:
`*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 10582 >>> com.netflix.NGP.SpiritCrossing <<<
backtrace:
#00 pc 0x0000000000060100 /apex/com.android.runtime/lib64/bionic/libc.so (abort+172)
#01 pc 0x000000000177e1fc /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FAndroidErrorOutputDevice::Serialize(char16_t const*, ELogVerbosity::Type, FName const&)+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#02 pc 0x0000000001961948 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FOutputDevice::LogfImpl(char16_t const*, …)+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#03 pc 0x00000000018f7d08 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FDebug::CheckVerifyFailedImpl2(char const*, char const*, int, char16_t const*, …)+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#04 pc 0x0000000007eeaf40 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FVulkanPipelineStateCacheManager::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#05 pc 0x00000000021e20b8 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FCompilePipelineStateTask::CompilePSO()+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#06 pc 0x00000000021e29dc /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FPSOPrecompileTask::DoWork()+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#07 pc 0x000000000180da2c /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FAsyncTaskBase::DoThreadedWork()+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#08 pc 0x000000000182fa90 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FQueuedThread::Run()+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#09 pc 0x0000000001828c4c /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FRunnableThreadPThread::Run()+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#10 pc 0x00000000017a5ef8 /data/app/~~Rqs864v-bTiTKjjjJURQwQ==/com.netflix.NGP.SpiritCrossing-UoKak9RY8LW_HORU7T3trg==/split_config.arm64_v8a.apk!libUnreal.so (FRunnableThreadPThread::_ThreadProc(void*)+4096) (BuildId: eee71321bd63d843e4de10d61431643f6b7ab25a)
#11 pc 0x0000000000071ae8 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+196)
#12 pc 0x0000000000063be0 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)`
> The stack doesn’t look familiar to me…
Okay, that’s already good to know! We’re about to merge 5.5 shortly, so I might wait until we do that to confirm whether it’s still happening, just in case it does impact PSO precaching.
OK, this stack looks like it would have output the cause of the error in the logs, since it’s failing an assert on our side. If you ever get your hands on some logs, we might be able to narrow down the cause.
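If logs do turn up, a short script can pull out the lines around each assert so they can be shared without the rest of the log. This is only a sketch: it assumes the failure is logged with the text `Assertion failed:`, and the marker list is a parameter precisely because the exact wording in your build may differ. `find_assert_context` is a hypothetical helper name.

```python
def find_assert_context(log_lines, markers=("Assertion failed:",), before=5, after=10):
    """Yield (line_number, context_lines) for every log line containing one of the markers.

    line_number is 1-based. context_lines includes up to `before` lines preceding the
    match and up to `after` lines following it, so the failing expression and any
    message printed alongside it stay together.
    """
    lines = list(log_lines)
    for i, line in enumerate(lines):
        if any(marker in line for marker in markers):
            start = max(0, i - before)
            yield i + 1, lines[start:i + after + 1]
```

Usage would be along the lines of `for number, context in find_assert_context(open(path, encoding="utf-8", errors="replace")): ...`, where `path` points at a captured logcat or game log file.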
Upgrading to 5.5 might fix it too. 
I had a quick look at the history and saw this LRU fix for something that might cause asserts: https://github.com/EpicGames/UnrealEngine/commit/1e350c23e77a347ff2e781e794828bf05ee09588
Keep me posted!
JN
5.5 seems to help, but it is still only an internal branch and we don’t have the volume that our public test does.
As an aside (maybe I should branch this into a separate question), do you know if GPU Profiler 2.0 could feasibly be brought over to 5.5? It looks like all the work to make it run on Android Vulkan occurred for 5.6.
Glad to hear it!
Yes, most of the code for Vulkan went in for 5.6. It would be feasible to bring it into 5.5, but not trivial. You would need all of the Vulkan side of things; that part shouldn’t be too bad. I’d also update the profiler code itself, since fixes went in for it while it was being deployed permanently for 5.6. Those fixes might have repercussions on other platforms, in which case you’d need to pull the corresponding code for those platforms too (depending on how many platforms you’re interested in, this might be more or less work).
If you do attempt it, let me know (in this ticket or another)… we have fixes that went in for it in 5.6.1 if I’m not mistaken, and another just last week.
Cheers,
JN