Question about a Vulkan memory leak on some Android devices

[Content removed]

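Hello, I'm following up on this post (that question was closed, so I'm opening a new one). We built an Android Shipping package with engine 5.6.1 and retested on a POCO F5, and the memory leak still occurs.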

[Image Removed]

We discussed the 5.4 project with Xiaomi. After Xiaomi consulted Qualcomm, they gave the following reply, which we are sharing with you:

From the KGSL traces, I can observe that the application renderthread is only allocating the memory but not freeing it -


56978.841622: kgsl_mem_mmap: useraddr=0x70a18c1000 gpuaddr=0x40001d3000 size=4096 usage=any(0) id=427 flags=0xc0000

56978.841699: kgsl_mem_mmap: useraddr=0x70a18c1000 gpuaddr=0x40000b6000 size=4096 usage=any(0) id=296 flags=0xc0000

56978.841980: kgsl_mem_mmap: useraddr=0x70a13bc000 gpuaddr=0x40000b8000 size=4096 usage=gl id=338 flags=0xc0900

56978.842037: kgsl_mem_mmap: useraddr=0x70a13bc000 gpuaddr=0x40000ac000 size=4096 usage=any(0) id=49 flags=0xc0000

56978.842228: kgsl_mem_mmap: useraddr=0x70a13b9000 gpuaddr=0x40000d2000 size=4096 usage=texture id=316 flags=0xc0600

56978.842283: kgsl_mem_mmap: useraddr=0x70a13b9000 gpuaddr=0x40000d9000 size=4096 usage=gl id=343 flags=0xc0900

56978.842514: kgsl_mem_mmap: useraddr=0x70a1365000 gpuaddr=0x40000d8000 size=4096 usage=any(0) id=337 flags=0xc0000

56978.842572: kgsl_mem_mmap: useraddr=0x70a1365000 gpuaddr=0x40000cb000 size=4096 usage=texture id=292 flags=0xc0600

56994.002996: kgsl_mem_mmap: useraddr=0x70a18c1000 gpuaddr=0x40001d3000 size=4096 usage=any(0) id=427 flags=0xc0000

56994.003343: kgsl_mem_mmap: useraddr=0x70a18bb000 gpuaddr=0x40000d8000 size=4096 usage=any(0) id=337 flags=0xc0000

56994.003488: kgsl_mem_mmap: useraddr=0x70a18bb000 gpuaddr=0x40000d2000 size=4096 usage=texture id=316 flags=0xc0600


For every kgsl_mem_mmap, there should be a corresponding kgsl_mem_free. // You can capture the KGSL traces for any other usecase and confirm the same.


So, my suspicion is that the application might not be calling the memory free APIs to free up the memory, or the memory free APIs might be called at a later point in time that exceeds your usecase profiling duration.


To summarize, the GPU will only be responsible for freeing/allocating memory as per application request. If the application does not free the memory, the GPU will not explicitly free the application's memory.
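
One way to check the pairing rule from the reply above is to replay a captured trace and list the ids that were mapped but never freed. The following is only a minimal sketch, not a vendor tool: OutstandingById is a hypothetical helper, and it assumes that kgsl_mem_free lines carry the same size= and id= fields as the kgsl_mem_mmap lines quoted above, which should be confirmed against a real capture.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Replays a KGSL trace and returns the size of every id that is still
// outstanding (mapped but not freed) at the end of the capture.
// Assumption: kgsl_mem_free lines expose "size=" and "id=" in the same
// order as the kgsl_mem_mmap lines; this is not verified here.
std::map<unsigned, std::uint64_t> OutstandingById(const std::string& Trace)
{
    static const std::regex Line(
        R"((kgsl_mem_mmap|kgsl_mem_free):.*\bsize=(\d+).*\bid=(\d+))");

    std::map<unsigned, std::uint64_t> Live;
    std::istringstream In(Trace);
    std::string Row;
    while (std::getline(In, Row))
    {
        std::smatch M;
        if (!std::regex_search(Row, M, Line))
        {
            continue; // not a line we understand
        }
        const unsigned Id = static_cast<unsigned>(std::stoul(M[3].str()));
        if (M[1].str() == "kgsl_mem_mmap")
        {
            // A repeated mmap of the same id overwrites the entry, so it is
            // counted once rather than once per trace line.
            Live[Id] = std::stoull(M[2].str());
        }
        else
        {
            Live.erase(Id);
        }
    }
    return Live;
}
```

In the excerpt above, ids 427, 337 and 316 each appear twice with the same gpuaddr, so counting trace lines instead of distinct ids would overstate the outstanding memory. That is the reason the sketch keys on id.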

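This reads like a guess that memory may be reclaimed in bulk once some condition is met. In the 5.6.1 project we recorded for about 10 minutes, which left about 16 MB (the 129331200 in the figure; we assume the unit is bits) of memory unreclaimed.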
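
The size estimate depends entirely on the assumed unit, so it may help to spell the conversion out. The constants below are a small worked check, with hypothetical names. Only the value 129331200 comes from the post, and the unit remains the poster's guess.

```cpp
#include <cstdint>

// Value read off the profiler screenshot; the unit is not confirmed.
constexpr std::uint64_t ReportedValue = 129331200;

// Reading the value as bits, as the post assumes.
constexpr std::uint64_t BytesIfBits = ReportedValue / 8;             // 16,166,400 bytes
constexpr double MegabytesIfBits = BytesIfBits / 1e6;                // about 16.2 MB
constexpr double MebibytesIfBits = BytesIfBits / (1024.0 * 1024.0);  // about 15.4 MiB

// Reading the same value as bytes instead gives a much larger figure.
constexpr double MebibytesIfBytes = ReportedValue / (1024.0 * 1024.0); // about 123.3 MiB

static_assert(BytesIfBits == 16166400, "129331200 bits is 16166400 bytes");
```

If the profiler actually reports bytes, the unreclaimed amount would be roughly eight times the 16 MB quoted, so the unit is worth confirming.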

Could this be caused by delayed memory reclamation, or might something else be causing the leak?

We will keep testing on the other devices we have to see whether the issue reproduces on other GPU models.

Steps to Reproduce

Hi,

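Thanks for the information. Could you try setting r.Vulkan.MemoryMapChunkedPSOCache 0 and see whether that avoids the problem? If it does, the issue is probably in FVulkanCombinedChunkCacheFile::UpdateMapping, which needs to UnMap first and then MapRegion.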
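
The "UnMap first, then MapRegion" order can be illustrated outside the engine with plain POSIX calls. This is only a sketch of the pattern under stated assumptions. FMappedView and RemapDemo are hypothetical names, the engine's mapped-file classes are not used, and the code assumes a POSIX system with a writable temporary directory.

```cpp
#include <cassert>
#include <cstddef>
#include <stdio.h>
#include <sys/mman.h>

// Hypothetical stand-in for a cache-file mapping; not the engine's class.
class FMappedView
{
public:
    explicit FMappedView(int InFd) : Fd(InFd) {}
    ~FMappedView() { Unmap(); }
    FMappedView(const FMappedView&) = delete;
    FMappedView& operator=(const FMappedView&) = delete;

    // Remap after the file has grown. The old view is released first, so
    // at most one mapping of the file exists at any time.
    bool UpdateMapping(std::size_t NewSize)
    {
        Unmap();
        void* P = mmap(nullptr, NewSize, PROT_READ, MAP_SHARED, Fd, 0);
        if (P == MAP_FAILED)
        {
            return false;
        }
        Ptr = P;
        Size = NewSize;
        return true;
    }

    const void* Data() const { return Ptr; }
    std::size_t Num() const { return Size; }

private:
    void Unmap()
    {
        if (Ptr != nullptr)
        {
            munmap(Ptr, Size);
            Ptr = nullptr;
            Size = 0;
        }
    }

    int Fd;
    void* Ptr = nullptr;
    std::size_t Size = 0;
};

// Usage sketch: grow a temporary file and map it twice.
bool RemapDemo()
{
    FILE* File = tmpfile();
    if (File == nullptr)
    {
        return false;
    }
    const char Page[4096] = {42}; // first byte 42, the rest zero
    bool bOk = fwrite(Page, 1, sizeof(Page), File) == sizeof(Page) && fflush(File) == 0;
    {
        FMappedView View(fileno(File));
        bOk = bOk && View.UpdateMapping(4096);
        // Append a second page, then remap to cover the larger file.
        bOk = bOk && fwrite(Page, 1, sizeof(Page), File) == sizeof(Page) && fflush(File) == 0;
        bOk = bOk && View.UpdateMapping(8192) && View.Num() == 8192;
        bOk = bOk && static_cast<const char*>(View.Data())[4096] == 42;
    } // the view is unmapped here, before the file is closed
    fclose(File);
    return bOk;
}
```

Keeping the unmap inside the same object that owns the pointer means a remap cannot leave an earlier view behind, whichever path triggers it.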

Hello, I turned this variable off and retested for 10 minutes. It does look related: the amount of memory left unreleased dropped significantly.

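[Image Removed]

However, IMappedFileRegion does not seem to expose a corresponding unmapRegion interface. Is there an official changelist for this (I looked through 5.7 and it does not seem to have changed here either), or is there a related interface?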

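Also, with it turned off, the first few launches of the game crash in UpdateMapping while caching PSOs, and the game has to be restarted several times before it gets into the game normally. The call stack is as follows: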

FVulkanCombinedChunkCacheFile::UpdateMapping(unsigned int)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:213

FVulkanCombinedChunkCacheFile::FlushWriteHandle()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:426

VkResult FVulkanPipelineCacheChunk::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, FVulkanPipelineCacheChunk::EPSOCacheFindResult, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:587

VkResult FVulkanChunkedPipelineCacheManagerImpl::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:934

VkResult FVulkanChunkedPipelineCacheManager::CreatePSO<FVulkanRHIGraphicsPipelineState>(FVulkanRHIGraphicsPipelineState*, bool, TUniqueFunction<VkResult (FVulkanChunkedPipelineCacheManager::FPSOCreateFuncParams<FVulkanRHIGraphicsPipelineState>&)>)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanChunkedPipelineCache.cpp:1239

FVulkanPipelineStateCacheManager::CreateVKPipeline(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, VkGraphicsPipelineCreateInfo const&, FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:1470

FVulkanPipelineStateCacheManager::CreateGfxPipelineFromEntry(FVulkanRHIGraphicsPipelineState*, FVulkanShader**, FGraphicsPipelineStateInitializer::EPSOPrecacheCompileType)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:1393

FVulkanPipelineStateCacheManager::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:2179

FVulkanDynamicRHI::RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/VulkanRHI/Private/VulkanPipeline.cpp:2237

RHICreateGraphicsPipelineState(FGraphicsPipelineStateInitializer const&)

F:/UE561_20251007/UnrealEngine-release/Engine/Source/Runtime/RHI/Public/DynamicRHI.h:1189

operator()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/RHI/Private/PipelineStateCache.cpp:3894

UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<true>, void (FPSOPrecacheAsyncTask const*)>::operator()(FPSOPrecacheAsyncTask const*) const

F:/UE561_20251007/UnrealEngine-release/Engine/Source/Runtime/Core/Public/Templates/Function.h:471

FAsyncTaskBase::DoWork()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/Runtime/Core/Public/Async/AsyncWork.h:288

FAsyncTaskBase::DoThreadedWork()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/Runtime/Core/Public/Async/AsyncWork.h:312

FQueuedThread::Run()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/Core/Private/HAL/ThreadingBase.cpp:1457

FRunnableThreadPThread::Run()

F:/UE561_20251007/UnrealEngine-release/Engine/Source/./Runtime/Core/Private/HAL/PThreadRunnableThread.cpp:25

Addendum: there was also a "Shader Compilation Fail" warning before the crash. [Image Removed]

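OK, thanks for the information. UnMap is not exposed; it lives in the Android implementation, FAndroidMappedFileHandle::UnMap, and is called from the FAndroidMappedFileRegion destructor. So I suspect mmap was called twice before unmap was called, which caused problems. I have attempted a change; please try it and see whether it helps (diff it against, or overwrite, the stock 5.6 AndroidPlatformFile.cpp).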

(Note that the lines marked with "-" in the patch are the added ones.)

Hello, I rebuilt the package with the changes from the patch, and the memory leak still seems to be there. Should I also post the test package here?

[Image Removed]

Sorry, my change above was a bit complicated. Could you try changing it like this instead?

[Image Removed]

Hello, yesterday I tried changing it directly to MappedRegion.Reset(MappedCacheFile->MapRegion(0,size)), built a package and tested it, and it still leaks.

Could you try the change I posted above? Your version still seems to call mmap first.
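
The ordering concern here can be shown with a small standalone example. It uses std::unique_ptr as a stand-in for the engine's smart pointer and a hypothetical FFakeRegion type that only logs "map" and "unmap", so it illustrates the order of operations and says nothing about the actual engine code path.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a mapped region: logs "map" when created and
// "unmap" when destroyed, mirroring a region that unmaps in its destructor.
struct FFakeRegion
{
    explicit FFakeRegion(std::vector<std::string>& InLog) : Log(InLog)
    {
        Log.push_back("map");
    }
    ~FFakeRegion() { Log.push_back("unmap"); }

    std::vector<std::string>& Log;
};

// Replace in one step: the argument is evaluated first, so the new region
// is created while the old one is still alive.
void RemapInOneStep(std::unique_ptr<FFakeRegion>& Region, std::vector<std::string>& Log)
{
    Region.reset(new FFakeRegion(Log));
}

// Release first, then map: at most one region exists at any time.
void RemapReleaseFirst(std::unique_ptr<FFakeRegion>& Region, std::vector<std::string>& Log)
{
    Region.reset();
    Region.reset(new FFakeRegion(Log));
}
```

Starting from an already mapped region, RemapInOneStep logs map, map, unmap, while RemapReleaseFirst logs map, unmap, map. Whether that overlap matters depends on the platform implementation behind the region.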

Also, could you remove the LOG_ANDROID_FILE macro in FAndroidMappedFileHandle::UnMap? I want to confirm whether this UnMap is actually executed.

OK, I'll try it. This change requires reverting this morning's AndroidPlatformFile modifications, right?

Yes, revert all of my earlier changes.

Hello, I called reset() directly and commented out the LOG_ANDROID_FILE macro, and it still leaks. I did some debugging:

[Image Removed]

The test package reaches this spot only once, and MappedRegion is empty at that point, so the destructor should never run. I also searched logcat and found no unmap-region log.

However, I found that if I set r.Vulkan.MemoryMapChunkedPSOCache 0 and add a null check here, the startup crash goes away and there appears to be no memory leak either. Is this a viable workaround?

[Image Removed]

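OK. If you are not in a hurry, you can use this for now. I'll pass the situation on to the relevant colleagues and see how to resolve it.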

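I'd like to confirm once more: UpdateMapping is called only once during the whole run, so mmap happens only once, and memory increases here once rather than growing continuously, right? If so, that shouldn't count as a memory leak, should it?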

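We're not in a hurry. We'll try this change in our current project, although it is still on 5.4 and we're not sure whether it behaves the same as the 5.6 test project, so we can only try it and see.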

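No. In the test package, UpdateMapping is called only once during the whole run, so mmap should be called from here only once. But according to what we captured in Android Studio, memory still leaks even with that single call, so we're not sure whether this is the source of the leak (although the leak does disappear after disabling ChunkedPSOCache).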

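Hello, I retested on 5.4 today. Unfortunately this change doesn't seem to work on 5.4, but since we may be upgrading to 5.6 soon, I'll go back and sync with the team first. Also, if there are related fixes later, could you let us know? Thanks.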

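Sure, absolutely. A colleague just contacted me today. He is currently analyzing the problem but is unsure about a few things, so he would like you to help verify on 5.6. He added two checks; please see whether either of them triggers. Thanks.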

[Image Removed]

Hello, I enabled MemoryMapChunkedPSOCache, added the two checks, and rebuilt the package. It entered the scene normally without triggering either check.

OK, thank you very much. What steps are needed to reproduce this on 5.6? For example, if I create a ThirdPersonTemplateBP project and package it for Android, will it reproduce?

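If some configuration is required, could you provide a repro project directly? I'll forward it to the relevant colleagues for testing, which would probably be more efficient. Thanks.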