Crashes when mounting and unmounting IoStore containers in different threads

Hi!

In our game we have custom downloadable content system based on pak files. Recently I was upgrading it to use IoStore. And everything went great until we noticed that on Mac the game crashes with many different seemingly random callstacks after an IoStore container is unmounted. It looked very much like a memory issue so I used address sanitizer and found heap-use-after-free error at FFileHandleApple::ReserveSlot, the same as was already reported at [Content removed] [Content removed] [Content removed]

After some investigation I figured what specifically happens: when IoStore container is mounted its file handle is not closed and instead is stored until is unmounted. But in our downloadable content system mounting is performed in background thread and unmounting in game thread. And as I can see from implementation of FFileHandleApple a file handle can only be created, read from and freed in the same thread, otherwise one gets heap-use-after-free.

I’ve resolved the issue simply by moving mounting to game thread on Mac for now. But current implementation of FFileHandleApple looks highly unsafe to me and needs proper fix.

Hi Hugo,

Thank you for response.

Regarding GameThreadCall: no, we heavily use task graph in our dlc system, so just changed target thread of mount task to ENamedThreads::GameThread on Mac.

Regarding iOS: no, we don’t have any similar memory issues on iOS. As I see from iOS file handle implementation it stores pointer to a thread-local array of handles inside a handle during handle creation and uses it later for all operations instead of handle array of thread on which operation is performed. It seems to be not safe with respect to data races but at least it doesn’t mess with memory.

Also an update: actually I’ve found that moving container mounting to game thread did not help completely. In a build with patches I get heap-use-overflow address sanitizer issue at FFileHandleApple::ReserveSlot and crash at some point after it simply during mounting all the containers. As I understand it happens because pak file handles are being created at different threads during loading of specific files but are freed mostly from game thread at FPakFile::ReleaseOldReaders. It doesn’t lead to memory issues unless we don’t run out of free file handles (which number is limited to 192 in FFileHandleApple), but almost immediately breaks memory after it. In a build with patches we have about 290 IoStore containers (we have about 70 chunks atm but get x2 from optional chunks and another x2 from patches) so we run out of free file handles during mounting.

In this case I don’t see how it could be fixed without fixing FFileHandleApple. So I tried to move implementation of thread-local handle arrays from iOS file handle to Mac file handle and it actually helped. But as I mentioned earlier I expect other issues may arise.

So we would really appreciate if somebody from Epic could have a look into issue and come up with proper fix for file handles on Mac and probably iOS too.

Hi Ilya,

Thank you for reporting this issue with all the details of your findings.

This will greatly help us to address the issue.

I’m curious about your fix implementation, did you use GameThreadCall?

On Mac, we have a separation between GameThread & MainThread that does not exist on other platforms.

The role of MainThread is to process these events & execute MainThreadCall which can be executed sync or async.

We have this separation because the OS wants us to actively process events, and if we do not do so fast enough, it will kill the engine.

It’s kinda like the “App stop responding on Windows” but less forgiving.

Did you also see that issue on iOS?

Thanks,

Regards,

Hugo Lamarche

Hi Ilya,

Thanks for the additional details.

I read in one of the forum posts you shared that someone resolved their issues by adapting the Unix implementation for Mac.

This might be the way we go in the future… It’s too soon to say, but I’m going to tackle it for an upcoming engine version.

Maybe that could be a short-term solution for your game.

Regards,

Hugo Lamarche

Hi Ilya,

I’ve submitted today the refactoring to use the same mechanism as the Unix implementation on Mac.

This is the commit: https://github.com/EpicGames/UnrealEngine/commit/6bfcbc2833aa29dcb6599ad2a6f36a325ef181d3

Would it be possible for you to test it and tell me if this is solving your issue, please?

The new implementation should be thread-safe.

Thanks,

Regards,

Hugo

Hi Hugo,

Thank you! I’ve already implemented kinda fix myself - simply removed thread_local and sprinkled some locks around - and it seems to work. But I will test your fix too.

Best regards,

Ilya

Hi Ilya,

I’m glad you found a way to fix it yourself. Please let me know if you have any issues with my fix as well, or if it’s helping.

Thanks,

Regards,

Hugo