The issue is that when compiler processes exit due to low memory availability, UBA’s Session::ProcessExited function is called. This function *also* attempts to allocate memory, through the m_deadProcesses.emplace_back call. Underneath, mi_malloc_aligned can return nullptr when no memory is available, and because the std container never gets the memory it asked for, UBA crashes with a segfault.
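A std container assumes its allocator either returns usable memory or throws; it does not check the returned pointer for nullptr. Below is a minimal sketch of an allocator that honours that contract, assuming the container’s memory ultimately comes from a raw C-style allocator such as mi_malloc_aligned (std::malloc stands in for it here, and CheckedAllocator is a hypothetical name). A failed allocation then surfaces as std::bad_alloc instead of a write through a null pointer:

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Hypothetical allocator: wraps a raw allocator that may return nullptr and
// converts that into std::bad_alloc, as the Allocator requirements demand.
// Only handles fundamental alignment (std::malloc's guarantee).
template <class T>
struct CheckedAllocator {
  using value_type = T;

  CheckedAllocator() noexcept = default;
  template <class U>
  CheckedAllocator(const CheckedAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    // Reject sizes whose byte count would overflow.
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) throw std::bad_alloc();
    // Stand-in for mi_malloc_aligned; malloc(0) may legally return nullptr, so ask for 1 byte.
    void* p = std::malloc(n == 0 ? 1 : n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();  // never hand nullptr to the container
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <class T, class U>
bool operator==(const CheckedAllocator<T>&, const CheckedAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CheckedAllocator<T>&, const CheckedAllocator<U>&) noexcept { return false; }
```

With a throwing allocator, the exit path could catch std::bad_alloc and degrade gracefully instead of crashing. Another option, if m_deadProcesses supports reserve() and an upper bound on tracked processes is known, is to reserve its capacity up front so that recording an exited process does not allocate at all.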
It was mentioned to me that one of the strategies the engine itself uses to deal with extremely low-memory scenarios is to preemptively allocate 32 MB of memory and release it if a memory allocation ever fails, so that subsequent memory requests can succeed. UBA likely needs a similar mechanism, or needs to preemptively allocate enough memory for all of its “general allocation requests redirected to mimalloc”, so that general UBA code won’t encounter memory allocation failures even when the rest of the system is under memory pressure from compiler processes or mmap’d memory.
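The reserve-and-release idea maps naturally onto std::set_new_handler: when operator new fails it calls the installed handler, and if the handler frees memory and returns, operator new retries. A minimal sketch under that assumption (the emergency namespace and its functions are hypothetical, and the 32 MB figure is the one quoted above):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

namespace emergency {

constexpr std::size_t kReserveBytes = 32u * 1024u * 1024u;  // 32 MB
void* g_reserve = nullptr;

// Called by operator new when an allocation fails. Freeing the reserve and
// returning makes operator new retry; once the reserve is gone, give up.
void OnOutOfMemory() {
  if (g_reserve != nullptr) {
    std::free(g_reserve);
    g_reserve = nullptr;
    return;
  }
  throw std::bad_alloc();
}

bool Install() {
  g_reserve = std::malloc(kReserveBytes);
  if (g_reserve == nullptr) return false;
  // Touch every page: with overcommit, untouched memory may not be backed
  // by physical pages yet, so the reserve would not really be reserved.
  std::memset(g_reserve, 0xAB, kReserveBytes);
  std::set_new_handler(&OnOutOfMemory);
  return true;
}

bool HasReserve() { return g_reserve != nullptr; }

}  // namespace emergency
```

This only covers allocations that go through operator new; code that calls mi_malloc_aligned directly bypasses the new handler, so those call sites would need the same release-and-retry logic around them. Losing the reserve is also a useful signal to stop accepting new work until memory frees up.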
Run the UBA agent in an environment with no swap space. The environment where I can reproduce this runs the UBA agent under Wine (with patches from the EpicGamesExt/WineResources repository) on a machine with 16 cores, 64 GB of RAM and no swap space. Although I’m running this under Wine, the issue likely affects real Windows machines with no pagefile as well.
Run a build in which the UBA agent builds several large engine unity files (e.g. Engine.10.cpp).
We ran into something similar [Content removed] No fix yet, other than adding more memory. We weren’t using the patches from EpicGamesExt/WineResources, though.
Have you seen swap + Wine + UBA work on a machine with 16 cores and 64 GB of RAM?
Unfortunately I can’t turn swap on in my environment (GKE Autopilot), so I can’t test whether it works there. I’m almost certain it does though.
I think the root cause is the way MSVC relies on having a pagefile: even on Windows with a large pagefile, you’ll still get MSVC pagefile errors when compiling large C++ files, and this issue is an artifact of MSVC simply assuming a pagefile is present. Notably, using Clang as the Win64 compiler (under Wine on Linux with no swap file) does not exhibit the same out-of-memory issues, likely because Clang is also written to run on Linux, where having no swap file means a hard OOM killer is in effect at all times.
My next strategies here for making things work are:
As a quick hack, modifying the memory-shim so that all mmap’d regions are backed by a file on disk. I have no idea whether the Linux kernel will do the right thing and page regions out to the file on disk under memory pressure (effectively making each file-backed mmap’d region act as a swap file for just that region). I’ll likely also need to statically link mimalloc into memory-shim so that malloc/free can be routed to a file-backed mmap region. I do worry about the performance implications, though, especially since with this model the Linux kernel will treat persisting the data to the file as important, when in reality we only want it there as a last-resort store if the memory limit is being hit.
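A minimal sketch of the file-backed mapping itself, assuming POSIX APIs on Linux (MapFileBacked and UnmapFileBacked are hypothetical names). Because the mapping is MAP_SHARED over a regular file, the kernel can write dirty pages back to that file and drop them under memory pressure, rather than needing swap:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps `size` bytes backed by an unlinked temporary file in `dir`.
// `dir` must be on a real disk: on tmpfs the "file" is itself RAM.
void* MapFileBacked(const std::string& dir, std::size_t size) {
  std::string path = dir + "/uba-backing-XXXXXX";
  int fd = mkstemp(&path[0]);  // creates and opens a unique file
  if (fd < 0) return nullptr;
  unlink(path.c_str());  // storage is freed once the fd and mapping are gone
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping keeps the file alive
  return p == MAP_FAILED ? nullptr : p;
}

void UnmapFileBacked(void* p, std::size_t size) { munmap(p, size); }
```

The performance concern above is real for this design: the kernel’s normal writeback periodically flushes dirty pages of shared file mappings to disk even when there is no memory pressure. To route malloc/free into such regions, mimalloc versions that provide mi_manage_os_memory can be handed an externally mapped range, though whether that fits memory-shim’s setup is an open question.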
Using the userfaultfd API of the Linux kernel. It is available in constrained environments like GKE Autopilot, and lets user space handle page faults for anonymous mmap’d regions. The API is not well documented, but I think I’ve wrapped my head around what it needs. It does allow passing the userfaultfd file descriptor to a central process that faults pages in (i.e. a central process can track the overall resident memory of all downstream child processes). The downside of this API is that the target process has to cooperate in releasing memory from the anonymous mapping: the only way to evict a page with userfaultfd is for the process that owns the mmap’d region to call madvise(MADV_DONTNEED); the central process can’t do it. So some awkward IPC is needed whenever the central process decides a page must be evicted to disk to make room for another page.
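The eviction step that has to run inside the owning process can be sketched as follows, assuming Linux and a private anonymous mapping (EvictPage is a hypothetical name, and an in-memory vector stands in for the on-disk store). The userfaultfd registration and the handler that would bring the page back with UFFDIO_COPY are omitted:

```cpp
#include <cstddef>
#include <vector>
#include <sys/mman.h>
#include <unistd.h>

std::size_t PageSize() { return static_cast<std::size_t>(sysconf(_SC_PAGESIZE)); }

// Hypothetical helper run in the process that owns the mapping, in response to
// an eviction request from the central process. `page` must be page-aligned.
bool EvictPage(void* page, std::size_t pageSize, std::vector<unsigned char>& backing) {
  const auto* bytes = static_cast<const unsigned char*>(page);
  backing.assign(bytes, bytes + pageSize);  // stand-in for writing the page to disk
  // For a private anonymous mapping, MADV_DONTNEED releases the physical page.
  // The next access faults: without userfaultfd the kernel supplies a zero-filled
  // page; with a UFFDIO_REGISTER_MODE_MISSING registration the fault is delivered
  // to the userfaultfd handler instead, which can copy `backing` back in.
  return madvise(page, pageSize, MADV_DONTNEED) == 0;
}
```

Because only the owner can issue this madvise call, the central process has to send it a request (which pages to evict) and wait for confirmation before reusing the freed budget, which is the IPC round trip described above.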
Of course, I think the long-term solution here would be for UBA to use the mimalloc API to back memory allocations with mmap’d files on disk whenever a compiler process goes above its “share” of the memory. This would probably eliminate the “please increase your pagefile” errors on Windows as well.
Unfortunately some console platforms have editor deployment integrations that are not compatible with Clang, and get compiled out when Clang is used as the compiler (search for PLATFORM_COMPILER_CLANG if you want to see impacted platforms).
Interesting. I must have disabled those (TMAPI) when I fixed the editor so it could compile with Clang, and then forgot about it. I’ll take a look at whether they can be enabled.
I tried modifying the conditional and unfortunately got linker errors, presumably due to subtle differences in C++ name mangling between MSVC and Clang. Since the TMAPI libraries can’t be modified, my idea for a workaround is an Unreal module with something like a “bMustBuildWithMsvcOnWindows” flag that wraps all of the TMAPI calls in a C API; the Clang-built editor would then link against that C API, avoiding the ABI naming/linking issues. That would let us use Clang for the majority of the engine and keep MSVC isolated to a few small areas (and because this wouldn’t be much code, MSVC would be less likely to hit memory issues on UBA in that scenario).
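The C-API wrapper idea can be sketched in a single file. Everything here is hypothetical: vendor::TargetConnection stands in for a TMAPI-style C++ API (the real TMAPI calls are not shown), and the TargetShim_* functions are illustrative names. The point is that only opaque pointers and plain C types cross the boundary, so Clang-compiled callers depend on unmangled extern "C" symbols rather than MSVC-mangled C++ ones. DLL export decoration (in UE, the module’s API macro) is omitted:

```cpp
#include <string>
#include <utility>

// ---- Shared header (hypothetical): plain C declarations both compilers agree on ----
extern "C" {
typedef struct TargetHandle TargetHandle;  // opaque to callers
TargetHandle* TargetShim_Open(const char* name);
int TargetShim_IsConnected(const TargetHandle* handle);
void TargetShim_Close(TargetHandle* handle);
}

// ---- MSVC-only module (hypothetical) ----
// Stand-in for a vendor C++ API whose mangled symbols only MSVC-built code links against.
namespace vendor {
class TargetConnection {
 public:
  explicit TargetConnection(std::string name) : name_(std::move(name)) {}
  bool IsConnected() const { return !name_.empty(); }

 private:
  std::string name_;
};
}  // namespace vendor

struct TargetHandle {
  vendor::TargetConnection connection;
};

extern "C" TargetHandle* TargetShim_Open(const char* name) {
  try {
    return new TargetHandle{vendor::TargetConnection(name ? name : "")};
  } catch (...) {
    return nullptr;  // C++ exceptions must not cross the C boundary
  }
}

extern "C" int TargetShim_IsConnected(const TargetHandle* handle) {
  return (handle != nullptr && handle->connection.IsConnected()) ? 1 : 0;
}

extern "C" void TargetShim_Close(TargetHandle* handle) { delete handle; }
```

In the real setup only the header part would be visible to the Clang-built editor, and the implementation would live in the MSVC-built module. Keeping the boundary to pointers and ints avoids relying on both compilers agreeing on C++ class layout or exception handling.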
Thanks for this. I think this case can now be marked as solved - with those fixes submitted we can use Clang with UBA to avoid the memory allocation issues.