Difference in memory allocators

Hello,

Our Unreal project uses the ANSI C allocator instead of the default Unreal one for certain design reasons. On forums and in talks, UE staff have mentioned that using the default Unreal allocator instead of setting ‘FORCE_ANSI_ALLOCATOR=1’ provides a significant performance gain. Forums mention memory pooling, prevention of heap fragmentation, and a few other techniques that are applied when FMemory is used as intended.

My question is: is there a list of the specific things the UE allocator does that the ANSI C allocator does not, or can you provide some details on what the benefits are?

Also, is there a good example project on the UE Store that we can build with and without the allocator that will show the performance difference?

Thank you in advance!

[Attachment Removed]

Steps to Reproduce
Toggling of ‘FORCE_ANSI_ALLOCATOR=1’ on and off

[Attachment Removed]

[mention removed]​ I remember you mentioning something about allocators at IITSEC a few years back. Do you by any chance have any insights into this?

Also [mention removed]​ You helped us out on a crash a few years back that dealt with allocators. If you have any insights into this, that would be great!

[Attachment Removed]

Hello!

There is no comprehensive documentation on the internal design of the engine’s allocators, but I can share some high-level details to explain the differences. The short story is that the ANSI allocator relies on the OS for each memory transaction (allocation/reallocation/free), while the other allocators use various techniques to avoid that system overhead.

We need to differentiate the Editor from the runtime, as their execution contexts and needs are different. The Editor runs on desktop systems that are often more powerful than the target hardware, and it manages source data in a format that is easier to work with. For the Editor we currently use an open-source allocator called MiMalloc, which is fairly well documented on its official webpage: mi-malloc

The runtime allocators are usually the binned mallocs, and the default selection (2 or 3) depends on the platform. Both allocators create pools that manage OS-allocated pages. For small allocations (up to half a page size), the pages are subdivided into bins of constant size so that similar-sized allocations are packed together. The allocator manages the state of the bins (allocated/free), so the behavior at the caller level is the same as with classic allocation methods.

When allocating, the allocator looks for a page configured with a size that fits the request while reducing waste. This operation is much faster than allocating from the OS, as most systems incur extra costs related to virtual addressing. When the allocator cannot find a suitable bin, it simply allocates a new page and configures it to the proper size. Those sizes are predefined in the headers of the different allocators. Freeing is greatly simplified, as it only involves updating the state of the occupied bin. Empty pages are eventually returned to the system to avoid fragmenting the address space and running out of RAM.
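The binning idea described above can be sketched roughly like this. This is a toy illustration of the concept, not Unreal's actual MallocBinned implementation: one "page" is carved into fixed-size bins, a free list tracks open slots, and allocating or freeing is just bookkeeping with no OS call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy binned allocator: a single "page" subdivided into constant-size bins.
// Illustrative sketch only; Unreal's MallocBinned is far more elaborate.
class BinnedPage
{
public:
    BinnedPage(std::size_t InBinSize, std::size_t PageSize)
        : Memory(static_cast<std::uint8_t*>(std::malloc(PageSize)))
        , BinSize(InBinSize)
    {
        // Every bin starts on the free list; allocation is then just a pop.
        for (std::size_t Offset = 0; Offset + BinSize <= PageSize; Offset += BinSize)
        {
            FreeList.push_back(Memory + Offset);
        }
    }

    ~BinnedPage() { std::free(Memory); }

    void* Allocate()
    {
        if (FreeList.empty())
        {
            return nullptr; // A real allocator would grab a new page here.
        }
        void* Bin = FreeList.back();
        FreeList.pop_back();
        return Bin;
    }

    void Free(void* Bin)
    {
        // Freeing only updates the bin's state; no OS call is made.
        FreeList.push_back(static_cast<std::uint8_t*>(Bin));
    }

    std::size_t NumFreeBins() const { return FreeList.size(); }

private:
    std::uint8_t* Memory;
    std::size_t BinSize;
    std::vector<std::uint8_t*> FreeList;
};
```

Both Allocate and Free are constant-time list operations; the OS is only involved when the page itself is created or eventually released.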

Large allocations are treated similarly to normal allocations and can simply end up being passed to the OS. There can be an extra allocation layer under the binned allocators to manage the pages, as some systems support pages of up to 1 MB. This is defined in CachedOSPageAllocator.h/.cpp and adds a layer of optimization. The ultimate goal is to reduce the pressure on the OS by reducing the number of memory-related operations, since the pooling/caching allows memory to be reused without having to return it to the OS.
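The caching layer can be sketched like this. Again, this is a conceptual illustration of the idea behind CachedOSPageAllocator, not the engine's code: recently freed pages are kept in a small cache and reused before going back to the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy page cache: keeps recently freed pages around so a new request can
// be satisfied without an OS call. Conceptual sketch only; see
// CachedOSPageAllocator.h/.cpp in the engine for the real thing.
class CachedPageAllocator
{
public:
    explicit CachedPageAllocator(std::size_t InPageSize, std::size_t InMaxCached = 8)
        : PageSize(InPageSize), MaxCached(InMaxCached) {}

    ~CachedPageAllocator()
    {
        for (void* Page : Cache) { std::free(Page); }
    }

    void* AllocatePage()
    {
        if (!Cache.empty())
        {
            ++CacheHits; // Reused without touching the OS.
            void* Page = Cache.back();
            Cache.pop_back();
            return Page;
        }
        return std::malloc(PageSize); // Stand-in for an OS page allocation.
    }

    void FreePage(void* Page)
    {
        if (Cache.size() < MaxCached)
        {
            Cache.push_back(Page); // Keep the page for reuse.
        }
        else
        {
            std::free(Page); // Cache full: actually release the memory.
        }
    }

    std::size_t CacheHits = 0;

private:
    std::size_t PageSize;
    std::size_t MaxCached;
    std::vector<void*> Cache;
};
```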

The biggest problem that often forces our licensees to use the ANSI allocator is the use of external libraries that rely on DLLs. The engine’s code structure means that all the standard allocation tokens (new/delete, malloc/realloc/free) are overloaded and redirected to the engine’s allocator. This creates an incompatibility between memory allocated by the engine and memory allocated by the DLL, since the memory is not managed the same way. Some libraries allow injecting custom memory-management functions and won’t require the ANSI allocator to be compatible with Unreal.
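The injection hooks mentioned above typically look something like this. The library API here (SetAllocators, CreateObject, DestroyObject) is hypothetical, and plain malloc/free stand in for what would be FMemory::Malloc/FMemory::Free in an actual Unreal module, so the sketch compiles standalone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical third-party hook API: many libraries expose something like
// this so the host application can supply its own memory functions.
using AllocFn = void* (*)(std::size_t);
using FreeFn  = void (*)(void*);

namespace ThirdPartyLib
{
    static AllocFn GAlloc = std::malloc;
    static FreeFn  GFree  = std::free;

    void SetAllocators(AllocFn Alloc, FreeFn Free)
    {
        GAlloc = Alloc;
        GFree  = Free;
    }

    // Library-internal allocations are now routed through the injected hooks.
    void* CreateObject(std::size_t Size) { return GAlloc(Size); }
    void DestroyObject(void* Object)     { GFree(Object); }
}

// Host-side hooks. In an Unreal module these would forward to
// FMemory::Malloc / FMemory::Free so both sides share one allocator.
static std::size_t GHookedAllocs = 0;

static void* EngineAlloc(std::size_t Size)
{
    ++GHookedAllocs;               // Track that the hook was used.
    return std::malloc(Size);      // Would be FMemory::Malloc(Size) in-engine.
}

static void EngineFree(void* Ptr)
{
    std::free(Ptr);                // Would be FMemory::Free(Ptr) in-engine.
}
```

Once the hooks are registered, every allocation the library makes goes through the engine's allocator, so memory can safely cross the boundary in either direction.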

Regards,

Martin

[Attachment Removed]

[mention removed]​ Is there a way to have the 3rd party DLLs still use the ANSI allocator, and for UE to still use its native one?

Some kind of translation or obfuscation layer?

Also, do you know of a project that would show the performance difference between the two?

[Attachment Removed]

It depends on the DLL’s features and how its types are exposed (exported). Anything that “crosses” the Engine/DLL boundary must not trigger memory-related operations (realloc/free) while “on the other side”.

Container-like objects are usually problematic:

void UnrealSideCode()
{
    // The internal members of the vector are allocated using the Unreal allocator.
    std::vector<int> v = {8, 4, 5, 9};

    // SumVectorItems doesn't modify the vector, so this is fine.
    int Sum = DLL::SumVectorItems(v);

    // High chance of crashing: if the vector's capacity is too small, this
    // triggers a reallocation inside the DLL code, which won't use the
    // Unreal allocators.
    DLL::AddItemToVector(v, 1);

    // This will eventually crash, as the vector's internal memory is not
    // managed by Unreal's allocator. It's hard to say "when", as
    // optimizations and move semantics affect the ownership of the memory.
    std::vector<int> ReturnValue = DLL::GenerateIntVector();
}

If you can guarantee that ownership of the memory stays exclusively on “one side”, it will normally work. SDKs that use handles and other techniques to hide their implementation details should be fine.
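The handle technique can be sketched like this. The API (MeshBuilderHandle, CreateMeshBuilder, and so on) is hypothetical: the point is that the caller only ever holds an opaque token, while every allocation and free happens on the library's side.

```cpp
#include <cassert>

// Opaque-handle pattern (hypothetical API): the library hands out an
// opaque token and performs every allocation and free on its own side,
// so the engine and the DLL never manage the same memory.

// What the engine sees: an opaque type with no members exposed.
struct MeshBuilderHandle;

// "DLL side": the real type and all memory management live here.
namespace DLLSide
{
    struct MeshBuilder
    {
        int VertexCount = 0;
    };

    MeshBuilderHandle* CreateMeshBuilder()
    {
        // Allocated with the DLL's allocator; the caller never frees it directly.
        return reinterpret_cast<MeshBuilderHandle*>(new MeshBuilder());
    }

    void AddVertex(MeshBuilderHandle* Handle)
    {
        reinterpret_cast<MeshBuilder*>(Handle)->VertexCount++;
    }

    int GetVertexCount(MeshBuilderHandle* Handle)
    {
        return reinterpret_cast<MeshBuilder*>(Handle)->VertexCount;
    }

    void DestroyMeshBuilder(MeshBuilderHandle* Handle)
    {
        // Freed by the same allocator that created it.
        delete reinterpret_cast<MeshBuilder*>(Handle);
    }
}
```

Since the caller only passes the handle back into the library, it never matters that the two sides use different allocators.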

A possible workaround applies when the library offers a static library or its source code is available. In those cases, you can integrate the library into the Unreal compilation process through a module, and the allocator overload will apply to its allocations. You can have a look at Engine\Source\ThirdParty to find examples (libjpeg-turbo, FreeImage, Harfbuzz…).

We don’t have metrics comparing the performance of projects that use the ANSI allocator. Any of our official demos or sample projects could be used as a test. The allocator can be selected through a launch argument in non-shipping builds (-ansimalloc).

As a test, I ran a local project that uses the WorldPartition template level, sparsely populated. The package uses the Development target.

  • MallocBinned2 (default): 90+ FPS
  • MallocBinned3 (-binnedmalloc3) : 90+ FPS
  • AnsiMalloc (-ansimalloc): ~70 FPS

As you can see, there is a significant drop in performance with the Ansi allocator.

Regards,

Martin

[Attachment Removed]

[mention removed]​ Ok, perfect!

Thank you for your help!

[Attachment Removed]