FD3D12MemoryPool::Init() huge virtual memory allocations

Hello,

I’m currently investigating the high Commit Size allocated by our game (around 30 GB), which often causes OOM crashes, even on systems with 32 GB of RAM but a smaller paging file. I realized that Memory Insights misses around 6 GB of heap allocations even after all Video Memory allocations are filtered out.

So I followed the recent article about ETW and Windows Performance Analyzer https://dev.epicgames.com/community/learning/tutorials/b6d3/unreal-engine-fortnite-find-every-byte-part-1-demystifying-untracked-memory

In the WPA trace, I can see all 30+ GB of Commit Size in the VirtualAlloc Commit Life Times view. I cross-checked the allocations and found that (almost) all of those I couldn’t see in Memory Insights come from FD3D12MemoryPool::Init(). It appears that a portion of the memory that is intended to be VRAM-only actually contributes to the large Commit Size without noticeably affecting the Working Set. Running the game with null RHI keeps the Commit Size and Working Set much closer to each other, as the Commit Size is significantly lower.
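
To see the pattern outside the engine, here is a minimal standalone sketch (not engine code) of what the placed-resource path in FD3D12MemoryPool::Init() conceptually asks the driver for - a DEFAULT (GPU-local) heap - with the process commit charge read around the call. The heap size and flags are arbitrary examples, and whether the commit actually grows by the full heap size is driver-dependent; this only illustrates where I would expect no commit growth at all:

#include <windows.h>
#include <psapi.h>   // link psapi.lib on older SDKs
#include <d3d12.h>
#include <cstdio>

#pragma comment(lib, "d3d12.lib")

// Private commit charge of this process, in bytes (what Task Manager calls "Commit size").
static SIZE_T GetPrivateCommit()
{
    PROCESS_MEMORY_COUNTERS_EX Counters = {};
    GetProcessMemoryInfo(GetCurrentProcess(),
        reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&Counters), sizeof(Counters));
    return Counters.PrivateUsage;
}

int main()
{
    ID3D12Device* Device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&Device))))
    {
        return 1;
    }

    const SIZE_T Before = GetPrivateCommit();

    // A DEFAULT (GPU-local) heap for placed buffers, roughly what a pool for GPU-only
    // resources requests. 64 MB is an arbitrary example size.
    D3D12_HEAP_DESC HeapDesc = {};
    HeapDesc.SizeInBytes = 64ull * 1024 * 1024;
    HeapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    HeapDesc.Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
    HeapDesc.Flags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    ID3D12Heap* Heap = nullptr;
    const HRESULT Result = Device->CreateHeap(&HeapDesc, IID_PPV_ARGS(&Heap));

    const SIZE_T After = GetPrivateCommit();
    printf("CreateHeap hr=0x%08lX, commit delta: %zu KB\n",
           static_cast<unsigned long>(Result), (After - Before) / 1024);

    if (Heap) { Heap->Release(); }
    Device->Release();
    return 0;
}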

I attached callstack screenshots (attachments #1, 2, 3, 4) of some offending allocations. Not all of them, but enough to confirm that this affects not only texture streaming but also RHI Buffers, including both the engine ones and those from custom SceneProxies.

Off-topic: since this allocation path doesn’t go through the VirtualAlloc() WinAPI function, which UE hooks, it has no chance of being tracked as a RAM allocation by Memory Insights; there I can only see a Heap Allocation of a size similar to the Video Memory allocation (screenshot #5).

From what I found researching the topic online, this is not expected behavior: Commit Size is never supposed to be backed by VRAM.

It’s also clear that this is not VRAM spilling into system memory, since the machine I tested on has 24GB of VRAM, and the game barely consumed 10GB during the run.

I checked the flags to ensure that the placed-resource code path in FD3D12MemoryPool::Init() doesn’t receive unexpected UPLOAD or READBACK heap types (screenshot #6); the condition I added was never triggered.
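
For reference, the added condition was essentially the following (a local sketch; HeapProps and its exact placement in the function are assumptions based on what the debugger showed, not the literal engine code):

// Local verification inside the placed-resource branch of FD3D12MemoryPool::Init().
// The pool is supposed to be GPU-only, so UPLOAD/READBACK heaps (which are CPU-visible
// and would legitimately contribute to Commit Size) should never show up here.
ensureMsgf(HeapProps.Type != D3D12_HEAP_TYPE_UPLOAD &&
           HeapProps.Type != D3D12_HEAP_TYPE_READBACK,
           TEXT("GPU-only pool created with a CPU-visible heap type (%d)"),
           static_cast<int>(HeapProps.Type));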

These are all of the potential culprits I can think of at the moment.

Please let me know if it’s possible to free up the committed memory, or if you have any suggestions on what else to check, or if you need any additional information.

Thank you,

Denis

Hello,

Thank you for reaching out.

I’ve been assigned this issue, and we will be looking into this memory usage for you.

Hello Stephen,

We would like to know if there are any updates regarding this issue, since this is our top priority at the moment.

Thank you,

Denis

Hello,

We were unable to duplicate your results using Windows Performance Recorder and Windows Performance Analyzer - do you have more detailed reproduction steps for this?

We compared Visual Studio Heap Snapshots at the start and end of the function, and saw only untrackable increases. Some of these were driver overhead, while others were for the CPU-side ID3D* objects.

Your posted pictures seem to show the same behavior.

Here are a series of other tickets discussing LLM Untracked that might help you:

[Content removed]

[Content removed]

[Content removed]

[Content removed]

[Content removed]

[Content removed]

[Content removed]

Hello Stephen,

Thank you for your reply and the information provided.

I’ll try to reproduce the issue in Lyra or City Sample. As I said in the original post, I was closely following the blog post https://dev.epicgames.com/community/learning/tutorials/b6d3/unreal-engine-fortnite-find-every-byte-part-1-demystifying-untracked-memory

In our game, we observe D3D12 and the graphics driver allocating an additional 6GB+ of virtual memory via NtAllocateVirtualMemory() with MEM_COMMIT, which contributes to the Commit Size.

Since this function is at the lowest level and is not tracked even by the UE Memory Trace, I don’t see how LLM Untracked could be related to the issue at hand. Can you please expand on this?

To my knowledge, Visual Studio Heap Snapshots are based on the user-mode VirtualAlloc() function, similar to how the UE hooks are based on FVirtualWinApiHooks::VmAlloc(). But the only approach that uncovered the missing 6GB+ of the process Commit Size was capturing an .etl trace with UIforETW.
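
To illustrate the gap with a standalone sketch (nothing engine-specific): committing memory directly through ntdll raises the process Commit Size and shows up in an ETW trace, but it never passes through kernel32!VirtualAlloc, which is where user-mode hooks and heap-snapshot tools sit.

#include <windows.h>
#include <winternl.h>
#include <cstdio>

typedef NTSTATUS (NTAPI* NtAllocateVirtualMemoryFn)(
    HANDLE ProcessHandle, PVOID* BaseAddress, ULONG_PTR ZeroBits,
    PSIZE_T RegionSize, ULONG AllocationType, ULONG Protect);

int main()
{
    const auto NtAllocateVirtualMemory = reinterpret_cast<NtAllocateVirtualMemoryFn>(
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtAllocateVirtualMemory"));

    PVOID Base = nullptr;
    SIZE_T Size = 256ull * 1024 * 1024; // 256 MB committed but never touched
    const NTSTATUS Status = NtAllocateVirtualMemory(GetCurrentProcess(), &Base, 0, &Size,
                                                    MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    // Task Manager / WPA now show +256 MB of Commit Size for this process, while the
    // Working Set barely moves and a VirtualAlloc() hook never fires.
    printf("status=0x%08lX base=%p size=%zu\n", static_cast<unsigned long>(Status), Base, Size);
    getchar(); // keep the process alive so the counters can be inspected
    return 0;
}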

Keep in mind that the pictures attached to the original post show sizes in MB, so “2 537,500” in the first picture means 2.5 GB. Is that still an expected overhead for D3D12MemoryPool? The resources are definitely not CPU-side; I triple-checked that.

I also tried the GPU Segment Usage view in WPA, captured using the GPUView tool from the Windows Performance Toolkit (following this guide: https://gpuopen.com/learn/video-memory-profiling-wpa/). Those 6GB+ are not in the Evicted segment, which peaked at ~700MB and only grew rapidly after a teleport to a different side of the open world, but was consistently approaching 0 otherwise (screenshot attached).

Thank you,

Denis

Hello Stephen,

Unfortunately, I was unable to reproduce this issue in Lyra.

In Lyra, there is only 64 MB of committed memory under the texture streaming call stack, which in our game takes 2.5-3.2 GB.

Perhaps some CVars or config values can cause such behavior? I specifically tried compiling out NvAftermath and GPUBreadCrumbs from our project, but it did not change the memory footprint. Are there other graphics/RHI/D3D12 debugging features that can be enabled even in Shipping and cause this kind of memory mirroring?

Perhaps D3D12RHI is initialized in a certain way due to a faulty/suboptimal configuration.

Please let me know if you have any suggestions for additional checks. Any clue would be greatly appreciated, as I’m currently unable to push the investigation further due to my limited understanding of D3D12.

Thank you,

Denis

Hello,

Thank you for testing Lyra. We will also try to reproduce it in City Sample. Thank you for your patience as we investigate this.

Since you are experiencing the issue in your main project, can you compare your CVar settings to Lyra? The differences could help narrow down which ones could be causing the issue.

Have you checked if the issue is general, or related to specific cards / drivers?

Hello Stephen,

Thank you for your response. There appear to be issues with the cooked and packaged City Sample using UE 5.5 (CL 40574608) - PSO compilation fails and causes crashes. In any case, I doubt the problem would reproduce there.

We have many (>160) renderer CVars set up differently compared to Lyra, and since capturing and opening the .etl trace takes around 1 hour, it’s not feasible to test them all. I went through them by eye but did not spot anything potentially relevant. I was therefore hoping you might have some insight that would narrow down the options.
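
In case it helps, a small local helper along these lines can dump every r.* CVar to the log, so the project and Lyra configurations can be diffed offline instead of re-capturing a trace per change (the command name is made up; the iteration API is the standard IConsoleManager one):

#include "CoreMinimal.h"
#include "HAL/IConsoleManager.h"

// Hypothetical console command (not part of the engine): logs every "r." console variable
// with its current value so the output of two runs can be diffed.
static FAutoConsoleCommand GDumpRenderCVarsCmd(
    TEXT("DumpRenderCVarsForDiff"), // made-up command name
    TEXT("Logs every r.* console variable with its current value."),
    FConsoleCommandDelegate::CreateLambda([]()
    {
        IConsoleManager::Get().ForEachConsoleObjectThatStartsWith(
            FConsoleObjectVisitor::CreateLambda([](const TCHAR* Name, IConsoleObject* Obj)
            {
                if (const IConsoleVariable* Var = Obj->AsVariable())
                {
                    UE_LOG(LogTemp, Log, TEXT("%s = %s"), Name, *Var->GetString());
                }
            }),
            TEXT("r."));
    }));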

In the meantime, I’m implementing a series of local changes, including compiling out certain features (GPU/RHI/D3D12 debug) and plugins (FSR, DLSS), and testing such builds. So far, no result.

The issue is present in our game in general across the board, regardless of the GPU manufacturer, series, driver version, Windows 10 or 11.

We actually hit a very similar issue recently, caused by Render Targets. Are you creating render targets in your project? We noticed a huge 20-30 GB jump (in system memory) due to an alpha issue in an RT. Not sure if this is related, but it was an issue we had on our end.

Hello Daniel,

This sounds interesting, as we create many custom render targets. Could you please expand on it? Have you been able to fix it?

I didn’t quite get the “alpha issue” part. Do you mean the alpha channel?

I think there is a good chance it is related! Thank you, any additional piece of information will be much appreciated.

Best regards,

Denis

Hello,

We were unable to reproduce the issue in City Sample.

Based on the information you have provided, can you check these items and see if they might be influencing it?

  • Are you running with any validation layers enabled by default?
  • Are you running with GPU crash debugging tools (Aftermath, DRED, etc)?
  • Are you using any tools like PIX that intercept calls to the driver? (“r.D3D12.AutoAttachPIX”)
  • Are you using driver settings that enable features like AMD Virtual Super Resolution?

Can you also answer these questions about your setup?

  1. What is your texture streaming pool size?
  2. Are you using Virtual Textures or Runtime Virtual Textures? If so, what is the rough balance between traditional and Virtual Textures?
  3. Are you using Bindless Resources for textures?
  4. Have you adjusted any of the D3D12 Heap size CVars? If so, does reverting these affect the behavior?

Hello,

I wasn’t able to reproduce the issue in City Sample either. Only specific resources that are intended to be CPU accessible (such as upload, readback, and VT physical space) consume a considerable amount of system memory. D3D12 allocated only a small amount of system memory for textures, which can be considered overhead.

As I said in the previous posts, I tried to disable or compile out as much as I could, so to answer the questions:

  • Validation layers are not enabled by default, checked in FD3D12Adapter::CreateRootDevice()
  • Aftermath, DRED, Breadcrumbs were first disabled and then compiled out to be sure
  • The USE_PIX define was set to 0, and I also tried disabling the RenderDoc plugin
  • Driver settings such as Virtual Super Resolution are not enabled. The issue is reproducible in our game on AMD and NVIDIA GPUs (not yet tried on an Intel one). Moreover, we can see this happening across the board in production, given the huge gap between Commit Size and Working Set values

The setup:

  1. The texture streaming pool ranges from 400 to 2000 MB, corresponding to Low through Epic. The issue can be reproduced on any scalability setting
  2. We use VSM and VirtualHeightfieldMesh; disabling them did not fix the issue. Most textures in the project are traditional ones - streamed for the world and non-streamed for UI. Both show up in the .etl trace in the VirtualAlloc Commit Life Times view, with extreme allocated memory sizes
  3. rhi.Bindless.Resources and rhi.Bindless.Samplers are set to Disabled, with no config changes in our DefaultEngine.ini. FD3D12BindlessDescriptorAllocator::Init() suggests bindless resources are enabled for RayTracingShaders (pic. 1); however, we don’t use HW ray tracing
  4. No, the D3D12 heap size CVars were not adjusted - we use the default values. Which CVars exactly are worth adjusting?

We also recently discovered a log:

LogD3D12RHI: OnlineHeap RollOver Detected. Increase the heap size to prevent creation of additional heaps

There appears to be no configuration value or CVar to increase the size of FD3D12LocalOnlineHeap. Should we try increasing NUM_SAMPLER_DESCRIPTORS or D3D12_MAX_SHADER_VISIBLE_SAMPLER_HEAP_SIZE in the code?

The log is rare, while the memory issue happens every run, right from the beginning. Therefore, I’m not sure if there is a correlation.

Thank you,

Denis

Hello,

I would like to share more findings and the results of my tests. None of the changes below solved the issue - the Commit Size allocated by NtAllocateVirtualMemory() from D3D12 for GPU-only resources (like textures created from assets) remained in place.

  • Compiled out all of the custom render targets and USceneCaptureComponents
  • Compiled out custom VirtualTextures and VSMs
  • Compiled out PIX (WITH_PIX_EVENT_RUNTIME=0)
  • Set D3D12_RHI_RAYTRACING to 0 to compile out HW raytracing
  • Explicitly disabled Bindless Resources
  • Compiled out custom vegetation rendering (implemented via custom SceneProxy, no custom D3D12/RHI code)
  • Tried cooking with SM5
  • Disabled Residency Manager
  • Tried the oldest stable game build with a clean 5.3 engine
  • Tried NVIDIA drivers 577.00, 576.80, 576.88, 572.16
  • In case the GPU driver profile causes the issue, tried renaming the executable to FortniteClient-Win64-Shipping.exe and FortniteLauncher.exe
  • Tried using D3D12Core.dll from the latest Fortnite build (from EGS) and the latest from //UE5/Main
  • Removed or compiled out project-specific features related to rendering or RHI

I also tried switching the project to DX11. It helped a lot, which I guess is expected, considering the issue stems from D3D12 heaps and placed resources - concepts D3D11 doesn’t have.

D3D11 has some Commit Size-only overhead, but it is orders of magnitude smaller than the D3D12 allocations - only around 400 MB overall, excluding allocations for upload, readback, and staging resources.

If the client is launched with -nullrhi, the Commit Size is only about 10% bigger than the Working Set, similar to Server builds, which confirms (to a certain extent) that the inactive memory belongs to D3D12.

Our game client in the main menu with no command line args

[Image Removed]

Our game in the main menu with -nullrhi

[Image Removed]

I noticed that other UE5 DX12 games (including Fortnite) have a Working Set to Commit Size ratio similar to ours, so I wonder if such behavior is expected to some degree, or if it is a global engine/D3D12 bug.

[Image Removed]

I couldn’t run Fortnite with -nullrhi (due to EAC), but other recent UE5 single-player games show a similar pattern, suggesting D3D12 is responsible for the committed-but-inactive memory there as well.

Just booted Clair Obscur: Expedition 33 with no command line args

[Image Removed]

Clair Obscur: Expedition 33 with -nullrhi

[Image Removed]

Maybe the only reason we couldn’t reproduce this behavior in Lyra or CitySample is simply the smaller number of D3D12 resources and heaps? At a bigger game scale, this might be happening in the majority of released titles.

Since the overall Commit Size must be backed by either RAM or the paging file, and in our case the committed memory allocated by D3D12 can reach 8 GB, this leads to OOM crashes like this one:

Fatal error: [File:E:\Horde\FTW_Rel03_Inc\Sync\Engine\Source\Runtime\Core\Private\GenericPlatform\GenericPlatformMemory.cpp] [Line: 253] Ran out of memory allocating 4198400 (4.0 MiB) bytes with alignment 0. Last error msg: The paging file is too small for this operation to complete..
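
As a rough illustration of how the numbers add up (the figures below are assumptions for the example, not measurements from a specific crash):

system commit limit ≈ physical RAM + paging file = 32 GB + 4 GB = 36 GB
OS and background processes ≈ 6 GB committed
our game, including ~8 GB of D3D12-driven commit ≈ 30 GB committed
That already reaches the 36 GB limit, so the next allocation (4 MiB in the crash above) has nothing left to commit against and fails.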

Would it be possible for you to run Fortnite with UIforETW following this guide:

https://dev.epicgames.com/community/learning/tutorials/b6d3/unreal-engine-fortnite-find-every-byte-part-1-demystifying-untracked-memory

or at least run it with -nullrhi to confirm that the Commit Size tracks the Working Set much more closely than when rendering with DX12?

If so, I would also ask you to escalate that issue as a potential engine bug.

Perhaps updating D3D12Core.dll would help, since the 1.614.0.0 version distributed with the engine is slightly outdated. My understanding of the topic wasn’t enough to get a newer version working in time.
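
For reference, in a plain Win32 application a newer D3D12Core.dll (Agility SDK) is opted into by exporting two symbols from the executable and shipping the redistributable DLLs next to it; the version number and path below are only examples, and UE selects its bundled D3D12Core.dll through its own build setup, which is the part I did not manage to redo in time:

#include <windows.h>

// Standard Microsoft mechanism for a standalone executable (illustrative values only).
extern "C"
{
    __declspec(dllexport) extern const UINT  D3D12SDKVersion = 614;          // example SDK version
    __declspec(dllexport) extern const char* D3D12SDKPath    = ".\\D3D12\\"; // folder next to the .exe containing D3D12Core.dll
}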

Any additional information, help, or guidance will be much appreciated.

Best,

Denis

Hello,

Thank you for the reply and information.

We are handing this to another team for further investigation and consideration.

Hi,

This is a strange problem indeed - checking the above callstacks, the allocations are coming from the driver, and it somehow looks like the driver is keeping CPU-backed memory for these when it shouldn’t. I guess you have already tried to reproduce this with different drivers and different GPUs?

We have not seen this problem before - I know the driver can back VRAM with system memory for residency tracking and eviction, but if you have enough VRAM available it shouldn’t start doing this. It might be worth trying to run without residency management to see if that helps in any way (set ENABLE_RESIDENCY_MANAGEMENT to 0).

Kind regards,

Kenzo

Hello Kenzo,

Thank you for your reply. I have tried a lot of different things and posted my findings in the reply on 31.07 in the thread with Mr. Kelly.

There, among other things, I mentioned that I had already tried disabling the Residency Manager — but unfortunately, it did not resolve the issue.

We reproduced the issue on various AMD and NVIDIA GPUs and on several NVIDIA driver versions (577.00, 576.80, 576.88, 572.16). I’m not sure which AMD driver was installed on the test PC at the time.

I had also asked about other DX12 UE5 games, including Fortnite, where the proportion between Committed Memory and Working Set appears similar, and in some cases even worse. Can you please profile Fortnite with UIforETW and check how much Committed Memory is allocated under D3D12Core.dll that is not justified by the flags (not readback or upload)?

Best regards,

Denis

Hi,

I will try to do a test with UIforETW early this week and get back to you. Have you tried this with other non-UE D3D12 games as well, to see whether it’s somehow related to how we allocate resources, or whether the allocation is done entirely by the driver outside our control?

Something else that could be tested is disabling all pool allocation and going through committed resources for every allocation (force bPoolResource to false in FD3D12PoolAllocator::AllocateResource); also make sure that tracking of all allocations is disabled, so the dummy heap is not created.

Thank you,

Kenzo

Hi Kenzo,

Thank you for the effort! I tried disabling all pool allocation as you suggested, by forcing bPoolResource to false in FD3D12PoolAllocator::AllocateResource. I also set TRACK_RESOURCE_ALLOCATIONS to 0.

It did not change the game’s Commit Size, but increased VRAM usage (reported by external tools) by 30%. The allocation callstack confirms that FD3D12MemoryPool is no longer used. Instead, FD3D12Adapter::CreateCommittedResource is used.

[Image Removed]
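
For context, my understanding of the two paths involved is below (a standalone sketch with arbitrary sizes, nothing engine-specific): the pool normally sub-allocates placed resources out of one big heap, while with pooling forced off every allocation becomes a committed resource with its own implicit heap, which would explain the extra VRAM.

#include <windows.h>
#include <d3d12.h>

#pragma comment(lib, "d3d12.lib")

int main()
{
    ID3D12Device* Device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&Device))))
    {
        return 1;
    }

    // Shared description for a 1 MB buffer.
    D3D12_RESOURCE_DESC BufferDesc = {};
    BufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    BufferDesc.Width = 1024 * 1024;
    BufferDesc.Height = 1;
    BufferDesc.DepthOrArraySize = 1;
    BufferDesc.MipLevels = 1;
    BufferDesc.SampleDesc.Count = 1;
    BufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    // Pooled / placed path: one big heap is created up front and buffers are
    // sub-allocated out of it at different offsets (what FD3D12MemoryPool feeds).
    D3D12_HEAP_DESC HeapDesc = {};
    HeapDesc.SizeInBytes = 64ull * 1024 * 1024;
    HeapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    HeapDesc.Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
    HeapDesc.Flags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    ID3D12Heap* PoolHeap = nullptr;
    Device->CreateHeap(&HeapDesc, IID_PPV_ARGS(&PoolHeap));

    ID3D12Resource* PlacedBuffer = nullptr;
    Device->CreatePlacedResource(PoolHeap, 0 /*HeapOffset*/, &BufferDesc,
                                 D3D12_RESOURCE_STATE_COMMON, nullptr,
                                 IID_PPV_ARGS(&PlacedBuffer));

    // Committed path: each resource gets its own implicit heap, so many small
    // allocations waste more VRAM than sub-allocating from a shared pool heap.
    D3D12_HEAP_PROPERTIES HeapProps = {};
    HeapProps.Type = D3D12_HEAP_TYPE_DEFAULT;

    ID3D12Resource* CommittedBuffer = nullptr;
    Device->CreateCommittedResource(&HeapProps, D3D12_HEAP_FLAG_NONE, &BufferDesc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&CommittedBuffer));

    if (CommittedBuffer) { CommittedBuffer->Release(); }
    if (PlacedBuffer) { PlacedBuffer->Release(); }
    if (PoolHeap) { PoolHeap->Release(); }
    Device->Release();
    return 0;
}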

I have tested a couple of non-UE D3D12 games, and they seem to follow a similar pattern: the higher the graphics settings, the bigger the game’s Commit Size.

Cyberpunk 2077 Ultra, RT Overdrive

[Image Removed]

Cyberpunk 2077 Low

[Image Removed]

A Plague Tale: Requiem Ultra

[Image Removed]

A Plague Tale: Requiem Low

[Image Removed]

So this seems to be expected behaviour, but it still strikes me as weird. I couldn’t find any D3D12 documentation suggesting that most (or all) VRAM allocations would consume the same amount of committed system memory. It also implies that players need a paging file at least as large as the VRAM the game uses.

From our CC department’s troubleshooting experience, sometimes users choose to set the paging file to 1-4GB or even disable it entirely, which leads to OOM crashes even on 32GB systems.

It’s also weird that UE5 sample projects don’t have such pronounced allocations - almost nothing shows up under Texture2DStreamIn, even when I placed 64 4K textures in one spot and disabled texture streaming in CitySample.

One thing those games have in common is that they use a bundled D3D12Core.dll, version 1.614.0.0 or 1.615.0.0, while the Windows built-in one (in %WINDIR%\SysWOW64) has a different version format (the one on the right), so perhaps it is newer?

[Image Removed]

Any information or documentation links would be greatly appreciated, so I can justify the D3D12 committed memory. A confirmation about Fortnite would still be very valuable, since we can’t know for a fact for any of the other games without having the .pdb files. Thank you!

Kind regards,

Denis

Hi,

I would try reaching out to Microsoft or NVIDIA about this as well. They might know more about it.

It looks like the version number corresponds to the Agility SDK being used.

Most of the (new) UE5 samples use virtual texturing, and we only use regular streaming textures for specific use cases. FN, on the other hand, still uses a lot of texture streaming. I don’t know if that could explain something.

Kind regards,

Kenzo

Hi Kenzo,

We’ve contacted NVIDIA and Microsoft and are waiting for a response.

Regarding the UE5 samples, I also tried densely placing meshes with materials using regular textures in the world, and still did not see such drastic Commit Size usage.

Have you had a chance to profile Fortnite yet?

Thanks,

Denis