Hi there,
Looking at your memreport, I can see the following stats regarding your transient memory usage:
859.375MB - Texture Memory Requested - STAT_RHITransientTextureMemoryRequested - STATGROUP_RHITransientMemory - STATCAT_Advanced
1755.688MB - Buffer Memory Requested - STAT_RHITransientBufferMemoryRequested - STATGROUP_RHITransientMemory - STATCAT_Advanced
2615.062MB - Memory Requested - STAT_RHITransientMemoryRequested - STATGROUP_RHITransientMemory - STATCAT_Advanced
128.000MB - Memory Aliased - STAT_RHITransientMemoryAliased - STATGROUP_RHITransientMemory - STATCAT_Advanced
640.000MB - Memory Used - STAT_RHITransientMemoryUsed - STATGROUP_RHITransientMemory - STATCAT_Advanced
These values indicate that the transient heap cache is using 640MB, which is fairly significant. The transient heap cache's memory requirements have likely increased since version 5.2 of the engine because more work has been moved to the GPU's asynchronous compute queue. Transient memory required by the async compute queue is harder to alias (i.e. reuse space that isn't needed anymore) with memory required by the graphics queue, because the async compute queue might need that memory at almost any point between a graphics fork and join event during graphics queue execution. Transient resources used by the async compute queue are therefore more likely to need their own unique space in one of the transient heaps, space that doesn't overlap with anything used by resources on the graphics queue.
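To make the lifetime constraint a bit more concrete, here is a small conceptual sketch in plain C++ (not the engine's actual allocator code; the pass indices, fork/join positions and names are made up for illustration). Two transient resources can only alias if their lifetimes, expressed as first/last pass indices, never overlap, and a resource used on async compute effectively has its lifetime widened to the whole fork-to-join range:

```cpp
#include <algorithm>
#include <cstdio>

// Conceptual sketch only: a lifetime is modelled as a range of pass indices.
struct FLifetime
{
    int FirstPass; // first pass that uses the resource
    int LastPass;  // last pass that uses the resource
};

// Two resources may alias (occupy the same heap space) only if their
// lifetimes are guaranteed not to overlap.
bool CanAlias(const FLifetime& A, const FLifetime& B)
{
    return A.LastPass < B.FirstPass || B.LastPass < A.FirstPass;
}

// A resource used on the async compute queue may be touched at almost any
// point between the graphics fork and join events, so for aliasing purposes
// its lifetime has to be widened to cover that whole range.
FLifetime WidenForAsyncCompute(const FLifetime& Lifetime, int ForkPass, int JoinPass)
{
    return { std::min(Lifetime.FirstPass, ForkPass), std::max(Lifetime.LastPass, JoinPass) };
}

int main()
{
    const FLifetime GraphicsResource = { 6, 8 };    // short-lived graphics-queue resource
    const FLifetime AsyncResource    = { 20, 22 };  // used by a later async compute pass
    const FLifetime Widened          = WidenForAsyncCompute(AsyncResource, /*Fork*/ 5, /*Join*/ 30);

    std::printf("alias vs raw async lifetime:     %d\n", CanAlias(GraphicsResource, AsyncResource)); // 1
    std::printf("alias vs widened async lifetime: %d\n", CanAlias(GraphicsResource, Widened));       // 0
    return 0;
}
```

Even though the async resource is only touched by one pass, once its lifetime is widened to the fork/join range it can no longer share space with the graphics-queue resource, which is exactly the effect you see with the TSR resources below.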
The best way to debug your transient heap allocations is through Unreal Insights. To give a bit of background, the transient allocator uses a cache of pre-allocated GPU (device local) memory heaps. By default, each heap is 128MB, and a new heap is only allocated if a particular transient resource cannot be placed in any existing heap. Below you can see an Insights trace of a test scene with the RDG trace channel enabled and set to visualize the transient heaps. On the left you can see that this scene uses 3 memory heaps (Memory Ranges 0-2). When resources are placed in a heap, they can alias with other resources in the same heap (occupy the same memory), as long as their lifetimes are guaranteed not to overlap. As noted above, this is harder when async compute is involved. You can see an example of this in the TSR Decimate history pass (buffers labeled in the second screenshot, associated render pass in the third screenshot), which requires a number of large resources on the async compute queue. The two turquoise passes highlighted in the third screenshot show where the graphics queue fork and join events occur. The memory required for these TSR resources can't occupy the same space as anything required by the graphics queue between these two points, which is why they occupy their own unique spaces in the heaps.
[Image Removed][Image Removed]
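For a very rough picture of how the heap cache itself grows, here is another conceptual sketch (again, not the engine's code; the only figure taken from the engine is the 128MB default heap size, and real placement also has to respect alignment and the aliasing rules above). Each placement tries the existing heaps first and only allocates a new heap when nothing fits, which is how the cache ends up at a multiple of 128MB such as the 640MB reported above:

```cpp
#include <cstdint>
#include <vector>

// Conceptual sketch: each heap is 128MB by default, and a resource is placed
// by bumping an offset. Real placement also considers alignment and aliasing
// with resources whose lifetimes have ended, which this sketch ignores.
constexpr uint64_t DefaultHeapSize = 128ull * 1024 * 1024;

struct FTransientHeap
{
    uint64_t Used = 0;
};

struct FPlacement
{
    size_t   HeapIndex;
    uint64_t Offset;
};

FPlacement PlaceResource(std::vector<FTransientHeap>& Heaps, uint64_t Size)
{
    for (size_t Index = 0; Index < Heaps.size(); ++Index)
    {
        if (Heaps[Index].Used + Size <= DefaultHeapSize)
        {
            const uint64_t Offset = Heaps[Index].Used;
            Heaps[Index].Used += Size;
            return { Index, Offset };
        }
    }
    // Nothing fits: grow the cache with a new 128MB heap. Each time this
    // happens the "Memory Used" stat grows by another 128MB.
    Heaps.push_back({ Size });
    return { Heaps.size() - 1, 0 };
}
```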
Regarding your memory pressure and residency, from your memory report it looks like you are running very close to, and probably exceeding, your memory budget. Under the rhi.DumpResourceMemory section of your memreport, I can see that you are using 7317MB of non-transient resource memory. Combined with the 640MB of transient memory usage we got from stat RHITransientMemory, it appears you are using around 7957MB of RHI resource memory. I can also see that quite a few of your largest RHI resources are getting evicted (made non-resident) due to the high memory pressure (all the entries under rhi.DumpResourceMemory that are missing the Resident flag have been evicted by the residency manager).
As you get closer to the VRAM budget, the residency manager will start evicting resources to system memory more aggressively. Eviction starts at 70% of your memory budget, where resources are evicted if they haven't been used for more than 1 minute. Once memory pressure reaches 100%, evictions happen after just 1 second of non-use. Once you go over your memory budget, making room for any resources that are required for the frame but are not already resident will also force currently resident resources to be evicted. This is most likely the source of your unrecoverable performance drop.
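In pseudo-code form, the policy described above looks roughly like this (a conceptual sketch only, not the engine's actual residency manager code; the 70%, 1 minute and 1 second figures are the ones mentioned above):

```cpp
#include <cstdint>

// Conceptual sketch of the eviction policy described above, not the engine's
// actual residency manager code.
// Returns how long (in seconds) a resource must go unused before it becomes a
// candidate for eviction, or a negative value if no eviction happens yet.
double EvictionGracePeriodSeconds(uint64_t UsedBytes, uint64_t BudgetBytes)
{
    const double Pressure = static_cast<double>(UsedBytes) / static_cast<double>(BudgetBytes);

    if (Pressure < 0.70)
    {
        return -1.0;  // under 70% of budget: nothing gets evicted
    }
    if (Pressure < 1.00)
    {
        return 60.0;  // between 70% and 100%: evict after ~1 minute of non-use
    }
    return 1.0;       // at or over budget: evict after ~1 second of non-use
}
```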
I am unsure why turning off r.RDG.TransientAllocator would give you more memory headroom. If anything, I would expect you to end up with less headroom, since all memory for transient resources would then have to be allocated up front, and memory that is no longer needed within the frame couldn't be aliased. Looking at an Insights trace with the RDG trace channel enabled, and set to visualize the transient heaps, should give you more insight into what might be happening here. If you still have a 5.2 build, it might be useful to compare the RDG Insights data between the two versions.
For debugging what other resources are occupying your VRAM, you can use the Render Resource Viewer in the editor, or run rhi.DumpResourceMemory in the console. The full list of options for rhi.DumpResourceMemory is as follows: rhi.DumpResourceMemory [<Number To Show>] [all] [summary] [Name=<Filter Text>] [Type=<RHI Resource Type>] [Transient=<no, yes, or all>] [csv]. Memreport will also run and report rhi.DumpResourceMemory, but without the `all` option, meaning it will only output the top 50 largest RHI resources.
[Image Removed]
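For example (these invocations just combine the options listed above, they aren't specific recommendations for your content): `rhi.DumpResourceMemory 100 Name=TSR` should list the 100 largest resources whose names contain "TSR", and `rhi.DumpResourceMemory all csv` should dump every resource in CSV form so you can sort it in a spreadsheet.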
Regarding residency vs locality, the concept of residency only really applies to resources that are originally allocated in device local memory (VRAM). For a resident resource to be made non-local, it would have to be evicted and made non-resident. So the concept of a resource being resident, but non-local (in shared memory), doesn’t really make sense to me. The engine can, and does, allocate some resources directly in non-local memory (these are generally upload and feedback buffers). However, residency doesn’t really apply to these, since they are never moved to device local memory (except by copying into other buffers that are already resident). The engine does not move a resource to non-local storage (say to free VRAM) without evicting it and making it non-resident through the residency manager.
Actually, after reading the D3D12 residency library readme (available here) and looking a bit more into the relationship between it and the OS's own video memory management system (VidMM), it appears that it IS possible to have resident non-local memory. The definition of residency, according to the D3D12 residency library, is whether or not a resource is accessible by the GPU. So when a resource is evicted by the residency manager, it becomes completely unmapped from GPU-accessible address space. According to this documentation, eviction happens to disk (it does not go into shared memory), so the concept can be applied to both local and non-local GPU-addressable memory.
VidMM also has the final say on what actually gets evicted. When Evict is called by the residency manager, the resource is only marked for eviction; VidMM will try not to evict it unless it actually needs to. On the other hand, if you are over your memory budget, the residency manager may not be sufficient to keep your memory within budget. In that case VidMM may start evicting GPU resources opaquely under the hood, without informing the residency manager. This is the case where resident non-local resources could occur: the resource is still technically accessible by the GPU, but the GPU will stall when accessing it, and Unreal has no knowledge of this. However, I think this scenario is very unlikely, as the residency manager should be robust enough to prevent it (resources cannot be made resident until enough other resources have been evicted to make space available).
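If you ever want to see the budgets VidMM is working with directly, outside of Unreal, the OS-level numbers are exposed through IDXGIAdapter3::QueryVideoMemoryInfo. The snippet below is just a standalone illustration of the local vs non-local budget/usage concept (it is not something the engine needs you to call, and adapter 0 is assumed to be the GPU you care about):

```cpp
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <cstdio>

#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

// Standalone illustration: query the OS-reported budget and current usage for
// both the local (VRAM) and non-local (shared system memory) segment groups.
int main()
{
    ComPtr<IDXGIFactory4> Factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&Factory))))
    {
        return 1;
    }

    ComPtr<IDXGIAdapter1> Adapter;
    ComPtr<IDXGIAdapter3> Adapter3;
    if (FAILED(Factory->EnumAdapters1(0, &Adapter)) || FAILED(Adapter.As(&Adapter3)))
    {
        return 1;
    }

    DXGI_QUERY_VIDEO_MEMORY_INFO Local = {};
    DXGI_QUERY_VIDEO_MEMORY_INFO NonLocal = {};
    Adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &Local);
    Adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL, &NonLocal);

    std::printf("Local:     budget %llu MB, usage %llu MB\n",
                Local.Budget >> 20, Local.CurrentUsage >> 20);
    std::printf("Non-local: budget %llu MB, usage %llu MB\n",
                NonLocal.Budget >> 20, NonLocal.CurrentUsage >> 20);
    return 0;
}
```

As far as I understand it, the local Budget value reported here is the same budget the residency manager is trying to stay under, so watching it alongside CurrentUsage while you reproduce the slowdown should confirm whether you are actually going over budget.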
I hope this was clear, and helps you debug your issues further. Let me know if you have any more questions regarding this.
Regards,
Lance