VRAM profiling tools

Hello,

We are trying to optimize VRAM usage in our game. Currently, we are using `rhi.dumpresourcememory` to create a full report of all the resources and transient resources used by the game. The report is handy because it lets us categorize resource usage using the provided owner information. However, if we sum all the rows of the Size column, the result is greater than the actual VRAM usage reported by PIX or Task Manager, sometimes by more than 20% (even after excluding non-resident and marked-for-delete resources). How is this possible?

  • Perhaps transient resources that use the same underlying memory are still reported as “individual” resources?
  • We also noticed some resources are duplicated and reported more than once: as you can see in the image below, the distortion texture is reported three times

[Image Removed]

`rhi.dumpresourcememory` is very useful, but these errors can be misleading. Is there a way to mitigate them?
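
To illustrate the first hypothesis above (aliased transient resources being counted individually), here is a minimal, self-contained sketch. It is a toy model, not engine code: `FToyTransient`, `SumOfSizes` and `PeakLiveBytes` are hypothetical names, and lifetimes are expressed as inclusive render pass indices. It only shows that summing a per-resource size column can exceed what a heap needs once non-overlapping resources share memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one transient resource: its size and the
// first/last render pass (inclusive) during which it is alive.
struct FToyTransient
{
    uint64_t SizeBytes;
    int FirstPass;
    int LastPass;
};

// What you get by summing a per-resource "Size" column.
uint64_t SumOfSizes(const std::vector<FToyTransient>& Resources)
{
    uint64_t Total = 0;
    for (const FToyTransient& R : Resources)
    {
        Total += R.SizeBytes;
    }
    return Total;
}

// Largest number of bytes alive during any single pass. This is a lower
// bound on the heap size when resources with disjoint lifetimes may alias;
// a real allocator can need more because of alignment and fragmentation.
uint64_t PeakLiveBytes(const std::vector<FToyTransient>& Resources)
{
    uint64_t Peak = 0;
    for (const FToyTransient& Probe : Resources)
    {
        assert(Probe.FirstPass <= Probe.LastPass);
        // The peak always occurs at the start of some resource's lifetime,
        // so it is enough to sample those passes.
        uint64_t Live = 0;
        for (const FToyTransient& R : Resources)
        {
            if (R.FirstPass <= Probe.FirstPass && Probe.FirstPass <= R.LastPass)
            {
                Live += R.SizeBytes;
            }
        }
        Peak = std::max(Peak, Live);
    }
    return Peak;
}
```

For three resources of 64, 32 and 64 units alive during passes 0-1, 1-2 and 2-3, `SumOfSizes` returns 160 while `PeakLiveBytes` returns 96, because the first and third resources never coexist and could share memory.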

There are other discrepancies we don’t fully understand:

  • `stat D3D12Memory` does not match what is reported by PIX. This was unexpected as it looks like this stat gets its value from the adapter.
  • `stat D3D12Resources` does not match what is reported by `stat D3D12Memory`. I guess D3D12Resources doesn’t know about some driver-specific allocated memory, but I’m not sure.
  • `stat RHITransientMemory` shows the requested memory, the used memory, and the aliased memory. However, `Memory Used != Memory Requested - Memory Aliased`. Is that how it is supposed to be?

In conclusion, what tool should we use to profile VRAM usage? We would like to use `rhi.dumpresourcememory` as it provides data regarding who is using the memory.

Thanks,

Damiano

If I may chime in: if this is PC, AMD’s RMV tool is great, but it takes some digging to fully understand what you are seeing (somewhat related: [Content removed]).

Transient resources and memory reuse, virtual and reserved allocations, and local vs. host memory all make the accounting harder.

Hi,

I have been doing some digging and found a few UDN threads on this topic with answers from Epic regarding the use of rhi.dumpresourcememory:

  • [Content removed] “The rhi.DumpResourceMemory is designed to show a total of all memory resources, so it should not be used to take a snapshot of how much memory is in use at a given point in time.”
  • [Content removed] (the information is in Japanese, but you can translate this to English by right-clicking the page in Chrome) states: “The information displayed by rhi.dumpresourcememory, such as Nanite.StreamingManager.ClusterPageData, indicates the maximum buffer size. In reality, an appropriate amount of buffer is allocated and used. Due to these behaviors, rhi.dumpresourcememory is not very suitable for measuring the amount of memory used by a process. We recommend using Memory Insights or RDGInsights.”
  • [Content removed]

I asked my colleagues and the consensus was that dedicated GPU profiling tools (PIX, AMD’s Radeon Memory Visualizer and NVIDIA Nsight) can provide more accurate results, with PIX being the preferred tool for DirectX 12 projects.

With respect to your question on stat D3D12Resources not matching stat D3D12Memory, could you clarify which numbers don’t match? I did some experiments and found that some resources like UAV Textures and Render Targets report identical numbers and the UAV buffers are off by a few percent. Please see the screenshot below where both stats are enabled:

[Image Removed]

With respect to your question on `stat RHITransientMemory`, I checked the code that’s responsible for calculating these stats (copied below from Engine\Source\Runtime\RHICore\Private\RHICoreTransientResourceCoreAllocator.cpp). Memory Requested is calculated as Buffer Memory Requested + Texture Memory Requested:

```cpp
void FRHITransientMemoryStats::Submit(uint64 UsedSize)
{
    const int32 CreateResourceCount = Textures.CreateCount + Buffers.CreateCount;
    const int64 MemoryUsed = UsedSize;
    // Note: this local is set from AliasedSize, so the trace counter and the
    // CSV stat below that use it report the aliased size.
    const int64 MemoryRequested = AliasedSize;
    const float ToMB = 1.0f / (1024.0f * 1024.0f);

    TRACE_COUNTER_SET(TransientResourceCreateCount, CreateResourceCount);
    TRACE_COUNTER_SET(TransientTextureCreateCount, Textures.CreateCount);
    TRACE_COUNTER_SET(TransientBufferCreateCount, Buffers.CreateCount);
    TRACE_COUNTER_SET(TransientMemoryUsed, MemoryUsed);
    TRACE_COUNTER_SET(TransientMemoryRequested, MemoryRequested);

    CSV_CUSTOM_STAT_GLOBAL(TransientResourceCreateCount, CreateResourceCount, ECsvCustomStatOp::Set);
    CSV_CUSTOM_STAT_GLOBAL(TransientMemoryUsedMB, static_cast<float>(MemoryUsed * ToMB), ECsvCustomStatOp::Set);
    CSV_CUSTOM_STAT_GLOBAL(TransientMemoryAliasedMB, static_cast<float>(MemoryRequested * ToMB), ECsvCustomStatOp::Set);

    // Values shown by `stat RHITransientMemory`. "Requested" is the sum of
    // the texture and buffer allocated sizes.
    SET_MEMORY_STAT(STAT_RHITransientMemoryUsed, UsedSize);
    SET_MEMORY_STAT(STAT_RHITransientMemoryAliased, AliasedSize);
    SET_MEMORY_STAT(STAT_RHITransientMemoryRequested, Textures.AllocatedSize + Buffers.AllocatedSize);
    SET_MEMORY_STAT(STAT_RHITransientBufferMemoryRequested, Buffers.AllocatedSize);
    SET_MEMORY_STAT(STAT_RHITransientTextureMemoryRequested, Textures.AllocatedSize);

    SET_DWORD_STAT(STAT_RHITransientTextures, Textures.AllocationCount);
    SET_DWORD_STAT(STAT_RHITransientBuffers, Buffers.AllocationCount);
    SET_DWORD_STAT(STAT_RHITransientResources, Textures.AllocationCount + Buffers.AllocationCount);

    Reset();
}
```

Hopefully the above helps. Please let me know if you have any further questions.

Thanks,

Sam

Hi there, I tried RMV and yep, it is great. The two main downsides are that it only works on AMD chips and that, for some reason, the names of some resources are not passed to D3D12 in Unreal (see `SafeCreateTexture2D` in D3D12Texture.cpp, line 1111, UE 5.4). A simple modification made profiling much more insightful.

Thanks a lot for the suggestion,

Damiano

Hello,

Thanks for the detailed reply! OK, so I think it is clear that `rhi.dumpresourcememory` can probably only be used to get an overview of what is happening under the hood. I still find it very convenient for non-transient resources (e.g. textures used by materials).

I am using it together with RMV when I want to check the allocations made in GPU memory.

Regarding the difference between `stat d3d12resources` and `stat d3d12memory`, here is what I see:

[Image Removed]

However, I think it is due to the fact that the RHI does not know about PSOs, back buffers, etc.

Thanks again for the reply.

Damiano

Hi,

Good to know that the RMV tool has been helpful.

You are right that the “TOTAL” memory stat (in `stat D3D12Resources`) does not fully correspond to the “Used Video Memory” stat (in `stat D3D12Memory`). The latter comes directly from the adapter, via the DXGI method `Adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &LocalMemoryInfo)` (you have probably found this out already), and may include memory used by other processes or memory allocated by the driver.
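
For reference, the adapter query mentioned above can be reproduced outside the engine with a small standalone sketch. This is not the engine's code: `FLocalVideoMemory`, `QueryLocalVideoMemory` and `UntrackedBytes` are hypothetical names, and the sketch assumes adapter index 0 is the adapter the game renders on. The DXGI part only compiles on Windows, and the function simply returns false elsewhere. Note that Microsoft's documentation describes `CurrentUsage` as the application's current video memory usage, so it may be worth verifying how much of the gap comes from driver-side allocations in your own process.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <dxgi1_4.h>
#include <wrl/client.h>
#pragma comment(lib, "dxgi.lib")
#endif

// Local (dedicated) video memory figures reported by DXGI, in bytes.
struct FLocalVideoMemory
{
    uint64_t CurrentUsage = 0;
    uint64_t Budget = 0;
};

// Fills Out from IDXGIAdapter3::QueryVideoMemoryInfo. Returns false on
// failure or on non-Windows platforms.
// Assumption: adapter 0 is the adapter the game renders on.
bool QueryLocalVideoMemory(FLocalVideoMemory* Out)
{
    assert(Out != nullptr);
#ifdef _WIN32
    using Microsoft::WRL::ComPtr;

    ComPtr<IDXGIFactory1> Factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&Factory)))) return false;

    ComPtr<IDXGIAdapter1> Adapter;
    if (FAILED(Factory->EnumAdapters1(0, Adapter.GetAddressOf()))) return false;

    // IDXGIAdapter3 is the interface that exposes QueryVideoMemoryInfo.
    ComPtr<IDXGIAdapter3> Adapter3;
    if (FAILED(Adapter.As(&Adapter3))) return false;

    DXGI_QUERY_VIDEO_MEMORY_INFO Info = {};
    if (FAILED(Adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &Info))) return false;

    Out->CurrentUsage = Info.CurrentUsage;
    Out->Budget = Info.Budget;
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: bytes reported by the adapter that engine-side
// resource tracking cannot attribute. Can be negative if the tracked total
// counts memory that is not resident.
int64_t UntrackedBytes(uint64_t AdapterUsage, uint64_t TrackedResourceBytes)
{
    return static_cast<int64_t>(AdapterUsage) - static_cast<int64_t>(TrackedResourceBytes);
}
```

Comparing `CurrentUsage` with the “TOTAL” figure from `stat D3D12Resources` through `UntrackedBytes` gives a rough measure of how much memory the RHI-level tracking does not see.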

Thanks,

Sam