MetaHuman skin cache retains LOD0 entries for every NPC regardless of camera distance

Hi,

We’re seeing unexpected skin cache behavior on our 50-MetaHuman test scene and would like to understand whether this is working as intended.

Setup

- 50 MetaHuman Blueprints spawned in a test level (bald/no hair, “Visible in Raytracing” disabled on the skeletal mesh components, development build)

- `rhi.DumpResourceMemory` used to capture resource state at different camera distances (GameMode “TestingGM” automatically runs the command)

Observation

Regardless of camera position, total VRAM usage stays the same, as every NPC retains an LOD0 `SkinCachePositions` entry. We confirmed this by dumping in two scenarios:

1. Camera at moderate distance, LOD visualizer shows every face rendering at LOD3 or lower


2. Camera moved far away, LOD visualizer shows every face rendering at LOD7


In both dumps, total VRAM consumption is nearly the same. The body meshes show 100 entries at LOD0 (50 NPCs × 2 buffers, matching the expected double-buffering for current/previous frame), plus additional entries at lower LODs. For the far-camera dump:

- Body SkinCachePositions: 100 × LOD0, 80 × LOD2, 100 × LOD3 (280 entries total)
- Face SkinCachePositions: 100 × LOD0, 100 × LOD7 (200 entries total)

Total skin cache for the MetaHuman skeletal meshes is ~137 MB in the far-camera dump vs ~142 MB at moderate distance — essentially unchanged despite the rendered LOD dropping significantly. The LOD0 entries alone account for ~128 MB of that total.

When we force LOD3 in BP, the LOD0 entries disappear entirely and total skin cache drops to ~31 MB — so the skin cache does respond to forced LOD, but not to camera distance.
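The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that only re-derives numbers already quoted in this post (entry counts and the ~137 / ~142 / ~128 / ~31 MB totals); it does not parse the attached dumps, and the variable names and dictionary layout are illustrative only.

```python
# Figures copied from the far-camera dump and the text above; nothing here is measured.
NUM_NPCS = 50
BUFFERS_PER_NPC = 2  # current-frame + previous-frame positions

# SkinCachePositions entry counts per LOD (far-camera dump).
body_entries = {0: 100, 2: 80, 3: 100}
face_entries = {0: 100, 7: 100}

# Approximate totals in MB, as quoted above.
FAR_TOTAL_MB = 137.0
MODERATE_TOTAL_MB = 142.0
LOD0_MB = 128.0
FORCED_LOD3_TOTAL_MB = 31.0


def summarize():
    return {
        "body_total": sum(body_entries.values()),
        "face_total": sum(face_entries.values()),
        # One LOD0 entry per NPC per buffer, for each of body and face.
        "expected_lod0_per_mesh": NUM_NPCS * BUFFERS_PER_NPC,
        "lod0_share_of_far": LOD0_MB / FAR_TOTAL_MB,
        "lod0_mb_per_npc": LOD0_MB / NUM_NPCS,
        "far_vs_moderate_drop": 1.0 - FAR_TOTAL_MB / MODERATE_TOTAL_MB,
        "forced_lod3_saving": 1.0 - FORCED_LOD3_TOTAL_MB / FAR_TOTAL_MB,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Put together: roughly 93% of the far-camera total is LOD0 data (about 2.56 MB per NPC), the total drops by only ~3.5% between the two camera positions, and forcing LOD3 removes ~77% of it.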

Question

Why does the skin cache hold LOD0 data for NPCs that are rendering at the lowest LOD?

Technically, VRAM usage would be lower if we forced all MetaHumans to LOD1 instead of letting LODs work as intended, which is quite counterintuitive.

The problem in our main project is that no matter how far away an NPC is, its highest LODs stay in VRAM. We only use LOD0 when very close, but VRAM quickly overflows on our target hardware (RTX 3070 with 8 GB VRAM) when a few NPCs are around.

We would appreciate any advice.

I attached both the RHI dumps.

Best regards,

Matthias


Steps to Reproduce
Repro: https://drive.google.com/file/d/14LalF-R_kgzoxONNoMN23aIyu750mvAl/view?usp=sharing


Hi, sorry for the delay in getting back to you on this. Your understanding is correct; the skin cache does evict old mesh data when the LOD for that mesh changes.

I think what you’re seeing in the RHI dump is either old data that has yet to be flushed from VRAM or data from other scenes running in the editor. Each scene (the PIE viewport, the editor viewport, mesh editors, etc.) has its own skin cache, and when you run `rhi.DumpResourceMemory` you’ll get the data for all the skin caches running in the background.
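To make the explanation concrete, here is a toy model in Python of two behaviours described above: a cache that evicts a mesh's old data when its LOD changes, and a global dump that sums every scene's cache. This is a hypothetical sketch, not the engine's skin cache code: `SkinCache`, `update`, `dump_all`, the one-entry-per-component layout and the sizes are all assumptions, and double-buffering is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SkinCache:
    """Toy per-scene skin cache (hypothetical): one entry per component, tagged with its LOD."""
    scene: str
    entries: dict = field(default_factory=dict)  # component id -> (lod, size)

    def update(self, component: int, lod: int, size: int) -> None:
        # Overwriting the entry models eviction: the old LOD's data is dropped
        # as soon as the component renders at a different LOD.
        self.entries[component] = (lod, size)

    def memory(self) -> int:
        return sum(size for _, size in self.entries.values())

    def lods(self) -> list:
        return sorted({lod for lod, _ in self.entries.values()})


def dump_all(caches: list) -> int:
    # A process-wide dump reports every live cache, not just the scene you are looking at.
    return sum(cache.memory() for cache in caches)


# Hypothetical sizes in arbitrary units: LOD0 = 40, LOD7 = 1.
pie = SkinCache("PIE")
editor = SkinCache("EditorViewport")
for npc in range(3):
    editor.update(npc, lod=0, size=40)  # editor viewport camera stays close to the NPCs
    pie.update(npc, lod=0, size=40)
for npc in range(3):
    pie.update(npc, lod=7, size=1)      # PIE camera moves far away

print(pie.memory(), pie.lods(), editor.memory(), dump_all([pie, editor]))
```

In this model the PIE cache follows the LOD change (only LOD7 entries remain), while the global total is still dominated by LOD0 data held by another scene, which is the kind of result a process-wide dump would show.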

We have some better debugging tools for viewing skin cache memory usage. The most useful of these is likely the GPU Skin Cache view mode. For some reason, this no longer appears in the default editor viewport toolbar after a recent refactor, but if you open a new viewport (Window > Viewports), it should appear under the eye icon on the right-hand side of the toolbar.

That should give you more accurate figures for the current scene in the debug text. You can also run `r.SkinCache.PrintMemorySummary` to print the entries currently in the skin cache, but again, I think this may include data from all active skin caches.

To get the most accurate results, I’d recommend running these commands in a packaged build. That way you know there is only one scene and nothing else is interfering with the results. When I do this and run `rhi.DumpResourceMemory`, I don’t see the old tangent data, etc., for the old LODs. And when I run `r.SkinCache.PrintMemorySummary`, I see the expected data for the expected LODs, which changes as the rendered LODs change.

Having said all of that, I have some more general recommendations based on what you said about performance. I would say the example setup you have here is pushing what’s realistic in terms of skeletal meshes rendered on-screen. Skeletal mesh wasn’t really designed to run at this kind of scale (i.e. 100+ mesh components on screen). This is a general issue with animation/skeletal mesh in the engine, not specific to MH, though the complexity of the MH setup can make the situation worse. It’s one of the reasons we’re investing a lot of time in the new animation system (UAF), which will help avoid many of these problems, but that system is years away from being production-ready.

A better approach for a crowd system, which is what I assume you want to achieve, is to do something more in line with how we implemented crowds in City Sample. In that setup, only ~8 characters are running full actor setups with simplified MHs (i.e. body + head only). The other characters in the scene are all Mass entities rendered as static meshes driven by vertex animation. We then transition between full actors and static mesh entities as characters get closer to or further from the camera. This is a much more performant solution for getting large numbers of characters on screen.
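The swapping idea can be sketched as a per-frame ranking by camera distance. This is a minimal Python illustration under stated assumptions, not City Sample's actual Mass LOD code: `Agent`, `assign_representations`, the 2D positions and the 0.9 hysteresis factor are hypothetical; only the budget of roughly 8 full actors comes from the description above.

```python
import math
from dataclasses import dataclass

FULL_ACTOR_BUDGET = 8  # roughly the number of full actors mentioned for City Sample


@dataclass
class Agent:
    id: int
    x: float
    y: float
    is_full_actor: bool = False  # False = Mass entity drawn as a vertex-animated static mesh


def assign_representations(agents, camera, budget=FULL_ACTOR_BUDGET, hysteresis=0.9):
    """Promote the `budget` closest agents to full actors and demote the rest.

    Agents that are already full actors have their distance scaled by
    `hysteresis` (< 1), so two agents at similar distances do not swap
    representation every frame. (Hypothetical policy for illustration.)
    """
    def score(agent):
        d = math.hypot(agent.x - camera[0], agent.y - camera[1])
        return d * hysteresis if agent.is_full_actor else d

    promoted = {a.id for a in sorted(agents, key=score)[:budget]}
    for agent in agents:
        agent.is_full_actor = agent.id in promoted
    return promoted


crowd = [Agent(i, x=10.0 * i, y=0.0) for i in range(20)]
print(sorted(assign_representations(crowd, camera=(0.0, 0.0))))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The hysteresis term is a design choice: without it, characters hovering near the cut-off distance could flip between the actor and entity representations on consecutive frames, repeatedly paying the cost of the transition.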

The problem with the vertex animation approach is that managing the generation of animated textures for your different character setups is quite a headache. So, along with 5.8, we’re planning to release a new sample showing an MH crowd setup that leverages instanced skeletal mesh rather than static mesh. This means there’s no need to bake out animated textures; you can re-use anim sequences directly. I think that could be quite useful as a reference for what you want to achieve, although some of the features will still be experimental in 5.8.

If you want more information on these kinds of performance issues, you can take a look at this article I wrote a while back. It includes information on the City Sample crowd setup.

Let me know if you want to discuss any of these points further.


Hi Matthias,

Yeah, you’re right that Mutable will complicate the situation. The VAT approach requires that a bone texture is baked out for each skeleton, and that needs to be done offline. So the only option for using Mutable with VAT, at least without a lot of customization, would be to use Mutable to generate the merged meshes within the editor, and then generate the VAT textures from there.

For the new approach with ISKM, I’ve been digging through the code since the Mutable use case isn’t one that’s been looked at specifically yet, as far as I’m aware. It looks like the anim sequence transform provider, which generates GPU-formatted animation data, does support the compatible skeleton system used by Mutable. And the transform providers can be generated dynamically, which suggests you could fill them with data from Mutable. But the sample we’re planning to ship has its own MH-specific character composition system that is separate from Mutable and uses MH-specific asset formats (like MetaHuman Character assets). So you’d need to roll some sort of custom implementation that used just the ISKM + Mass functionality and not the MH character composition system. I’ll try to get some more information from the dev team on this approach and follow up.

Thanks,

Euan


Hi Matthias,

Hope the call earlier was useful. In terms of the VRAM issues, you can download PIX here. Just run a development build and pass -attachpix, then generate a capture and look at the data. The alternative is the skin cache-specific debug cvars I mentioned before, but PIX will give you the ground truth about what’s actually happening on the GPU. If you want to share the capture, there are a few extra hoops to jump through that are documented under the Graphics Issues heading on this [Content removed]

Thanks,

Euan


Hi Euan,

Thank you for the detailed response and the debugging suggestions — we’ll try them out soon in a packaged build and report back.

On the crowd recommendation: scalable crowds are exactly what we’re after, but I want to flag a constraint before we commit to that direction. In our actual project, we use Mutable to runtime-generate our NPC bodies with randomized clothing and proportions, so no two NPCs share the same skeletal mesh asset at runtime.

How does this interact with the approaches you mentioned? My understanding is VAT requires baking against a known vertex layout, which isn’t possible with runtime-generated geometry. Is the upcoming 5.8 instanced skeletal mesh sample any more flexible on per-instance mesh variation, or does it still require identical source meshes?

Thanks again,

Matthias
