High render thread cost of Groom Bindings, even on "Rigid" mode

We are working on optimizing MetaHuman rendering. In scenes with multiple MetaHuman NPCs, draw times skyrocket near characters with groom assets. This is especially problematic on lower-end hardware, so we are trying to reduce that overhead.

While investigating, we found that rigid bindings (EGroomBindingType::Rigid) carry a surprisingly high overhead compared to having no bindings at all: they are nearly as expensive as skinned bindings, despite performing no deformation.

Our test scene contains 50 MetaHumans sharing a single groom asset with “Use Cards” enabled (no strands). We measured three configurations using the built-in FPS chart command and Unreal Insights, with a full system reboot between measurements:

Avg. RenderThread frame time:

  • Skinned bindings (default): ~10.42 ms
  • Rigid bindings: ~8.50 ms
  • No bindings (groom attached via AttachComponentToComponent): ~5.44 ms

Rigid bindings save only ~2 ms over skinned bindings, but still cost ~3 ms more than no bindings at all. Given that rigid bindings do not deform anything, we would expect the cost to be close to the no-bindings case.

Our questions are:

  1. Where is the ~3 ms overhead coming from with rigid bindings? Since rigid bindings do not deform anything, why is the cost not comparable to simply attaching the groom component without a binding?
  2. Is AttachComponentToComponent without any binding asset the recommended approach for cards-only grooms that just need to follow the head? It performs significantly better, but it feels like a workaround rather than an intended workflow.
  3. Are there plans to optimize the rigid binding path, or to introduce a mode for groom components that avoids per-frame binding evaluation when no deformation is needed?

Steps to Reproduce

The attached repro project is based on an unmodified UE 5.7 with no third-party plugins. It contains three levels:

  • Lvl_Bindings_Skinning
    • 50 MetaHumans with default skinned groom bindings
  • Lvl_Bindings_Rigid
    • 50 MetaHumans with all card LOD binding types set to Rigid
  • Lvl_NoBindings
    • 50 MetaHumans with groom components attached via AttachComponentToComponent (no binding asset)

A custom game mode automatically runs the profiling session: go full screen, press Play, and wait 17 seconds. The game mode triggers the FPS chart capture and an Insights trace automatically.

Note: Scalability settings are automatically set to “High” and screen percentage to 58% to reflect our target game settings.

We have also attached our resulting FPS chart results and Insights trace files for comparison.

Hello!

I ran your project in our mainline (upcoming 5.8). It looks like there was a configuration mistake: in the Rigid binding map, Hair_L_Straight_Rigid had all the LODs of Group 1 set to Skinned. You can visualize this with Lit > Groom > Instances, which has a column describing the nature of the binding.

Setting this back to Rigid reduces the overall cost and puts it on par with the no-binding version.

I hope this helps.

/Charles.

Hi [mention removed]​,

Thank you for the hint — and especially for pointing us to the Lit > Groom > Instances view. That’s a really useful way to spot mismatched binding configurations.

After fixing the Hair_L_Straight_Rigid LOD group that was still set to Skinned, I also removed the eyebrows and eyelashes, which had the wrong settings as well, and re-ran the three scenarios (plus a new “no hair at all” baseline as a sanity check). Updated render thread times:

- Skinned bindings (default): ~10.03 ms

- Rigid bindings: ~7.12 ms

- No bindings (AttachComponentToComponent): ~7.24 ms

- No hair at all: ~4.29 ms

Rigid and no-binding are now effectively in line with each other, as you predicted. From these numbers we read the cost breakdown as roughly:

- ~3 ms for the skinned binding evaluation itself (10.03 → 7.12)

- ~3 ms for everything else groom-related — card meshes, materials, draw calls (7.12 → 4.29)
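
For reference, each figure in this breakdown is simply the difference between two adjacent configurations. A minimal sketch of that arithmetic in standalone C++, using only the averages listed above; the struct and function names are made up for illustration and are not engine API:

```cpp
// Average render thread times in milliseconds, taken from the re-run above.
struct RenderThreadTimes {
    double skinned;    // skinned bindings (default)
    double rigid;      // rigid bindings
    double noBinding;  // groom attached without a binding asset
    double noHair;     // baseline without any grooms
};

constexpr RenderThreadTimes kMeasured{10.03, 7.12, 7.24, 4.29};

// Cost attributed to skinned binding evaluation: skinned minus rigid.
constexpr double SkinnedBindingCost(const RenderThreadTimes& t) {
    return t.skinned - t.rigid;
}

// Cost attributed to everything else groom-related: rigid minus the no-hair baseline.
constexpr double RemainingGroomCost(const RenderThreadTimes& t) {
    return t.rigid - t.noHair;
}

// Gap between rigid bindings and plain attachment; close to zero means they are on par.
constexpr double RigidVsNoBinding(const RenderThreadTimes& t) {
    return t.rigid - t.noBinding;
}
```

Both cost terms come out just under 3 ms, and the rigid vs. no-binding gap is roughly a tenth of a millisecond, which we treat as being on par.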

Based on this, we’re considering switching to Rigid bindings below a certain LOD threshold in our game (keeping Skinned only on the closest LOD where the deformation is actually visible). Would you advise this as a reasonable strategy?
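
To make the idea concrete, here is a rough sketch of the selection rule we have in mind, written as standalone C++. The enum is a hypothetical stand-in rather than the engine's EGroomBindingType, and `rigidFromLod` is an assumed tuning parameter; in practice we would expect to express this through the per-LOD binding type on the groom asset rather than in code:

```cpp
// Hypothetical stand-in for the per-LOD binding choice; not the engine enum.
enum class BindingChoice { Skinned, Rigid };

// Keep skinning only on LODs closer than rigidFromLod;
// every LOD at or beyond that index falls back to rigid.
constexpr BindingChoice ChooseBinding(int lodIndex, int rigidFromLod) {
    return lodIndex < rigidFromLod ? BindingChoice::Skinned
                                   : BindingChoice::Rigid;
}
```

With `rigidFromLod` set to 1, only LOD 0 stays skinned and all further LODs use rigid bindings.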

Also, given that ~3 ms of the remaining cost is the grooms themselves (cards, not strands), are there other recommendations you’d suggest to further reduce draw time on the card path? We’re particularly interested in anything related to material complexity, draw call batching across grooms sharing the same asset, or flags we might not be aware of for cards-only setups.

Thanks again for the quick and helpful response.

Best,

Matthias

Hi Matthias,

Thanks for sharing your update. Your strategy seems reasonable. I didn’t check, but you can also disable simulation to avoid the interpolation cost and save memory, if that’s not already the case.

The rest of the cost will then be in the primary view rendering and the shadow view rendering. Most of it would come from overdraw, I presume. The shader we export with MHs by default uses dithering and a pixel depth offset IIRC, which might prevent early-Z. On some platforms, we managed to render this during the pre-pass, which makes the base pass less costly (material evaluation happens only when the depth matches, so there is no overdraw). You could try a very simple material to measure what the speed of light would be, and then simplify the material accordingly.

Cheers,

/Charles.

Ok, thank you for the update!

/Charles.

Hi [mention removed]​,

Thanks again for your helpful input — much appreciated.

I’ll take a closer look at your suggestions and get back to you with some results. At the moment I’m still dealing with some VRAM-related issues on my end, which I’m trying to sort out first:

[Content removed]

Once that’s resolved, I’ll continue with the material and rendering investigations you mentioned.

Best regards,

Matthias