MetaHuman facial Control Rig works on standalone heads, but not on Mutable-generated merged head/body skeletal meshes

Hello,

We use Mutable to generate bots, and we can have many bots visible on the map at the same time. Because of that, we decided to merge the head and body into a single final mesh for performance reasons.

Originally, all of our head assets supported the MetaHuman facial Control Rig, and all facial animations were authored through that rig. This was important for us because it allowed us to reuse the same facial animation setup across different heads, even when the heads looked different.

After the head became part of the body and was merged into the body skeleton, the MetaHuman facial Control Rig for the head stopped working with these generated merged meshes.

At the same time, simply converting the Control Rig animation into a regular animation sequence does not seem like a good replacement for our case, because a regular baked animation does not appear to be equally correct for all heads. Our heads are visually different, and the Control Rig workflow was what previously allowed the same facial animation to work correctly on any head.

Because of that, we would like to understand what the supported and recommended workflow is for this kind of setup.

Our questions are:

1. Is it generally correct/supported to use the MetaHuman facial Control Rig at runtime as the final solution?

2. If we should avoid using the MetaHuman Control Rig in this way, can its functionality be replaced with a retargeting-based workflow? More specifically, can we bake facial/head animations into animation sequences and then use retargeting so that they still work correctly on other different heads?

3. From a performance point of view, was merging the head and body actually the right decision for scenes with a large number of bots visible at once (for example, around 20-30 bots), or is keeping the head and body separate still the more correct/optimal setup?

4. Would it make sense to keep the head and body separate only for a subset of “important” bots that need to speak during gameplay, while keeping the rest as a single merged mesh? Since we use Mutable, we can support different mesh generation variants if that is the recommended approach.

Any guidance on the supported/recommended pipeline for balancing facial animation reuse across different heads and runtime performance for crowds would be very helpful.

Thank you.


Steps to Reproduce
1. Prepare several different head assets that support the standard MetaHuman facial Control Rig workflow.

2. Author facial animations through the MetaHuman facial Control Rig and reuse them across those different heads.

3. Use Mutable to generate bot variants for a scene with many bots.

4. Change the setup so the final bot uses a merged head+body skeletal mesh with the head merged into the body skeleton.

5. Observe that the MetaHuman facial Control Rig that worked on the standalone head assets no longer works correctly on the generated merged meshes.


Hi, sorry for the delay in following up on your question. The main issue with merging the body and facial meshes using Mutable is that the DNA data isn’t merged along with the meshes. Without the correct DNA data for the facial mesh, you effectively no longer have a true MetaHuman setup on your character.

This approach can work if you want background characters that use a different setup from the main/foreground characters. We did this in the City Sample demo, where the crowd characters are generated from merged meshes (this was prior to Mutable) that then run vertex animation, and we switch to higher detail characters with multiple mesh parts as those actors get closer to the camera.

In terms of your specific questions:

> 1. Is it generally correct/supported to use the MetaHuman facial Control Rig at runtime as the final solution?

No, the Control Rig doesn’t need to be the final solution at runtime, and we wouldn’t recommend that approach if you have many characters in the scene, as the facial Control Rig isn’t going to be performant at runtime. Instead, you can bake out an animation sequence for your facial animation and use that as the runtime solution.

> 2. If we should avoid using the MetaHuman Control Rig in this way, can its functionality be replaced with a retargeting-based workflow? More specifically, can we bake facial/head animations into animation sequences and then use retargeting so that they still work correctly on other different heads?

My understanding is that the facial skeletons of all MetaHumans should be the same/compatible, so you should be able to drive every facial mesh with the same set of anim sequences. The Rig Logic node in the post-process anim blueprint interprets the curve data baked into the anim sequence to drive the relevant bones and morphs via the DNA data. If you find this isn’t working for you, let me know, and I can investigate further.
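
To make the idea concrete, here is a toy C++ sketch of why one set of baked curves can drive different heads. It is not the Rig Logic API or the DNA file format: `CurveFrame`, `ToyHeadData`, `EvaluateJointOffset`, and the `jawOpen` curve name are all made up for illustration, and the real evaluation is considerably more involved than a per-curve scale. The only point is that the anim sequence stores head-independent curve values, while the head-specific result comes from each head’s own data.

```cpp
#include <cassert>
#include <map>
#include <string>

// Curve values baked into one frame of an animation sequence, keyed by curve name.
// (Hypothetical type for this sketch.)
using CurveFrame = std::map<std::string, float>;

// Toy stand-in for per-head DNA data: for each curve, how far a single
// joint moves when that curve is at full value. (Hypothetical type.)
struct ToyHeadData {
    std::map<std::string, float> jointOffsetPerCurve;
};

// Combines the shared curve values with one head's own data.
// Curves the head does not know about are ignored.
float EvaluateJointOffset(const CurveFrame& frame, const ToyHeadData& head) {
    float offset = 0.0f;
    for (const auto& entry : frame) {
        const float value = entry.second;
        assert(value >= 0.0f && value <= 1.0f); // toy assumption: normalized curves
        const auto it = head.jointOffsetPerCurve.find(entry.first);
        if (it != head.jointOffsetPerCurve.end()) {
            offset += value * it->second;
        }
    }
    return offset;
}
```

In this toy model, the same frame with `jawOpen` at 0.5 yields an offset of 1.0 on a head whose data maps `jawOpen` to 2.0 and 1.5 on a head that maps it to 3.0, which mirrors why a merged mesh that has lost its DNA data can no longer interpret the curves.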

> 3. From a performance point of view, was merging the head and body actually the right decision for scenes with a large number of bots visible at once (for example, around 20-30 bots), or is keeping the head and body separate still the more correct/optimal setup?

The fewer skeletal meshes that are being ticked each frame, the better performance you’re going to get. So merging the face and body meshes can make sense from that point of view. But then you need to deal with swapping between the lower detail merged meshes and the regular actors with multiple mesh parts, which, as I mentioned, is similar to the approach we took in City Sample. The alternative would be not to merge the heads and just merge the other body mesh parts.
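
As a back-of-the-envelope illustration of the component count only (not a measured cost), the small C++ helper below counts how many skeletal meshes would tick per frame for a mixed crowd. The function name and the assumption of two parts (head plus body) per non-merged bot are hypothetical; a real character may have more parts, and the actual cost per mesh varies.

```cpp
#include <cassert>

// Counts skeletal meshes ticked per frame for a crowd in which
// `separateBots` bots keep `partsPerSeparateBot` mesh parts each and
// every other bot is a single merged mesh. (Hypothetical helper.)
int CountTickedSkeletalMeshes(int botCount, int partsPerSeparateBot, int separateBots) {
    assert(botCount >= 0);
    assert(partsPerSeparateBot >= 1);
    assert(separateBots >= 0 && separateBots <= botCount);
    const int mergedBots = botCount - separateBots;
    return mergedBots + separateBots * partsPerSeparateBot;
}
```

With 30 bots and two parts per non-merged bot, this gives 60 meshes if every bot keeps a separate head, 30 if all are merged, and 35 if only 5 bots keep a separate head.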

> 4. Would it make sense to keep the head and body separate only for a subset of “important” bots that need to speak during gameplay, while keeping the rest as a single merged mesh? Since we use Mutable, we can support different mesh generation variants if that is the recommended approach.

Yes, this is effectively what I’ve discussed above, but it very much depends on the requirements of your project. Maybe there are only specific bots that the player can interact with, so you keep those as distinctly different types of actors vs the ‘background’ bots that can be fully merged. Or you go down the route of swapping between simple merged mesh actors and more complex ones as they move between foreground and background. But that’s a more complex option.
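
The two options can be sketched as a single selection function in plain C++. This is only an outline under stated assumptions: `BotVariant`, `BotInfo`, `ChooseVariant`, and the distance threshold are hypothetical names for illustration, not Mutable or MetaHuman API, and the resulting value would still have to be mapped onto whatever mesh generation variants your Mutable setup exposes.

```cpp
#include <cassert>

// Hypothetical variant names for this sketch.
enum class BotVariant {
    MergedSingleMesh,    // head merged into the body mesh, no facial DNA setup
    SeparateHeadAndBody  // separate MetaHuman head, facial animation still works
};

// Hypothetical per-bot description.
struct BotInfo {
    bool canSpeak;           // bot needs facial animation during gameplay
    float distanceToCamera;  // assumed non-negative, in the same unit as nearDistance
};

// Option 1 (swapByDistance == false): speaking bots always keep a separate head.
// Option 2 (swapByDistance == true): speaking bots keep a separate head only
// while they are within nearDistance of the camera.
BotVariant ChooseVariant(const BotInfo& bot, bool swapByDistance, float nearDistance) {
    assert(bot.distanceToCamera >= 0.0f);
    assert(nearDistance >= 0.0f);
    if (!bot.canSpeak) {
        return BotVariant::MergedSingleMesh;
    }
    if (!swapByDistance) {
        return BotVariant::SeparateHeadAndBody;
    }
    return bot.distanceToCamera <= nearDistance ? BotVariant::SeparateHeadAndBody
                                                : BotVariant::MergedSingleMesh;
}
```

The first option never changes a bot’s variant after spawn, which is why it is the simpler of the two; the second needs the swap to be re-evaluated as bots move relative to the camera.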

Happy to discuss this further if you have more questions.
