How to animate speech?

Just wondering if there is an easy way to animate speech for MetaHumans? Ideally I would like to just enter some text and have it create a realistic animation of it (with a set of different expressions). Is that possible?


It’s definitely possible, but I don’t think the technology exists yet, especially for emotional expressions.

I recently made a lip-sync plugin and can share my pose asset with visemes if you’d like to do something with it (but not the plugin itself).


I’d be very interested in that @YuriNK

Here: GitHub - AntiAnti/MetahumVisemeCurves: Pose Asset with visemes for Epic's MetaHuman face skeleton

But, as I said, this is just a pose asset.


Thank you! I’ll have a look. Have you had a look at Omniverse Audio2Face? It does a pretty good job of interpreting audio for lip sync.


Oh, Omniverse is really good; it’s next-gen lip sync. My solution works the classical way: voice recognition engine → pronouncing dictionary → curves for visemes.
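The plugin itself isn't shared, but the "pronouncing dictionary → visemes" step of that classical pipeline can be sketched roughly like this. It assumes the dictionary returns ARPAbet phonemes (the notation used by the CMU Pronouncing Dictionary); the viseme names and the grouping below are hypothetical and only for illustration.

```python
# Hypothetical grouping of ARPAbet phonemes into a small set of visemes.
# Many phonemes look alike on the lips, so several share one mouth shape.
PHONEME_TO_VISEME = {
    "P": "PBM", "B": "PBM", "M": "PBM",
    "F": "FV", "V": "FV",
    "TH": "TH", "DH": "TH",
    "S": "SZ", "Z": "SZ",
    "SH": "CH", "ZH": "CH", "CH": "CH", "JH": "CH",
    "AA": "AA", "AE": "AA", "AH": "AA",
    "IY": "EE", "IH": "EE", "EH": "EE",
    "UW": "OO", "UH": "OO", "OW": "OO", "W": "OO",
}


def strip_stress(phoneme: str) -> str:
    # The CMU dictionary marks vowel stress with a trailing digit, e.g. "AH0".
    return phoneme.rstrip("012")


def word_to_visemes(phonemes: list[str]) -> list[str]:
    visemes: list[str] = []
    for p in phonemes:
        # Phonemes without a dedicated shape fall back to a neutral "REST".
        v = PHONEME_TO_VISEME.get(strip_stress(p), "REST")
        # Merge consecutive duplicates so the same shape is not keyed twice.
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes
```

For example, "mom" (M AA1 M) becomes PBM, AA, PBM. The viseme names would then be mapped to curves such as those in the pose asset above.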

I have been working on taking a single audio file and, with a mix of blueprints and a little manual effort, having a Metahuman perform the dialog.

Here is what I have so far.

Take a complete voice actor audio performance and use an audio editor to create a region for each word. A region is nothing more than a start time and an end time, plus the region’s length.

For my example, I used Reaper and named each region the standard English written word.

The regions are then exported to CSV and imported into UE as a data table.
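To make the region data concrete, here is a minimal sketch of reading such a CSV outside the engine. It assumes a header row with Name, Start and End columns and times already in seconds; the exact column layout and time format of Reaper's export depend on its version and settings, so check your own file. The `WordRegion` and `parse_regions` names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WordRegion:
    word: str
    start: float  # seconds
    end: float    # seconds

    @property
    def length(self) -> float:
        # The region length is just end minus start.
        return self.end - self.start


def parse_regions(csv_text: str) -> list[WordRegion]:
    # Assumes Name/Start/End header columns and times in seconds.
    reader = csv.DictReader(io.StringIO(csv_text))
    regions = [
        WordRegion(row["Name"].strip(), float(row["Start"]), float(row["End"]))
        for row in reader
    ]
    # Sort by start time so the words play back in spoken order.
    regions.sort(key=lambda r: r.start)
    return regions
```

In UE the same rows become a data table; the sketch only shows what each row needs to carry.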

Using blueprints and a separate data table containing an American English to IPA dictionary, the standard English words are broken down into IPA phonemes.
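The lookup step can be sketched as follows, assuming the dictionary stores one IPA string per word. The tricky part is that some IPA phonemes span two characters (for example the affricate "tʃ" or the diphthong "oʊ"), so the string cannot simply be split per character. The dictionary entries and the list of multi-character symbols here are small hypothetical excerpts.

```python
# Hypothetical excerpt of an American English -> IPA dictionary.
IPA_DICTIONARY = {
    "hello": "həˈloʊ",
    "world": "wɝld",
    "check": "tʃɛk",
}

# Multi-character IPA symbols that must stay together (illustrative subset).
MULTI_CHAR = ["tʃ", "dʒ", "aɪ", "aʊ", "ɔɪ", "eɪ", "oʊ"]
STRESS_MARKS = {"ˈ", "ˌ"}


def ipa_phonemes(ipa: str) -> list[str]:
    phonemes: list[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i] in STRESS_MARKS:
            i += 1  # stress marks do not change the mouth shape
            continue
        for symbol in MULTI_CHAR:
            if ipa.startswith(symbol, i):
                phonemes.append(symbol)
                i += len(symbol)
                break
        else:
            # No multi-character match: take a single character.
            phonemes.append(ipa[i])
            i += 1
    return phonemes


def word_to_phonemes(word: str) -> list[str]:
    # Region names may carry case or punctuation, so normalize them first.
    key = word.lower().strip(".,!?;:\"'")
    ipa = IPA_DICTIONARY.get(key)
    if ipa is None:
        raise KeyError(f"no IPA entry for {word!r}")
    return ipa_phonemes(ipa)
```

For example, `word_to_phonemes("Hello,")` yields h, ə, l, oʊ.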

The phonemes are then sent to a custom version of the common Metahuman face Anim blueprint, which has a pose for every phoneme. The poses are created with the Modify Curve node and the Metahuman CTRL curves, which are already built into the Metahuman face Anim blueprint.
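Conceptually, each phoneme pose is a set of curve weights, and moving from one phoneme to the next is a blend between two such sets. The sketch below shows that idea only; the curve names are placeholders in the style of the MetaHuman CTRL curves and should be checked against the real names in your face Anim blueprint, and the weights are made up.

```python
# Hypothetical phoneme poses: curve name -> weight in [0, 1].
# Curve names are placeholders; verify them in your face Anim blueprint.
PHONEME_POSES = {
    "ɑ": {"CTRL_expressions_jawOpen": 0.7},
    "u": {"CTRL_expressions_jawOpen": 0.2, "CTRL_expressions_mouthFunnel": 0.8},
    "m": {"CTRL_expressions_mouthPress": 0.9},
}
REST_POSE: dict[str, float] = {}


def blend_poses(a: dict[str, float], b: dict[str, float], alpha: float) -> dict[str, float]:
    # Linear blend from pose a (alpha = 0) to pose b (alpha = 1).
    alpha = min(max(alpha, 0.0), 1.0)
    curves = set(a) | set(b)
    # A curve missing from one pose counts as 0 in that pose.
    return {c: (1.0 - alpha) * a.get(c, 0.0) + alpha * b.get(c, 0.0) for c in curves}
```

Each resulting curve/weight pair corresponds to what a Modify Curve node would set on that frame.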

A timeline, along with the region times exported from the audio file, is used to control the timing of the animation and sync it with the original audio file.
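One simple way to turn a word's region into per-phoneme timing is to split the region evenly among its phonemes, then look up which phoneme is active at the timeline's current time. This is an assumption for illustration (real speech is not evenly paced, which is likely where the manual effort comes in), and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhonemeKey:
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def schedule_word(phonemes: list[str], start: float, end: float) -> list[PhonemeKey]:
    # Evenly split the word's region among its phonemes.
    if not phonemes:
        return []
    step = (end - start) / len(phonemes)
    return [PhonemeKey(p, start + i * step, start + (i + 1) * step)
            for i, p in enumerate(phonemes)]


def phoneme_at(keys: list[PhonemeKey], t: float) -> Optional[str]:
    # Return the phoneme whose slot contains time t, or None between words.
    for key in keys:
        if key.start <= t < key.end:
            return key.phoneme
    return None
```

Here `t` plays the role of the timeline's playback position, which stays in sync because it is measured on the same clock as the audio regions.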

My IT support day job is starting soon but I could get a short demo video uploaded tonight. If there was enough interest, I could create a tutorial video.


NVIDIA Omniverse Audio2Face basically transfers your speech to a face mesh that they supply, and you can then transfer it to your MetaHuman. I haven’t tried it, because the Audio2Face app won’t launch. I’ve tried their other Omniverse apps, such as Create and View, but like most other free programs (Quixel Mixer comes to mind, and Unreal Engine is obviously excluded), they are so painfully slow and temperamental that they are not even worth bothering with. I’d be interested to see if anyone has any luck with Audio2Face, as it looks good for pre-recorded audio or live recording and seems to do a pretty good job in the YouTube tutorials.

I’m not happy with the resulting video; the quality and sync are not good, but it’s time for bed.


Here is a slightly cleaner one after fixing an obvious fault with the “h” shape.

@Stephen_Palmer, your demo looks impressive. Thanks for sharing.

I was playing with Audio2Face recently. Their standard face mesh works well, but I have a problem importing MetaHumans there. The MetaHuman loads fine in Unreal Engine (v4.26.2), but when I use the Omniverse plugin to export it to Omniverse Create, for example, some body parts such as the hair, eyes, and eyebrows are missing. See below:

Do you know how to export MetaHuman properly?


Have you been able to export Audio2Face cache data and then import it into Unreal? Would you mind sharing the steps? I’m not an animator, my knowledge is somewhat limited, and I can’t manage to import the data into Unreal Engine 5. Here are the steps I am following:

I create cache data in Audio2Face and import it into Maya, where everything plays back correctly. I have tried two methods.

First method: my Maya scene contains only one facial mesh, and I import the cache onto it; it works in Maya. The facial mesh is rigged to one bone so that I can import it into Unreal. When I import it, Unreal creates an animation, but when I play it back nothing moves; I only see the frame that was exported. An animation curve is also imported. I’ve tried entering values in the curve, but it still doesn’t play back the full animation (no movement at all).

Second method: I have two meshes in my Maya scene. One is an unrigged mesh onto which I import the cached data, and the other is a copy of the same mesh rigged to a single bone, with one blend shape. After I import the cache data, both meshes play back the animation correctly in Maya. I export the rigged mesh with its one bone and get the same result in Unreal: no playback of the animation.

For both methods, I make sure to bake the frames and export animations in the FBX export options.

If you have managed to export data from Audio2face to Maya and then import it to Unreal, would you mind sharing your workflow? I’m sure that I’m just missing something very simple, but I haven’t been able to figure it out yet.

Thank you!

I’m really interested in your MetaHuman lip-sync plugin. Is it on the Marketplace, and can you provide a link?

If you have some kind of budget, I would recommend purchasing iClone with AccuLips. It lets you import text, transcode it, and sync it automatically. They have a great MetaHuman Kit for sending animations and data to Unreal via LiveLink.


If anyone is looking for a solution to this:

Just use the Alembic format to export from Maya and import into Unreal. The solution was provided graciously by Ricardo.lpc.


What is the relationship between this video and the speech animation?

This video shows you how to export data from Maya using the Alembic format. If you use Audio2Face to create animation from audio files and then export the cache to Maya, this is how you can import it into Unreal. As far as I know, it’s not possible to export the cached Audio2Face data from Maya in FBX format and then import it into Unreal.


Here is an example I built using iClone 7 and AccuLips. This example also uses Speech to Text and for understanding.


And my solution


I experimented with AWS Polly’s ability to create speech marks and implemented an animation blueprint for it. Here is the result:
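For anyone trying the Polly route: with boto3, speech marks are requested through `synthesize_speech` with `OutputFormat="json"` and `SpeechMarkTypes=["viseme"]`, and the response is newline-delimited JSON with a `time` in milliseconds, a `type`, and a `value`. The sketch below parses that text into timed viseme keys; it does not call AWS, the function names are hypothetical, and the blueprint side is not shown.

```python
import json
from dataclasses import dataclass


@dataclass
class VisemeKey:
    time_s: float  # start time in seconds
    viseme: str    # Polly viseme symbol, e.g. "p", "a", "sil"


def parse_viseme_marks(marks: str) -> list[VisemeKey]:
    # Each non-empty line is one JSON object; keep only viseme marks.
    keys: list[VisemeKey] = []
    for line in marks.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        if mark.get("type") == "viseme":
            keys.append(VisemeKey(mark["time"] / 1000.0, mark["value"]))
    return keys


def viseme_durations(keys: list[VisemeKey], audio_end_s: float) -> list[float]:
    # Each viseme holds until the next one starts; the last holds until
    # the end of the audio.
    ends = [k.time_s for k in keys[1:]] + [audio_end_s]
    return [end - k.time_s for k, end in zip(keys, ends)]
```

Because Polly generates the timings from the same text it speaks, this route avoids the manual word-region step, at the cost of being tied to a synthetic voice.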
