How to animate speech?

Hi. May I know how you do this? Do you provide this as a service?

What exactly are you interested in? Speech, animation, tracking or dialogue?

Speech recognition uses Vosk, an open-source toolkit that runs locally; speech synthesis is a cloud service (I used Yandex SpeechKit, but Google should work better for English).
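
For reference, here is a minimal sketch of offline recognition with Vosk in Python; the file name and model path are placeholders, and the per-word timing option is an assumption about how you might drive lip-sync later.

```python
# Minimal sketch: offline recognition of a 16 kHz mono WAV file with Vosk.
import wave
import json
from vosk import Model, KaldiRecognizer

wf = wave.open("speech.wav", "rb")        # hypothetical input file
model = Model("model")                    # path to an unpacked Vosk model (placeholder)
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)                        # ask for per-word timings

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result()))   # completed utterance with word times

print(json.loads(rec.FinalResult()))      # remainder of the recognition
```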

Facial animation has three layers.

  • a few pre-defined states: friendly, annoyed, thinking (used while waiting for speech synthesis)
  • lip-sync (I use my plugin, it’s on the marketplace now)
  • facial animation while speaking. This is a poorly trained neural net: I used iPhone Live Link to capture facial animation and PyTorch to train it (a rough sketch of this kind of setup follows below).
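
The post doesn't share the network details, so this is only a rough sketch of how such a regression could look in PyTorch; the MFCC input size, context window, and MLP layout are assumptions, with the 52 ARKit blendshapes (as captured via Live Link) as targets.

```python
# Rough sketch: regress ARKit face blendshape weights from audio features.
# Assumptions (not from the original post): 13 MFCCs per frame with context,
# 52 ARKit blendshape targets, a plain MLP, MSE loss.
import torch
import torch.nn as nn

N_MFCC = 13          # MFCC coefficients per audio frame (assumed)
CONTEXT = 5          # frames of context on each side (assumed)
N_BLENDSHAPES = 52   # ARKit face blendshapes captured via Live Link

class AudioToFace(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_MFCC * (2 * CONTEXT + 1), 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, N_BLENDSHAPES),
            nn.Sigmoid(),  # blendshape weights live in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

model = AudioToFace()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(features, targets):
    # features: [batch, N_MFCC * (2*CONTEXT+1)], targets: [batch, N_BLENDSHAPES]
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```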

For dialogue I used another of my old plugins, but it's possible to manage in UE4 without third-party solutions. Plus, it makes sense to connect Dialogflow instead.
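
If you go the Dialogflow route, a minimal sketch of calling detect_intent with the official Python client looks roughly like this; the project and session IDs are placeholders, and inside UE you would make the equivalent HTTP/gRPC call from C++ or a plugin.

```python
# Minimal sketch, assuming a Dialogflow ES agent and the google-cloud-dialogflow client.
from google.cloud import dialogflow

def detect_intent(project_id: str, session_id: str, text: str,
                  language: str = "en-US") -> str:
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code=language)
    )
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    # The fulfillment text is what you would hand to speech synthesis / lip-sync.
    return response.query_result.fulfillment_text

# Example (placeholder IDs):
# print(detect_intent("my-gcp-project", "player-1", "Hello!"))
```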

Tracking & AR. iPhone with NDI HX app is a camera. Is isn’t good solution. You can see a mistiming between video and tracking. For tracking I use SteamVR (Vive Trackers) attached to both chair and iPhone.

No, I don’t provide any service. I just had an interesting idea and I did it.

Good luck!

I would like to integrate Dialogflow and lip-sync. Your lip-sync plugin cannot run at runtime. Any suggestions or solutions for this?

My lip-sync works at runtime (and you can see it in the video), but it doesn't run in real time. I.e. it can't animate lips from microphone input; it takes some time to recognize a word.


Hi, thanks for your reply. Can we keep feeding it new wave files? Will the MetaHuman keep speaking the new wave files when we replace the files at runtime (in a published game)?

Yes, and you can actually download an executable demo from the marketplace page and test it. It can play wave files from your PC.

Just a small update on my efforts: UE4 MetaHuman: Automatic Lip-sync + Facial Animation - YouTube

Unfortunately, it’s not in a state I can share. But I’m thinking about sharing the tools I developed for working with MetaHuman facial animation.


Hi! Does it work with Spanish from Mexico?

No, unfortunately.

And just in case: the last video was captured in my personal project. Lip-sync and facial animation like this isn’t part of my plugin on the marketplace.


I’ve used AWS Polly for a simple text-to-voice iteration, but it has some very advanced and useful features. I hope this helps. FYI, some of you folks, with your knowledge and projects, are fantastic.

Do you have a tutorial or something that could help me? Thanks in advance.

Well, thank you. :slight_smile:

I used the AWSCore-Polly plugin from the marketplace.
Polly supports three Spanish voices, including one Mexican female voice:
https://docs.aws.amazon.com/polly/latest/dg/voicelist.html

The speech marks allow you to activate the required viseme pose at the right time (see my image above). If you need more details, tell me which, and I can post more screenshots of the solution here.
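
Outside UE, the same idea can be sketched with boto3 (an assumption, not from this thread; the AWSCore-Polly plugin wraps equivalent calls in Blueprints): request viseme speech marks and use their timestamps to trigger the viseme poses.

```python
# Minimal sketch: fetch viseme speech marks from Polly, assuming boto3 and AWS credentials.
import json
import boto3

polly = boto3.client("polly")

# Request viseme speech marks for the Mexican Spanish voice ("Mia").
marks = polly.synthesize_speech(
    Text="Hola, ¿cómo estás?",
    VoiceId="Mia",
    OutputFormat="json",
    SpeechMarkTypes=["viseme"],
)

# Speech marks come back as newline-delimited JSON:
# {"time": <ms>, "type": "viseme", "value": "p"}
for line in marks["AudioStream"].read().decode("utf-8").splitlines():
    mark = json.loads(line)
    print(mark["time"], mark["value"])   # when to activate which viseme pose

# The audio itself is a separate request with OutputFormat="mp3" (or "pcm").
```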


This was my take on animating Metahumans with iClone.


If you have anything working to connect MetaHumans with Dialogflow, I would like to connect. We would like to use that to work with and speak to patients. Contact me at csilva@sphinxmedtech.com


Hi everyone. First post, so consider the “noob” implied. I have a quick question: while it may not supply a complete solution, and certainly doesn’t provide an end-to-end process for speech input → reaction by agent → speech output (with facial animation and lip-sync), why hasn’t anybody here mentioned the MetaHuman SDK from the Unreal Marketplace?

It’s free, and certainly seems to provide some of the same services as the Nvidia Omniverse Audio2Face… unless I’m missing something? Anyway, downloading it now. Happy for any correction if I’m talking BS.

re: noob: thx. :sunglasses:

re: end to end: we are also willing to pay for assistance.

re: Nvidia Omniverse Audio2Face: checking it out.

re: MetaHuman SDK from the Unreal Marketplace: having a look.

The short answer is that the MetaHuman concept kicks a** so well, we really want to try deploying it with test patients. :sunglasses:

Hi! Yes, actually I’m working with the AWSCore-Polly plugin too, but I have found it hard to do the lip-sync.

Since I’m new to UE, I was wondering if you’d mind helping me. Do I have to define the visemes in a separate Blueprint and call it from a Level Sequence to animate, or can the animation be achieved in real time using the audio and visemes?

Please let me know if I can write to you via e-mail.

Thanks in advance.

Regards. :slight_smile:

Hi! You need a webhook in order to achieve the communication between Dialogflow and UE/MetaHumans.
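
For a rough idea of the fulfillment side, here is a minimal Dialogflow ES webhook sketch using Flask (an assumed stack, not from this thread); Dialogflow POSTs the matched intent here, and the reply text is what you would feed into synthesis and lip-sync in UE.

```python
# Minimal sketch of a Dialogflow ES fulfillment webhook with Flask (assumed stack).
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(force=True)
    query = body["queryResult"]["queryText"]             # what the user said
    intent = body["queryResult"]["intent"]["displayName"]

    # Build the reply; in a real setup this is where the agent/game logic lives.
    reply = f"You triggered '{intent}' by saying: {query}"
    return jsonify({"fulfillmentText": reply})

if __name__ == "__main__":
    app.run(port=5000)
```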

Got it! Where can we locate the webhook API info to run MetaHumans standalone via a browser?
