How to animate speech?

Hi. May I know how you did this? Do you provide this as a service?

What exactly are you interested in? Speech, animation, tracking or dialogue?

Speech recognition is Vosk, an open-source toolkit that runs locally; speech synthesis is a cloud service (I used Yandex SpeechKit, but Google should work better for English).
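
In case it helps, here is a minimal sketch of local recognition with Vosk in Python. The model path and wave file are placeholders; SetWords(True) enables the per-word timings that a lip-sync step could consume:

```python
# Minimal Vosk sketch: transcribe a 16-bit mono PCM wave file locally
# and collect per-word timings (useful for driving lip-sync).
# "model" is a placeholder path to an unpacked Vosk model directory.
import json
import wave

from vosk import KaldiRecognizer, Model

wf = wave.open("speech.wav", "rb")            # 16 kHz mono PCM works best
rec = KaldiRecognizer(Model("model"), wf.getframerate())
rec.SetWords(True)                            # ask for word start/end times

words = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):              # a full utterance was decoded
        words += json.loads(rec.Result()).get("result", [])
words += json.loads(rec.FinalResult()).get("result", [])

for w in words:
    print(w["word"], w["start"], w["end"])    # times are in seconds
```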

Facial animation has three layers.

  • a few pre-defined states: friendly, annoyed, thinking (while waiting for speech synthesis)
  • lip-sync (I use my plugin, it’s on the marketplace now)
  • facial animation while speaking. This is a (poorly trained) neural net: I used the iPhone Live Link Face app to capture facial animation and PyTorch to train it (see the sketch below this list).
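
To give an idea of the shape of such a net (a toy sketch, not the actual model): a small recurrent network mapping per-frame audio features to the 52 ARKit blendshape weights that Live Link Face records. The feature and layer sizes below are arbitrary assumptions:

```python
# Hypothetical sketch: a small GRU that maps per-frame audio features
# (e.g. mel spectrogram frames) to ARKit-style facial blendshape weights.
# The dimensions (80 mel bins, 52 ARKit blendshapes) are assumptions.
import torch
import torch.nn as nn

class SpeechToFace(nn.Module):
    def __init__(self, n_mels: int = 80, n_blendshapes: int = 52):
        super().__init__()
        self.gru = nn.GRU(n_mels, 128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, n_blendshapes)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        # mels: (batch, time, n_mels) -> weights: (batch, time, n_blendshapes)
        hidden, _ = self.gru(mels)
        return torch.sigmoid(self.head(hidden))  # blendshape weights in [0, 1]

model = SpeechToFace()
dummy = torch.randn(1, 100, 80)                # 100 frames of fake features
print(model(dummy).shape)                      # torch.Size([1, 100, 52])
```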

For dialogue I used another old plugin of mine, but it’s possible to build this in UE4 without third-party solutions. Plus, it makes sense to connect Dialogflow instead.
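
For illustration, a Dialogflow ES detectIntent request is just an authenticated REST call. Here is a minimal Python sketch; the project ID, session ID, and OAuth token are placeholders, and in UE4 the same request can be sent with the built-in HTTP module:

```python
# Sketch of a Dialogflow ES detectIntent call over REST.
# PROJECT_ID, SESSION_ID, and ACCESS_TOKEN are placeholders; the token
# normally comes from a Google service account.
import requests

PROJECT_ID = "my-agent"        # placeholder Dialogflow project
SESSION_ID = "user-123"        # any unique id per conversation
ACCESS_TOKEN = "..."           # OAuth2 bearer token (placeholder)

url = (f"https://dialogflow.googleapis.com/v2/projects/{PROJECT_ID}"
       f"/agent/sessions/{SESSION_ID}:detectIntent")
body = {"queryInput": {"text": {"text": "hello", "languageCode": "en"}}}
resp = requests.post(url, json=body,
                     headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
print(resp.json()["queryResult"]["fulfillmentText"])
```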

Tracking & AR: an iPhone running the NDI HX Camera app serves as the camera. It isn’t a good solution; you can see a mistiming between the video and the tracking. For tracking I use SteamVR (Vive Trackers) attached to both the chair and the iPhone.

No, I don’t provide any service. I just had an interesting idea and I did it.

Good luck!

I would like to integrate Dialogflow and lip-sync. Your lip-sync plugin cannot run at runtime. Any suggestions or solutions for this?

My lip-sync works at runtime (and you can see it in the video), but it doesn’t run in real time, i.e. it can’t animate lips from live microphone input. It takes some time to recognize a word.

Hi, thanks for your reply. Can we keep feeding in new wave files? Will the MetaHuman keep speaking the new wave files when we replace them at runtime (in a published game)?

Yes, and you can actually download the executable demo from the marketplace page and test it. It can play wave files from your PC.

Just a small update on my efforts: UE4 MetaHuman: Automatic Lip-sync + Facial Animation - YouTube

Unfortunately, it’s not in a state I can share. But I’m thinking about sharing the tools I developed for working with MetaHuman facial animation.

Hi! Does it work with Spanish from Mexico?

No, unfortunately.

And just in case: the last video was captured in my personal project. Lip-sync and facial animation like this isn’t part of my plugin on the marketplace.

I’ve used AWS Polly for a simple text-to-voice iteration, but it has some very advanced and useful features. I hope this helps. FYI, some of you folks, with your knowledge and projects, are fantastic.

Do you have a tutorial or something that could help me? Thanks in advance.

Well, thank you. :slight_smile:

I used the AWSCore-Polly plugin from the marketplace.
Polly supports three Spanish voices, plus one Mexican female voice:
https://docs.aws.amazon.com/polly/latest/dg/voicelist.html

The speech marks allow you to activate the required viseme pose at the right time (see my image above). If you need more details, tell me which ones and I can post more screenshots of the solution here.
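
To make that concrete, here is a minimal boto3 sketch that requests viseme speech marks (Mia is Polly's Mexican Spanish voice; the text is just an example):

```python
# Sketch: ask Polly for viseme speech marks (instead of audio).
# Each output line is JSON like {"time": 125, "type": "viseme", "value": "p"};
# the "time" (in ms) tells you when to activate the matching viseme pose.
import json

import boto3

polly = boto3.client("polly")
resp = polly.synthesize_speech(
    Text="Hola, ¿cómo estás?",
    VoiceId="Mia",                  # Polly's Mexican Spanish voice
    OutputFormat="json",            # speech marks come back as JSON lines
    SpeechMarkTypes=["viseme"],
)
for line in resp["AudioStream"].read().decode("utf-8").splitlines():
    mark = json.loads(line)
    print(mark["time"], mark["value"])
```

You would make a second call with OutputFormat set to mp3 or pcm to get the audio itself, then trigger the poses at the marked times while it plays.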

This was my take on animating MetaHumans with iClone.

If you have anything working to connect MetaHumans with Dialogflow, I would like to connect. We would like to use that to work with and speak with patients. Contact me at csilva@sphinxmedtech.com.

Hi everyone. First post, so the noob can stay implied. I have a quick question: while it may not supply a complete solution, and certainly doesn’t provide an end-to-end process for speech input → reaction by agent → speech output (with facial animation and lip-sync), why hasn’t anybody here mentioned the MetaHuman SDK from the Unreal Marketplace?

It’s free, and certainly seems to provide some of the same services as the Nvidia Omniverse Audio2Face… unless I’m missing something? Anyway, downloading it now. Happy for any correction if I’m talking BS.

re: noob: thanks. :sunglasses:

re: end to end: we are also willing to pay for assistance.

re: Nvidia Omniverse Audio2Face: checking it out.

re: MetaHuman SDK from the Unreal Marketplace: having a look.

The short answer is that the MetaHuman concept kicks a** so well, we really want to try deploying it with test patients. :sunglasses:

Hi! Yes, actually I’m working with the AWSCore-Polly plugin too, but I have found it hard to do the lip-sync.

Since I’m technically new to UE, I was wondering if you’d mind helping me. Do I have to define the visemes in a separate Blueprint and call it from a Level Sequence to animate, or can the animation be achieved in real time using the audio and visemes?

Please let me know if I can write to you via e-mail.

Thanks in advance.

Regards. :slight_smile:

Hi! You need a webhook to handle the communication between Dialogflow and UE/MetaHumans.
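
As a starting point, here is a minimal fulfillment webhook sketch in Flask, assuming the Dialogflow ES request format. The route name and reply logic are placeholders; a real handler would branch on the intent and forward the result to the UE side:

```python
# Minimal Dialogflow ES fulfillment webhook sketch (Flask).
# Dialogflow POSTs the matched intent here; whatever we return as
# fulfillmentText is what the agent speaks. The endpoint name is arbitrary.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/dialogflow-webhook", methods=["POST"])
def webhook():
    req = request.get_json(force=True)
    intent = req["queryResult"]["intent"]["displayName"]
    user_text = req["queryResult"]["queryText"]
    # Placeholder logic: echo the intent; a real handler would branch here
    # and could also notify the UE/MetaHuman side (e.g. over a socket).
    return jsonify({"fulfillmentText": f"Matched intent '{intent}' "
                                       f"for input: {user_text}"})

if __name__ == "__main__":
    app.run(port=5000)
```

Point the Fulfillment URL in the Dialogflow console at this endpoint (it must be reachable over HTTPS).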

Got it! Where can we locate the webhook API info to run MetaHumans standalone via a browser?
