Hi guys! I thought some of you might be interested:
I’m developing a technology for facial animation of characters and digital humans. It lip-syncs to any voiceover (the current demo uses text-to-speech, but any sound wave would work). Everything is animated by code in realtime, without any mocap or premade manual animation. The character is based on DAZ Genesis 8, imported into UE4; the hair and skin shaders were remade from scratch.
How it works:
The character has different emotional states, which affect her facial expressions as well as the speed and smoothness of her movements. She can find different points of interest (like the camera or another player) to focus her eyes on.
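To give an idea of how emotional states and eye focus can drive parameters, here is a minimal sketch. The emotion names, weights, and point-of-interest selection are all invented for illustration; the actual engine's values and logic are more involved.

```python
import math

# Hypothetical emotion presets: each maps to expression morph weights,
# a movement-speed multiplier, and a smoothing factor. The names and
# numbers are illustrative, not the engine's actual values.
EMOTIONS = {
    "neutral": {"smile": 0.0, "brow_raise": 0.0, "speed": 1.0, "smooth": 0.5},
    "happy":   {"smile": 0.8, "brow_raise": 0.3, "speed": 1.2, "smooth": 0.3},
    "sad":     {"smile": -0.4, "brow_raise": -0.2, "speed": 0.7, "smooth": 0.8},
}

def blend_emotions(state_a, state_b, t):
    """Linearly interpolate between two emotional states (t in [0, 1])."""
    a, b = EMOTIONS[state_a], EMOTIONS[state_b]
    return {k: a[k] + (b[k] - a[k]) * t for k in a}

def pick_focus(head_pos, points_of_interest):
    """Pick the nearest point of interest (e.g. camera or another player)."""
    return min(points_of_interest,
               key=lambda p: math.dist(head_pos, p["pos"]))

# Halfway between neutral and happy, and a focus pick:
mid = blend_emotions("neutral", "happy", 0.5)
focus = pick_focus((0, 1.7, 0), [
    {"name": "camera", "pos": (0, 1.6, 2)},
    {"name": "player", "pos": (3, 1.7, 5)},
])
```

A real implementation would blend these parameters every tick rather than once, but the idea of one state driving both expression and motion speed is the same.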
A sample of English text is provided. The same text is converted to audio via a third-party text-to-speech engine.
Then I analyze the initial text to produce a phonetic transcription, using a dictionary lookup plus a simple neural network for out-of-vocabulary words.
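The dictionary-plus-fallback step can be sketched like this. The tiny lexicon and the per-letter fallback below are placeholders; the real system uses a full pronunciation dictionary and a trained neural network for words not in it.

```python
import re

# Toy pronunciation dictionary (ARPAbet-style symbols, illustrative only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "there": ["DH", "EH", "R"],
}

# Naive per-letter mapping standing in for the neural G2P fallback.
FALLBACK = {"a": "AE", "b": "B", "d": "D", "e": "EH", "h": "HH",
            "l": "L", "o": "OW", "r": "R", "t": "T", "w": "W"}

def transcribe(text):
    """Dictionary lookup first; fall back to letter-to-phoneme guessing."""
    phones = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in LEXICON:
            phones.extend(LEXICON[word])
        else:
            # Out-of-vocabulary: stand-in for the neural network.
            phones.extend(FALLBACK.get(ch, "AH") for ch in word)
    return phones

phones = transcribe("Hello there")  # ['HH', 'AH', 'L', 'OW', 'DH', 'EH', 'R']
```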
In the meantime I analyze the audio for extrema to split it into phonemes as well; the algorithm tunes its parameters so that the number of phonemes matches the number found by the text analysis. This is needed to find the timings at which each sound starts and ends.
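The alignment idea, very roughly: split the amplitude envelope at local minima, keeping just enough of the deepest dips that the segment count matches the phoneme count from the text analysis. Envelope extraction and the real parameter search are simplified away here; this is a sketch of the principle, not the actual algorithm.

```python
def local_minima(env):
    """Indices where the envelope dips below both neighbours."""
    return [i for i in range(1, len(env) - 1)
            if env[i] < env[i - 1] and env[i] < env[i + 1]]

def split_to_count(env, target_segments):
    """Keep the deepest minima as boundaries so the envelope is split
    into exactly target_segments pieces (n - 1 cuts give n segments)."""
    minima = sorted(local_minima(env), key=lambda i: env[i])
    cuts = sorted(minima[:target_segments - 1])
    segments, start = [], 0
    for c in cuts:
        segments.append((start, c))
        start = c
    segments.append((start, len(env) - 1))
    return segments

# Toy envelope: three bursts separated by two dips.
env = [0.1, 0.5, 0.2, 0.6, 0.1, 0.7, 0.3]
segments = split_to_count(env, 3)  # three (start, end) index pairs
```

With real audio you would work on a smoothed RMS envelope and convert the sample indices to start/end timestamps for each phoneme.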
Then, while the audio is playing, the corresponding phoneme animation is used. There are 50 English phonemes in total; the engine uses combinations of 5 morph targets (plus bones in the tongue and mouth) to visualize each of them, with smooth transitions and additional dependencies on sound amplitude and character parameters.
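Blending morph targets between consecutive phonemes, with amplitude scaling, could look roughly like this. The viseme table and morph names are invented for the example; the engine's real tables cover all 50 phonemes and also drive tongue/jaw bones.

```python
# Illustrative viseme table: phoneme -> morph target weights.
VISEMES = {
    "AH": {"jaw_open": 0.7, "lips_wide": 0.2},
    "OW": {"jaw_open": 0.5, "lips_round": 0.8},
    "M":  {"lips_closed": 1.0},
}

def lerp(a, b, t):
    return a + (b - a) * t

def blend_visemes(prev, curr, t, amplitude=1.0):
    """Smoothly transition from the previous viseme to the current one,
    scaling mouth openness by the sound amplitude."""
    keys = set(VISEMES[prev]) | set(VISEMES[curr])
    out = {}
    for k in keys:
        w = lerp(VISEMES[prev].get(k, 0.0), VISEMES[curr].get(k, 0.0), t)
        if k == "jaw_open":
            w *= amplitude  # louder sound -> wider mouth
        out[k] = w
    return out

# End of a transition from "M" to "AH" at half amplitude:
weights = blend_visemes("M", "AH", 1.0, amplitude=0.5)
```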
There is also a simpler version which can work at runtime with an audio stream.
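One common way such a runtime fallback works (an assumption on my part, not necessarily the exact method here) is to drive mouth openness directly from the smoothed amplitude of incoming audio chunks:

```python
import math

def rms(chunk):
    """Root-mean-square level of a chunk of samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def mouth_open(chunk, prev_open, attack=0.6, release=0.2):
    """Exponentially smoothed jaw-open weight from a raw audio chunk.
    Fast attack, slow release keeps the motion from looking jittery."""
    level = min(1.0, rms(chunk) * 4.0)  # crude normalisation
    k = attack if level > prev_open else release
    return prev_open + (level - prev_open) * k

# Silence pulls a half-open mouth toward closed:
opened = mouth_open([0.0] * 512, prev_open=0.5)
```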
Beyond that, the engine remains the same: it takes a character model with several morph targets and does all the rest, animating the head and face to look natural without any need for mocap/facecap or manually created keyframes.
Since I received several offers to integrate this tech into other projects, I’ve made a site for it and am now preparing some more detailed blog posts: http://elize.ai
I’m also thinking of applying for Epic MegaGrants with this to speed up development, polish the technology, and use the same approach to animate body movement, which is an even worse headache in gamedev if you want it to look natural.
The scene in the video was made for a spin-off of this project: a story-based game featuring this character, which I mostly use as a sandbox to polish the technology. I plan to release it next spring for VR, and maybe for iOS/Android later.
I’d be happy to hear any feedback or comments. And if you’re interested in using this approach in your project, feel free to ping me!