I’d like to see an embedded facial animation solution, and after that implement something such as NVIDIA FaceWorks as well.
Unreal Tech Name Example:
Unreal Engine Facial Animation Pipeline
How the system can work to aid in rapid facial animation tasks:
Start with something simple: you could use a technology such as Microsoft’s old Speech Recognition Engine, which converts voice to text words. (The old Visual Basic-based engine works better for simple needs than the newer C++ version, which is more complicated, requires more resources, and is less accurate for simple tasks, but the newer version is currently better supported.)
This way you could use either an imported sound file or a real-time voice recording, and have the Speech Recognition Engine convert those words to text form.
Those text words are then used to scan an array of known or user-defined phonemes and morphemes (depending on the language).
The user could have imported or preset Poser endomorphs or facial morphs applied for each part of the text based on its vowel sounds.
The vowel sounds are thus converted into the appropriate facial morphs, using the words detected in the text list that the Microsoft Speech Recognition Engine has already produced from the vocal audio input. (It’s a lot easier than you might think, and this technique has been around for well over a decade.)
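The vowel-to-morph step above can be sketched as a simple lookup. This is just illustrative pseudologic, not Unreal API code; the morph target names (`Morph_AA`, etc.) are hypothetical placeholders that would need to match the morph targets on your actual character mesh.

```python
# A minimal sketch of the vowel-to-morph lookup described above.
# Morph names are hypothetical; in a real setup they would match
# the morph targets / shape keys on your character mesh.

VOWEL_TO_MORPH = {
    "a": "Morph_AA",   # open mouth
    "e": "Morph_EE",   # wide mouth
    "i": "Morph_IH",
    "o": "Morph_OH",   # rounded lips
    "u": "Morph_UW",   # pursed lips
}

def morphs_for_word(word: str) -> list[str]:
    """Scan one recognized word and return the facial morphs to apply,
    one per vowel sound, in order of appearance."""
    return [VOWEL_TO_MORPH[ch] for ch in word.lower() if ch in VOWEL_TO_MORPH]
```

A real implementation would scan proper phonemes rather than raw letters, but the principle (recognized text in, ordered morph list out) is the same.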
You then use the existing animation pipeline to further edit the facial animations, adding facial emotions if need be via animation blending, which can also be done in Blueprints.
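The emotion blend mentioned above amounts to a weighted mix of morph weights, much like a blend node in an Animation Blueprint. A minimal sketch, with all pose and morph names purely illustrative:

```python
# A minimal sketch of emotion blending: per-morph weights for the speech
# pose and an emotion pose are mixed by a blend factor alpha, analogous
# to an animation blend node. All names here are illustrative.

def blend_poses(speech_pose, emotion_pose, alpha):
    """Linearly blend two {morph_name: weight} dicts.
    alpha=0 gives pure speech, alpha=1 gives pure emotion."""
    names = set(speech_pose) | set(emotion_pose)
    return {n: (1 - alpha) * speech_pose.get(n, 0.0)
               + alpha * emotion_pose.get(n, 0.0)
            for n in names}
```

For example, a half-strength smile layered over an open-jaw speech pose is just `blend_poses({"Jaw_Open": 1.0}, {"Smile": 1.0}, 0.5)`.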
For timing how long each part of a word holds before transitioning, you’d simply use the existing recorded timeline of pauses in the speech, relying on the natural timing of your delivery: when you’re not speaking, the facial pose should return to a neutral expression, unless an added emotion blend is active, in which case you return to that instead.
That allows you to programmatically replay those expressions in realtime.
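The timing-and-replay idea above can be sketched like this: each recognized word carries start/end times from the recognizer’s timeline, and any sufficiently long gap between words falls back to a rest pose. The threshold, morph names, and rest pose are all assumptions for illustration.

```python
# A sketch of the timing idea above: words carry start/end times taken
# from the recognizer's timeline, and pauses between words fall back to
# a neutral (or emotion) rest pose. Names and thresholds are illustrative.

NEUTRAL = "Morph_Neutral"

def build_keyframes(timed_words, gap_threshold=0.25, rest_pose=NEUTRAL):
    """timed_words: list of (morph_name, start_sec, end_sec) tuples.
    Returns (time, morph) keyframes, inserting the rest pose wherever
    the pause between two words exceeds gap_threshold seconds, and at
    the end of speech."""
    keys = []
    prev_end = None
    for morph, start, end in timed_words:
        if prev_end is not None and start - prev_end > gap_threshold:
            keys.append((prev_end, rest_pose))   # return to rest in the pause
        keys.append((start, morph))
        prev_end = end
    if prev_end is not None:
        keys.append((prev_end, rest_pose))       # end of speech -> rest pose
    return keys
```

Swapping `rest_pose` for an emotion pose covers the case where an emotion blend is active instead of the neutral face.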
For the Speech Recognition Engine portion, which is now common on virtually all devices: Lernout & Hauspie developed this early technology starting in 1987. In the 1990s Microsoft provided it for free, implemented as an ActiveX control that could also be embedded into web pages; it was used alongside the Microsoft Genie line of avatars, Microsoft Voice Command, and the TTS engine for text-to-speech conversion (the most realistic voice was Mary, which resembled the Star Trek computer’s voice). The engine was later overhauled in C++ only, requiring lengthy grammar voice-recording sessions in a dictionary-based format; products such as Dragon NaturallySpeaking picked up that newer C++ version, which requires heavier resources for real-time feedback, though with the newer engine you could add a text file to improve accuracy. Products such as Mimic (aka LipSync) used the simpler direct approach, which has also been put online for website avatars. Lernout & Hauspie ran into trouble and went bankrupt in 2001, but the technology has remained freely available. Now there are personalities such as Apple Siri (a project on GitHub uses the protocol), Microsoft Cortana, and Google Now/Google Voice (HTML5 speech input via <input type="text" x-webkit-speech /> and the Web Speech API). Formal link: https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API?hl=en
The demonstration link (this gives you the basic idea):
My opinion is that the older software was actually better performance-wise, had better accuracy, and featured a better, more stable interface. That includes the older Mimic LipSync software, which was simple and easy to adjust for adding emotions using a simple 2D, vowel-based cartoon character, versus how things are now.
Blender Sintel Facial Emotion Concept:
Mimic Lipsync Example:
Gravity Studio’s Avatar Example:
Older Gravity Studio’s Example of Same Avatar:
Simplest Concept Version Using Blender:
Blender Shape Key Script Add-on Dialog Markers Example (Note: This is a more manual process.):
Blender a Very Lengthy Manual Process with Shape Keys:
CG artist Liam Kemp did some work for a 2001–2003 short film, and then some later work in 2008 using 2004-era hardware.
Liam Kemp Site:
A video example (I know the mouth looks goofy, but the graphics with the expressions, wow):
Simple Process Explanation:
1. Voice input from a microphone or an imported wave file.
2. Speech recognition converts (interprets) the audio input into text words.
3. The text array is used in a loop to detect which facial morphs to apply at each recorded text length.
4. Optional: adjust with vowel-sound text and/or enter dialog manually to correct any speech recognition word-detection errors. (Or simply review the output text file and edit that same file to compensate for any delays and accuracy issues.)
5. Facial emotions added using animation blending.
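The steps above can be sketched end to end as a toy pipeline. The recognizer (step 2) is stubbed out since real engines differ, and everything here (morph names, fixed per-word timing) is an illustrative assumption, not Unreal or SAPI code:

```python
# A toy end-to-end sketch of the five steps above. Step 2 (the speech
# recognition engine) is stubbed, and all names and timings are
# illustrative placeholders.

VOWEL_TO_MORPH = {"a": "Morph_AA", "e": "Morph_EE", "i": "Morph_IH",
                  "o": "Morph_OH", "u": "Morph_UW"}

def recognize(audio) -> list[str]:
    """Stub for the speech recognition engine: audio in, word list out.
    Here we pretend the 'audio' is already a text transcript."""
    return audio.split()

def lipsync(audio, seconds_per_word=0.4):
    """Steps 1-3: recognize words, then emit (time, morph) events by
    scanning each word's vowels and spacing words along the timeline."""
    events = []
    for i, word in enumerate(recognize(audio)):
        t = i * seconds_per_word
        for ch in word.lower():
            if ch in VOWEL_TO_MORPH:
                events.append((round(t, 2), VOWEL_TO_MORPH[ch]))
    return events
```

Steps 4 and 5 (manual correction and emotion blending) would then edit or layer over the event list this produces.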