I’m thrilled to introduce NeuroSync, a cutting-edge transformer-based seq2seq neural network designed to create real-time face blendshape animations directly from audio features. This is a brand-new workflow that is being developed specifically for Unreal Engine. With NeuroSync, you can generate lifelike facial animations synced to audio in real time, making it a powerful tool for anyone working with character animation or virtual production.
Pre-Alpha Release:
Please note, NeuroSync is currently in pre-alpha. We’re in the early stages of development, and while it’s already functional, there’s much more to come. A dedicated plugin is in the works, and we’re eager to gather feedback from the community to help shape its development to meet your needs.
Key Features:
Innovative Workflow: NeuroSync introduces a unique real-time TTSA (text-to-speech + animation) workflow, seamlessly converting audio features into facial blendshapes and emotion data.
Real-Time Performance: The model processes batches of audio feature frames and outputs the corresponding face blendshapes and emotion data in real time.
Unreal Engine Integration: Easily integrate NeuroSync with Unreal Engine via the LiveLink API to bring your characters to life.
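To make the workflow a little more concrete, here is a very rough sketch of the data flow we have in mind: audio feature frames go in, blendshape + emotion frames come out, and those frames get streamed to the engine. Everything in it is illustrative only: the model call is faked with random values, and the JSON-over-UDP transport is a placeholder, not the actual LiveLink wire format or the plugin's API.

```python
# Rough sketch of the intended data flow, NOT the actual NeuroSync or LiveLink API.
# The model call is faked with random values; the UDP/JSON transport is a
# placeholder for whatever the real plugin / LiveLink bridge ends up using.
import json
import socket
import time

import numpy as np

NUM_BLENDSHAPES = 51   # full face, no neck or tongue yet
NUM_EMOTIONS = 7       # per-frame emotion values
FPS = 60               # assumed output frame rate

def fake_neurosync_inference(audio_feature_batch: np.ndarray) -> np.ndarray:
    """Stand-in for the real model: one (51 + 7)-dim vector per audio feature frame."""
    num_frames = audio_feature_batch.shape[0]
    return np.random.rand(num_frames, NUM_BLENDSHAPES + NUM_EMOTIONS).astype(np.float32)

def stream_to_unreal(frames: np.ndarray, host: str = "127.0.0.1", port: int = 11111) -> None:
    """Send one frame at a time to a local listener (placeholder transport)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for frame in frames:
        payload = {
            "blendshapes": frame[:NUM_BLENDSHAPES].tolist(),
            "emotions": frame[NUM_BLENDSHAPES:].tolist(),
        }
        sock.sendto(json.dumps(payload).encode("utf-8"), (host, port))
        time.sleep(1.0 / FPS)  # pace the stream at the animation frame rate

if __name__ == "__main__":
    audio_features = np.random.rand(120, 128).astype(np.float32)  # 2 s of made-up features
    stream_to_unreal(fake_neurosync_inference(audio_features))
```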
We’re excited to share this groundbreaking tool with you and are looking forward to your input as we continue to refine and expand NeuroSync. Your feedback will be invaluable as we develop the plugin and ensure it meets the needs of the community.
How does one apply for the alpha? Also, is this purely an API call for Unreal, or does it work locally in engine, similar to OVR Lipsync? And does it compile to all platforms?
The model is quite small, so we are hoping to ship it as a plugin you can package with the game. Since it's for generating face animations on the fly, it needs to run locally to be of any use to anyone.
Latency (time to first speech) can be under 1.5 seconds, but with ElevenLabs it's generally higher.
I will add a link here to sign up for alpha access as soon as one is available.
This does even better than that: it adds emotion automatically when the audio is emotive and covers the full 51 blendshapes for the entire face (no neck or tongue yet).
In addition, it outputs per-frame emotion values for 7 emotions, which you can use to adjust the current animation with further detail on a slider. Future models will include basic neck and upper body/arm motion.
If you only want the mouth shapes, you can of course use just the mouth dimensions and map those.
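As a rough idea of how you might use that output, here is a small sketch: one blendshape frame plus its 7 emotion values, with the emotion influence attenuated by a slider, and an optional mouth-only slice. The channel indices and the slider scheme are my own assumptions for illustration, not the model's documented output order or the plugin's actual controls.

```python
# Illustrative only: channel indices and the slider scheme are assumptions,
# not the model's documented output order.
import numpy as np

NUM_BLENDSHAPES = 51
NUM_EMOTIONS = 7

# Hypothetical index range for the mouth-related channels; adjust to whatever
# the model actually emits.
MOUTH_INDICES = list(range(17, 40))

def frame_with_emotion_slider(blendshapes, emotions, slider):
    """Return the blendshape frame plus emotion values attenuated by a 0-1 slider."""
    return blendshapes, slider * emotions

def mouth_only(blendshapes):
    """Slice out just the mouth channels if that's all you want to map."""
    return blendshapes[MOUTH_INDICES]

frame = np.random.rand(NUM_BLENDSHAPES)   # one output frame (faked here)
emotions = np.random.rand(NUM_EMOTIONS)   # per-frame emotion values (faked here)
shapes, scaled_emotions = frame_with_emotion_slider(frame, emotions, slider=0.5)
print(mouth_only(shapes).shape, scaled_emotions)
```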
We are moving fast and will post any further updates in this thread. <3
Here is a rough multi-turn example (excuse any mistakes!) with a low-quality speech model.
We are getting some good response times with a bit of chunking trickery!
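For anyone curious what "chunking trickery" means in general terms, the idea is just to split incoming audio into small chunks and animate each one as it arrives, so the first frames are ready long before the whole clip is processed. The sketch below only illustrates that idea; the chunk size, sample rate, and the faked feature/model call are all assumptions, not our actual pipeline.

```python
# General idea only: split incoming audio into small chunks so the first chunk's
# animation is ready quickly, instead of waiting for the whole clip.
import numpy as np

SAMPLE_RATE = 16000        # assumed audio sample rate
CHUNK_SECONDS = 0.5        # assumed chunk size; smaller = faster first frame, more overhead

def chunk_audio(audio: np.ndarray, chunk_seconds: float = CHUNK_SECONDS):
    """Yield successive fixed-size chunks of a mono audio buffer."""
    chunk_len = int(SAMPLE_RATE * chunk_seconds)
    for start in range(0, len(audio), chunk_len):
        yield audio[start:start + chunk_len]

def fake_features_and_inference(chunk: np.ndarray) -> np.ndarray:
    """Stand-in for the real feature extraction + model call."""
    num_frames = max(1, int(len(chunk) / SAMPLE_RATE * 60))  # ~60 animation fps
    return np.random.rand(num_frames, 51 + 7)

audio = np.random.randn(SAMPLE_RATE * 3)  # 3 s of made-up audio
for chunk in chunk_audio(audio):
    frames = fake_features_and_inference(chunk)
    # stream `frames` to the engine here; the first chunk arrives long before
    # the full clip has been processed, which is where the latency win comes from
    print(frames.shape)
```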
To contribute to this amazing project and to help others who, like me, had doubts about how to install it, use it, and integrate it into Unreal, I created a basic tutorial for anyone who is unsure where to start. If you have any suggestions or need assistance, feel free to contact me.