Introducing NeuroSync: Real-Time 60fps Audio-to-Face Animation with Transformer-Based Seq2Seq Neural Networks

I’m thrilled to introduce NeuroSync, a cutting-edge transformer-based seq2seq neural network designed to create real-time face blendshape animations directly from audio features. This is a brand-new workflow that is being developed specifically for Unreal Engine. With NeuroSync, you can generate lifelike facial animations synced to audio in real time, making it a powerful tool for anyone working with character animation or virtual production.


Pre-Alpha Release:
Please note, NeuroSync is currently in pre-alpha. We’re in the early stages of development, and while it’s already functional, there’s much more to come. A dedicated plugin is in the works, and we’re eager to gather feedback from the community to help shape its development to meet your needs.

Key Features:

  • Innovative Workflow: NeuroSync introduces a unique real-time TTSA (text-to-speech + animation) workflow, seamlessly converting audio features into facial blendshapes and emotion data.
  • Real-Time Performance: The model processes batches of audio feature frames and outputs the corresponding face blendshapes and emotion data in real time (a rough sketch of this loop follows just after this list).
  • Unreal Engine Integration: Easily integrate NeuroSync with Unreal Engine via the LiveLink API to bring your characters to life.
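
For anyone curious what the real-time loop above amounts to in practice, here is a minimal sketch in Python. The `model.infer` and `send_frame` names are hypothetical stand-ins, not the actual NeuroSync API; the idea is simply to run a batch of audio feature frames through the network and pace the resulting blendshape/emotion frames out at 60 fps.

```python
# Minimal sketch of the real-time loop (hypothetical interfaces, not the real API).
import time

FPS = 60
FRAME_TIME = 1.0 / FPS

def stream_batch(model, audio_features, send_frame):
    """Run one batch of audio feature frames through the model and pace the
    resulting blendshape + emotion frames out to Unreal at 60 fps.

    model          -- assumed to expose infer(features) -> iterable of frames
    audio_features -- one batch of audio feature frames
    send_frame     -- callable that pushes a single frame toward LiveLink
    """
    frames = model.infer(audio_features)
    next_deadline = time.perf_counter()
    for frame in frames:
        send_frame(frame)                     # blendshapes (+ emotion values)
        next_deadline += FRAME_TIME
        sleep_for = next_deadline - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)             # hold the 60 fps pacing
```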

We’re excited to share this groundbreaking tool with you and are looking forward to your input as we continue to refine and expand NeuroSync. Your feedback will be invaluable as we develop the plugin and ensure it meets the needs of the community.


How does one apply for the alpha? Also, is this purely an API call for Unreal, or does it work locally in-engine, similar to OVR Lipsync? And does it compile to all platforms?

The model is quite small, so we are hoping to ship it as a plugin you can package with the game. Since it's for generating face animations on the fly, it needs to run locally to be of any use to anyone.

Latency can be under 1.5 seconds (time to first speech), but with ElevenLabs it's generally higher.

I will add a link here to sign up for alpha access as soon as one is available. :slight_smile:


What about an even smaller model for just mouth shapes for talking? :slight_smile:

This does even better than that: it adds emotion automatically when the audio is emotive, and it covers the full 51 blendshapes for the entire face (no neck or tongue, yet).

In addition, it outputs per-frame emotion values for 7 emotions, which you can use to add further detail to the current animation via a slider. Future models will include a basic neck and upper body/arms.

If you only want the mouth shapes, you can take just the mouth dimensions and map those, of course.
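
If it helps, here is one way that mouth-only mapping could look in Python. It assumes each generated frame can be read as a dict of ARKit-style blendshape names to values; the real output ordering and naming in NeuroSync may differ.

```python
# Sketch: keep only the jaw/mouth blendshapes from one generated frame.
# Assumes ARKit-style names; the actual NeuroSync output layout may differ.
MOUTH_SHAPES = {
    "JawOpen", "JawForward", "JawLeft", "JawRight",
    "MouthClose", "MouthFunnel", "MouthPucker", "MouthLeft", "MouthRight",
    "MouthSmileLeft", "MouthSmileRight", "MouthFrownLeft", "MouthFrownRight",
    "MouthDimpleLeft", "MouthDimpleRight", "MouthStretchLeft", "MouthStretchRight",
    "MouthRollLower", "MouthRollUpper", "MouthShrugLower", "MouthShrugUpper",
    "MouthPressLeft", "MouthPressRight",
    "MouthLowerDownLeft", "MouthLowerDownRight", "MouthUpperUpLeft", "MouthUpperUpRight",
}

def mouth_only(frame: dict) -> dict:
    """Zero out every blendshape except the jaw/mouth ones in a single frame."""
    return {name: (value if name in MOUTH_SHAPES else 0.0)
            for name, value in frame.items()}
```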

We are moving fast and will add any further updates on this thread. <3

We are getting some good response times with a bit of chunking trickery! :slight_smile:
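
In case the chunking idea is useful to others, here is a rough sketch of what it could look like (hypothetical helper names, not the Player's actual internals): generate and hand off the first small chunk as soon as it is ready instead of waiting for the whole clip, so time-to-first-frame is bounded by one chunk.

```python
# Sketch of chunked generation to cut time-to-first-frame (hypothetical names).
def chunked_generate(model, audio_features, send_frames, chunk_frames=64):
    """Split the audio feature sequence into chunks and stream each chunk's
    generated animation as soon as it is produced."""
    for start in range(0, len(audio_features), chunk_frames):
        chunk = audio_features[start:start + chunk_frames]
        frames = model.infer(chunk)   # assumed inference call, as in the earlier sketch
        send_frames(frames)           # e.g. the 60 fps pacing loop shown earlier
```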


Looking forward to the release of the alpha to give it a try and see if it will work in my pipeline :slight_smile:

Come get some!

Hello. Can you please guide me on how to connect NeuroSync to the MetaHuman? I have NeuroSync working locally (local API). Thanks.

Hello, it's the same as if you were using the Live Link Face app on an iPhone. Just enable the LiveLink and ARKit plugins in your Unreal project, then use GitHub - AnimaVR/NeuroSync_Player to stream into Unreal. Start with play_generated.py; it will stream a default animation so you can make sure it's all linked up with LiveLink.

Some software is coming that will simplify it all soon. :slight_smile:

To contribute to this amazing project and to help others like me who had doubts about how to install, use, and integrate it into Unreal, I created a basic tutorial that can guide anyone who is unsure how to get started. If you have any suggestions, feel free to contact me.


Hi! It’s an awesome tool. It works locally and directly interacts with LiveLink Face after being assigned in Unreal. How did you attach idle body animations and other actions? It would be really helpful to have a sample Unreal project to see the MetaHuman configurations.

Hi there!

I just wanted to say awesome work! I’ve been using your model with Unity, and I’ve noticed it yields much better results than what you demonstrated on YouTube. Perhaps it’s because Unity is more lightweight?

Also, do you have any plans to release a technical write-up on arXiv? I’m interested in fine-tuning this model for my specific language.

Thanks!

Hello, really pleased you are having some positive results!

It's likely working better for you due to some frame skipping that happens when I record my screen while running the model + Unreal project - more a slow-hardware thing than a difference in model performance.

There is a new version coming in the new year with some improvements, and I will make sure to go into more detail about how we create the dataset and train the model. It does work for any language too, although the accuracy for lesser-known languages is untested at the moment.

I shall keep this updated :slight_smile:

Thanks, Lew! The idle body animation is just a looped animation sequence in Unreal, and the Player streams the face in via LiveLink.

As the body and face are separate, we can blend to and from a default animation so it's seamless (LiveLink is used constantly to stream the default face animation, and we blend to and from its current index for each new animation generated from the input audio).
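
As a rough illustration of that blend (hypothetical and NumPy-based, not the Player's actual code), each generated clip could be faded in and out against whatever frame the idle loop is currently on:

```python
# Sketch: blend a generated clip against the constantly-looping default face
# animation, starting from the loop's current index (not the actual Player code).
import numpy as np

def blend_clip(default_loop, generated, loop_index, fade_frames=8):
    """default_loop: (loop_len, dims) idle face frames streamed on a loop.
    generated:    (clip_len, dims) model output for one audio clip.
    loop_index:   frame index the idle loop is on when the clip starts."""
    loop_len = len(default_loop)
    out = []
    for i, frame in enumerate(generated):
        idle = default_loop[(loop_index + i) % loop_len]
        fade_in = min(1.0, (i + 1) / fade_frames)                 # ramp up at the start
        fade_out = min(1.0, (len(generated) - i) / fade_frames)   # ramp down at the end
        w = min(fade_in, fade_out)
        out.append((1.0 - w) * idle + w * frame)
    return np.stack(out)
```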

You can add a default animation for the body using Unreal's animation tools, either a sequence that loops or a level sequence your character is controlled in. You just need to enable LiveLink on the face of whatever model you are using and make sure you don't override it during runtime.

It's made so that very little is required on the Unreal side other than enabling LiveLink on the face; even the basic third-person character idle will work as a looping idle to make it feel a bit more alive.

A LookAt node set to look at the camera or a fixed point (set in the FaceAnim_BP) also helps make the eyes look less dead.

@Animalgnis Hello, I have this issue at the moment:

user@WINDOWS-JBQD9H0 MINGW64 /d/LLM/NeuroSyncLipSync/NeuroSync_Player (main)
$ python text_to_face.py
pygame 2.6.1 (SDL 2.28.4, Python 3.12.4)
Hello from the pygame community. Contribute - pygame wiki
Enter the text to generate speech (or ‘q’ to quit): Hello
Audio data saved to generated\e636a60d-5822-4489-8481-92e174309fca\audio.wav
Traceback (most recent call last):
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\text_to_face.py", line 48, in <module>
    save_generated_data(audio_bytes, generated_facial_data)
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\utils\api_utils.py", line 62, in save_generated_data
    save_generated_data_as_csv(generated_facial_data, shapes_path)
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\utils\csv\save_csv.py", line 32, in save_generated_data_as_csv
    generated = generated.reshape(-1, total_columns)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 2992 into shape (61)


Any ideas?
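
One purely arithmetic observation on the numbers in that traceback (the real cause may be something else entirely): 2992 values do not divide evenly into rows of 61, but they do divide into 44 rows of 68, and 68 - 61 = 7 matches the per-frame emotion values mentioned earlier in the thread, so the saved data may simply have more columns than the CSV writer expects.

```python
# Quick sanity check on the reshape error above.
size, expected_columns = 2992, 61
print(size % expected_columns)        # 3  -> not an even number of 61-wide rows
print(size % (expected_columns + 7))  # 0  -> 44 rows if 7 emotion values are appended
```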