Introducing NeuroSync: Real-Time 60fps Audio-to-Face Animation with Transformer-Based Seq2Seq Neural Networks

I’m thrilled to introduce NeuroSync, a cutting-edge transformer-based seq2seq neural network designed to create real-time face blendshape animations directly from audio features. This is a brand-new workflow that is being developed specifically for Unreal Engine. With NeuroSync, you can generate lifelike facial animations synced to audio in real time, making it a powerful tool for anyone working with character animation or virtual production.


Pre-Alpha Release:
Please note, NeuroSync is currently in pre-alpha. We’re in the early stages of development, and while it’s already functional, there’s much more to come. A dedicated plugin is in the works, and we’re eager to gather feedback from the community to help shape its development to meet your needs.

Key Features:

  • Innovative Workflow: NeuroSync introduces a unique real-time TTSA (text-to-speech + animation) workflow, seamlessly converting audio features into facial blendshapes and emotion data.
  • Real-Time Performance: The model processes batches of audio feature frames and outputs the corresponding face blendshapes and emotion data in real time (a rough sketch of this loop follows just after this list).
  • Unreal Engine Integration: Easily integrate NeuroSync with Unreal Engine via the LiveLink API to bring your characters to life.
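
For anyone curious what the real-time loop above amounts to in practice, here is a minimal sketch in Python. The `model.infer` and `send_frame` names are hypothetical stand-ins, not the actual NeuroSync API; the idea is simply to run a batch of audio feature frames through the network and pace the resulting blendshape/emotion frames out at 60 fps.

```python
# Minimal sketch of the real-time loop (hypothetical interfaces, not the real API).
import time

FPS = 60
FRAME_TIME = 1.0 / FPS

def stream_batch(model, audio_features, send_frame):
    """Run one batch of audio feature frames through the model and pace the
    resulting blendshape + emotion frames out to Unreal at 60 fps.

    model          -- assumed to expose infer(features) -> iterable of frames
    audio_features -- one batch of audio feature frames
    send_frame     -- callable that pushes a single frame toward LiveLink
    """
    frames = model.infer(audio_features)
    next_deadline = time.perf_counter()
    for frame in frames:
        send_frame(frame)                     # blendshapes (+ emotion values)
        next_deadline += FRAME_TIME
        sleep_for = next_deadline - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)             # hold the 60 fps pacing
```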

We’re excited to share this groundbreaking tool with you and are looking forward to your input as we continue to refine and expand NeuroSync. Your feedback will be invaluable as we develop the plugin and ensure it meets the needs of the community.


How does one apply for the alpha? Also, is this purely an API call for Unreal, or does it work locally in-engine, similar to OVR Lipsync? And does it compile to all platforms?

The model is quite small, so we are hoping to ship it as a plugin you can package with the game. Since it's for generating face animations on the fly, it needs to run locally to be of any use to anyone.

Latency can be under 1.5 seconds (time to first speech), but with ElevenLabs it's generally higher.

I will add a link here to sign up for alpha access as soon as one is available. :slight_smile:


What about an even smaller model for just mouth shapes for talking? :slight_smile:

This does even better than that: it adds emotion automatically when the audio is emotive, and it covers the full 51 blendshapes for the entire face (no neck or tongue, yet).

In addition, it outputs per-frame emotion values for 7 emotions, which you can use to add further detail to the current animation via a slider. Future models will include a basic neck and upper body/arms.

If you only want the mouth shapes, you can take just the mouth dimensions and map those, of course.
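
If it helps, here is one way that mouth-only mapping could look in Python. It assumes each generated frame can be read as a dict of ARKit-style blendshape names to values; the real output ordering and naming in NeuroSync may differ.

```python
# Sketch: keep only the jaw/mouth blendshapes from one generated frame.
# Assumes ARKit-style names; the actual NeuroSync output layout may differ.
MOUTH_SHAPES = {
    "JawOpen", "JawForward", "JawLeft", "JawRight",
    "MouthClose", "MouthFunnel", "MouthPucker", "MouthLeft", "MouthRight",
    "MouthSmileLeft", "MouthSmileRight", "MouthFrownLeft", "MouthFrownRight",
    "MouthDimpleLeft", "MouthDimpleRight", "MouthStretchLeft", "MouthStretchRight",
    "MouthRollLower", "MouthRollUpper", "MouthShrugLower", "MouthShrugUpper",
    "MouthPressLeft", "MouthPressRight",
    "MouthLowerDownLeft", "MouthLowerDownRight", "MouthUpperUpLeft", "MouthUpperUpRight",
}

def mouth_only(frame: dict) -> dict:
    """Zero out every blendshape except the jaw/mouth ones in a single frame."""
    return {name: (value if name in MOUTH_SHAPES else 0.0)
            for name, value in frame.items()}
```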

We are moving fast and will add any further updates on this thread. <3

We are getting some good response times with a bit of chunking trickery! :slight_smile:
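
In case the chunking idea is useful to others, here is a rough sketch of what it could look like (hypothetical helper names, not the Player's actual internals): generate and hand off the first small chunk as soon as it is ready instead of waiting for the whole clip, so time-to-first-frame is bounded by one chunk.

```python
# Sketch of chunked generation to cut time-to-first-frame (hypothetical names).
def chunked_generate(model, audio_features, send_frames, chunk_frames=64):
    """Split the audio feature sequence into chunks and stream each chunk's
    generated animation as soon as it is produced."""
    for start in range(0, len(audio_features), chunk_frames):
        chunk = audio_features[start:start + chunk_frames]
        frames = model.infer(chunk)   # assumed inference call, as in the earlier sketch
        send_frames(frames)           # e.g. the 60 fps pacing loop shown earlier
```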


Looking forward to the release of the alpha to give it a try and see if it will work in my pipeline :slight_smile:

Come get some!

Hello. Can you please guide me on how to connect NeuroSync to the MetaHuman? I have NeuroSync working locally (local API). Thanks.

Hello, it's the same as if you were using the Live Link Face app on an iPhone. Just enable the LiveLink and ARKit plugins in your Unreal project, then use GitHub - AnimaVR/NeuroSync_Player to stream into Unreal. Start with play_generated.py; it will stream a default animation so you can make sure it's all linked up with LiveLink.

Some software is coming that will simplify it all soon. :slight_smile:

To contribute to this amazing project and to help others like me who had doubts about how to install, use, and integrate it into Unreal, I created a basic tutorial that can guide anyone who is unsure how to get started. If you have any suggestions, feel free to contact me.


Hi! It’s an awesome tool. It works locally and directly interacts with LiveLink Face after being assigned in Unreal. How did you attach idle body animations and other actions? It would be really helpful to have a sample Unreal project to see the MetaHuman configurations.

Hi there!

I just wanted to say awesome work! I’ve been using your model with Unity, and I’ve noticed it yields much better results than what you demonstrated on YouTube. Perhaps it’s because Unity is more lightweight?

Also, do you have any plans to release a technical write-up on arXiv? I’m interested in fine-tuning this model for my specific language.

Thanks!

Hello, really pleased you are having some positive results!

It's likely working better for you due to some frame skipping that happens when I record my screen while running the model + Unreal project - more a slow-hardware thing than a difference in model performance.

There is a new version coming in the new year with some improvements, and I will make sure to go into more detail about how we create the dataset and train the model. It does work for any language too, although the accuracy for lesser-known languages is untested at the moment.

I shall keep this updated :slight_smile:

Thanks, Lew! The idle body animation is just a looped animation sequence in Unreal, and the Player streams the face in via LiveLink.

As the body and face are separate, we can blend to and from a default animation so it's seamless (LiveLink is used constantly to stream the default face animation, and we blend to and from its current index for each new animation generated from the input audio).
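
As a rough illustration of that blend (hypothetical and NumPy-based, not the Player's actual code), each generated clip could be faded in and out against whatever frame the idle loop is currently on:

```python
# Sketch: blend a generated clip against the constantly-looping default face
# animation, starting from the loop's current index (not the actual Player code).
import numpy as np

def blend_clip(default_loop, generated, loop_index, fade_frames=8):
    """default_loop: (loop_len, dims) idle face frames streamed on a loop.
    generated:    (clip_len, dims) model output for one audio clip.
    loop_index:   frame index the idle loop is on when the clip starts."""
    loop_len = len(default_loop)
    out = []
    for i, frame in enumerate(generated):
        idle = default_loop[(loop_index + i) % loop_len]
        fade_in = min(1.0, (i + 1) / fade_frames)                 # ramp up at the start
        fade_out = min(1.0, (len(generated) - i) / fade_frames)   # ramp down at the end
        w = min(fade_in, fade_out)
        out.append((1.0 - w) * idle + w * frame)
    return np.stack(out)
```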

You can add a default animation for the body using Unreal's animation tools, either a sequence that loops or a level sequence your character is controlled in. You just need to enable LiveLink on the face of whatever model you are using and make sure you don't override it during runtime.

It's made so that very little is required on the Unreal side other than enabling LiveLink on the face; even the basic third-person character idle will work as a looping idle to make it feel a bit more alive.

A LookAt node set to look at the camera or a fixed point (set in the FaceAnim_BP) also helps make the eyes look less dead.

@Animalgnis Hello, I have this issue at the moment:

user@WINDOWS-JBQD9H0 MINGW64 /d/LLM/NeuroSyncLipSync/NeuroSync_Player (main)
$ python text_to_face.py
pygame 2.6.1 (SDL 2.28.4, Python 3.12.4)
Hello from the pygame community. Contribute - pygame wiki
Enter the text to generate speech (or ‘q’ to quit): Hello
Audio data saved to generated\e636a60d-5822-4489-8481-92e174309fca\audio.wav
Traceback (most recent call last):
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\text_to_face.py", line 48, in <module>
    save_generated_data(audio_bytes, generated_facial_data)
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\utils\api_utils.py", line 62, in save_generated_data
    save_generated_data_as_csv(generated_facial_data, shapes_path)
  File "D:\LLM\NeuroSyncLipSync\NeuroSync_Player\utils\csv\save_csv.py", line 32, in save_generated_data_as_csv
    generated = generated.reshape(-1, total_columns)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 2992 into shape (61)


Any ideas?
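
One purely arithmetic observation on the numbers in that traceback (the real cause may be something else entirely): 2992 values do not divide evenly into rows of 61, but they do divide into 44 rows of 68, and 68 - 61 = 7 matches the per-frame emotion values mentioned earlier in the thread, so the saved data may simply have more columns than the CSV writer expects.

```python
# Quick sanity check on the reshape error above.
size, expected_columns = 2992, 61
print(size % expected_columns)        # 3  -> not an even number of 61-wide rows
print(size % (expected_columns + 7))  # 0  -> 44 rows if 7 emotion values are appended
```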