OVRLipSync Plugin for UE4

I wouldn’t worry about this on PC/consoles, but on mobile it would be an issue.

How would you record viseme values? Are you planning on adding a small tool for that? :wink:

The plugin spits out the values, you’d just need to have an array or whatever and add an element every time (in Blueprint for instance). Then you can store it in a datatable/csv or save game or whatnot.
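As a rough illustration of the "array plus CSV" idea above, here is a minimal offline sketch in Python rather than Blueprint. The `VisemeRecorder` class and its methods are hypothetical names, not part of the plugin, and the viseme order is an assumption based on the 15 visemes listed in the Oculus Lipsync documentation.

```python
import csv
import io

# Assumed viseme order (15 entries, as listed in the Oculus Lipsync docs).
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]


class VisemeRecorder:
    """Collects one row of viseme weights per update (hypothetical helper)."""

    def __init__(self):
        self.rows = []

    def add_frame(self, time_seconds, weights):
        # Reject frames that do not carry exactly one weight per viseme.
        if len(weights) != len(VISEMES):
            raise ValueError("expected %d weights" % len(VISEMES))
        self.rows.append([round(time_seconds, 4)] + [round(w, 4) for w in weights])

    def to_csv(self):
        # Header row first, then one row per recorded frame.
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["time"] + VISEMES)
        writer.writerows(self.rows)
        return out.getvalue()
```

You would call `add_frame` every time the plugin reports new values and write `to_csv()` to disk at the end; the resulting file could then be imported as a datatable.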

Hi @n00854180! First of all, you did a great job! This is amazing! I am currently working on Allright Rig, a rigging and animation system for Unreal Engine. I also have a lipsync feature on my roadmap and have just started doing some research on it. Since you are not new to this problem, I have some ideas about your plugin, if you are interested.

In my opinion, right now there is no better solution for auto lip-sync than the FaceFX algorithm. I don’t think there is a way to get good-quality lip-sync from audio alone, especially in real time. But if you combine the audio source with text input, you can get good results in seconds. I have functions that can bake morph target animation into an animation sequence or onto my rig. So I suggest we work on this feature together and make a good plugin that takes an audio source and a text source, synchronizes them, extracts phonemes, and bakes the result into an animation sequence or, even better, straight into my face rig controllers. It doesn’t really matter where the resulting data is saved, but with Allright Rig the user could edit the results easily. The plugin could also just save the resulting data to an asset or a structure like: frame XX, phoneme XX. I think the new audio curves feature in UE 4.15 could be very helpful too. In the end, the user will have an animation sequence with good-quality lip-sync for use in game. I am not asking for help with my system, but suggesting we make a cool feature for Unreal Engine. What do you think? Thank you.

Sounds like a good idea to me! Hit me up on Steam (this user name) as I’m always online there and it will be easier to coordinate.

What sort of plan did you have in mind? Right now the OVRLipSync plugin basically just works how it does in Unity - if you give it a mic input sound sample it spits out morph values for each of the common phonemes. It doesn’t currently support feeding it a canned audio clip, though I’d like to get that working in order to update my StorytellerVR program anyhow.

So that would have to be first - my approach was going to be to rip out some of the code I wrote for decompressing SoundWaves at runtime in a separate thread (it’s in an old version of eXi’s SoundVis plugin) and then set up the OVRLipSync plugin to operate on the sound data there.

I don’t know much about how you’d take that to the next level and do baking of morphs or using the text to improve the lip sync or anything, but if you want I can hand you off a version of the OVRLipSync (when I’ve finished writing it) that will take in SoundWaves.

Hi! Glad to hear it! To be honest, I am not a gamer and I don’t even have a Steam account. Does Skype (aleks_allright) work for you? If not, it’s no problem for me to create a Steam account.

There is no need to compress any data at runtime. This system needs to process the whole audio clip at once and output results like: time - phoneme. Results from good-quality audio should be much better than from a microphone, and you can run a few iterations, since we don’t care about processing speed right now. Then I need to extract phonemes from the text (I am currently trying to find a good solution) and synchronize the results. I am not sure exactly how it should work, but it would be cool to start from a version of OVRLipSync that takes in SoundWaves.
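To make the "time - phoneme" output concrete, here is a small sketch that turns per-frame viseme weights into timed segments by taking the strongest viseme in each frame and merging consecutive repeats. The function name and the input layout are assumptions for illustration; this is not how OVRLipSync or FaceFX do it internally.

```python
def frames_to_segments(frames, names):
    """frames: list of (time_seconds, weights). Returns [(start, end, name), ...]."""
    segments = []
    for time_seconds, weights in frames:
        # Pick the viseme with the largest weight in this frame.
        best = max(range(len(weights)), key=lambda i: weights[i])
        name = names[best]
        if segments:
            # The previous segment lasts at least until this frame's time.
            start, _, prev_name = segments[-1]
            segments[-1] = (start, time_seconds, prev_name)
        if not segments or segments[-1][2] != name:
            # A different viseme took over: open a new segment here.
            segments.append((time_seconds, time_seconds, name))
    return segments
```

For example, four frames at 0.0, 0.1, 0.2 and 0.3 seconds dominated by `sil`, `sil`, `aa` and `oh` come out as `sil` from 0.0 to 0.2, `aa` from 0.2 to 0.3, and an open `oh` segment starting at 0.3. The last segment has zero length because there is no later frame to close it.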

Is 15 the maximum number of phonemes that can be extracted using OVRLipSync?

@aleks_allright OVRLipSync (as per the Oculus SDK) was designed for real-time performance, not baking. The idea is that you speak into the mic and your avatar moves its lips. It might not be 100% accurate, but for VR it’s extremely efficient. It would be nice to feed audio files into it too, so that NPCs talk to players as if they were other players. Turning it all into a pre-baked solution defeats the purpose of the system.

Hi @motorsep! I know that it was made for real-time performance, and nobody is going to change that. But right now there is no better solution for phoneme extraction in Unreal Engine. CMUSphinx has worse phoneme extraction. The Annosoft Lipsinc SDK costs 8K. Right now I am trying to get Microsoft SAPI to work. It would be perfect to have an algorithm that was made for baking, but I don’t know where to find one :confused: I also don’t think it would give much better results, although it could output more than one phoneme at a given time. However, text-to-phoneme conversion can fix that. I only need phonemes from the audio to fix the timing of the text recognition.

So, why not keep it real-time?

For the update to support passing it a SoundWave, I’ll be supporting real time and an output version you can just save on disk. No conflict there :slight_smile:

Yep, no conflict :slight_smile: That’s what I’m saying too.

Progress update: I went through and ported my old FAudioDecompressWorker, which had support for partial decompression, to the newest version of eXi’s SoundVis plugin. I’ll be pulling that class into my OVRLipSync repo soon and hooking it in.

Once it’s working, you’ll be able to pass a SoundWave to the plugin and it will decompress and send the raw sound values for that duration to the OVRLipSync functions, which will spit out viseme values.
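The data flow described here can be sketched roughly as follows, in Python just to show the shape of it. `process_chunk` is a hypothetical stand-in for whatever call the plugin ends up making into OVRLipSync; it is not the real API, and the zero-padding of the last chunk is an assumption about how a fixed-size buffer would be handled.

```python
def feed_in_chunks(samples, chunk_size, process_chunk):
    """Split decoded samples into fixed-size chunks and collect the results."""
    results = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        # Zero-pad the final chunk so every call sees the same buffer length.
        chunk += [0] * (chunk_size - len(chunk))
        results.append(process_chunk(chunk))
    return results
```

Each call to `process_chunk` would return one set of viseme values, so the returned list lines up with the audio at `chunk_size / sample_rate` seconds per entry.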

Sounds great!!! :slight_smile: I am excited to try it!

OK, I have got Microsoft SAPI working, so I will go with that from now on. I probably don’t need OVR phoneme extraction anymore. But it would still be cool to be able to work with SoundWaves in OVR!

This is a really cool project. Will definitely come in handy for what I’m currently working on :)!

Hi, is there any update on adding audio files and having NPCs talk and lip sync?

Still working on it @tcla75 - I have most of the code done, just gotta test things and maybe rewrite some of it to be better laid out.

Any news on this?

It works for mic input right now but I haven’t messed with the canned file (or runtime soundwave) support in a while. I’m currently doing some related work on the project I’m working on so I might get to it soon-ish.

Hi, quick question. Can this plugin transform text to visemes? Not audio, just a plain string.

Sadly no, it doesn’t do text to visemes. There are some free tools you could use to do that though, if you only need it to be offline.