Your thoughts on where to find AAA facial mocap solutions (software mainly)?

Hi folks,

I’m getting lost in the search for AAA facial mocap software. Every product I found that was once an industry leader has recently been bought up by giants like Apple, Ubisoft and so on, and Epic snatched some away too :worried:

The one I am currently evaluating is Faceware Analyzer + Retargeter, but I’m not sure these will survive the next year. The last software update dates from March 2018 … Besides that, they have the well-known Faceware Studio, which, in my opinion, is more for playing around.

I would like to know of alternatives I could look into. Of course the company and the product should be in a healthy state; I don’t want to spend thousands on a dead horse. For example, I found some places that still sell OptiTrack Expression, which OptiTrack itself removed from its website years ago … Quality is my main concern. The tech itself is secondary (markers + cams, software-only from 2D input, etc.)

Any ideas out there?

I suppose it depends on what part of the flow you’re looking at?

For instance, Reallusion iClone has support for Unreal Live Link, and it can take pre-made animations, hand-keyed animation, animation created via their ‘facial puppet’ functionality, iOS ARKit captures using their own facial motion capture, input via Faceware (which you mentioned), lip movement generated from audio input plus some contextual viseme data you provide, etc.

For purposes of sending it into Unreal via Live Link, how that animation got into iClone is irrelevant.
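To make that point concrete, here’s a rough sketch (Python, purely illustrative – the FacialFrame type, the retarget() helper and the morph target names are made up; only the weight names follow ARKit’s blendshape naming) of how the downstream side effectively just sees per-frame weight curves, whatever produced them:

```python
# Illustrative sketch only: downstream of the capture, facial animation is
# just per-frame weight curves, regardless of whether they came from an
# ARKit capture, audio-driven visemes, puppeteering or hand animation.
# FacialFrame, retarget() and the morph target names are hypothetical.

from dataclasses import dataclass

@dataclass
class FacialFrame:
    time: float                 # seconds into the take
    weights: dict[str, float]   # blendshape name -> 0..1 weight

def retarget(frame: FacialFrame, morph_map: dict[str, str]) -> dict[str, float]:
    """Map capture blendshape names onto a character's morph target names."""
    return {morph_map[n]: w for n, w in frame.weights.items() if n in morph_map}

# One captured frame, e.g. from an ARKit-style source:
frame = FacialFrame(time=0.033,
                    weights={"jawOpen": 0.42, "eyeBlinkLeft": 0.90, "mouthSmileRight": 0.10})

# Character-specific mapping (hypothetical morph target names):
morph_map = {"jawOpen": "Mouth_Open", "eyeBlinkLeft": "Blink_L", "mouthSmileRight": "Smile_R"}

print(retarget(frame, morph_map))  # {'Mouth_Open': 0.42, 'Blink_L': 0.9, 'Smile_R': 0.1}
```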

Now, I don’t know whether iClone itself is what you’d consider AAA software in that area, mind you. It sometimes feels to me a bit like a product trying very much to be The One True Animation Solution for everyone, which means it might fall short in specific areas if you really dive deep; I’ve not been worrying about facial mocap enough to have really put it through its paces there in any meaningful way. (Though I’ve gotten very good results in my admittedly limited experimentation using the iOS ARKit input. Mount a phone on a helmet and off you go!)

But one reason I like it as a solution – aside from covering both facial and overall body animation, thus one less tool for me to buy, which is something of a consideration for me when my game is very much a “this is what I work on when I’m not doing embedded firmware for a paycheck” scenario – is that I could generate animation in it from a wide variety of inputs, and it doesn’t matter where it came from in terms of how I send it over to Unreal.

Obviously, if you already have a solution filling the “this is where the animations collect and get tweaked, and from here it goes into Unreal” role and are looking only for specifically the facial mocap input you feed into that existing solution, that’s probably somewhat less useful to you.

(Though, again, I admit I’ve been pleasantly surprised by the quality of its native iOS ARKit-based captures. But that’s admittedly likely to be more a factor of iOS’s native facial motion capture being startlingly good than an aspect of iClone itself.)

I can recommend Reallusion programs in this regard.

Unreal Live Link is free for indie users.

https://www.reallusion.com/store/product.html?l=1&p=ic#2226

Yes, that is what I am looking for: a solution for recording and retrieving facial mocap data in the highest quality I can afford (meaning precise, jitter-free, robust data), so that there is little to no need for cleaning up the results before doing what I’d call post-processing. I am thinking of systems that one could use for movie production with face close-ups. Of course I can’t afford the high-end systems that Hollywood productions use, but I’m quite sure there are solutions between smartphone capture and the top of the line.

So far I didn’t want to go the iPhone route, for two reasons:

  • The results I’ve spotted on YouTube and elsewhere look like a rubber toy trying to mimic a human, which, indeed, is what they do, but this is far from AAA quality, so I was convinced it wasn’t a good starting point.
  • I found some reports of thermal issues when using the iPhone’s TrueDepth camera for longer mocap recording sessions. They say that after a few minutes the camera automatically throttles its FPS, which makes it impossible to take longer shots. A cooldown is needed afterwards, so a lot of time is lost waiting for that.

But I just took a closer look at Reallusion’s website and found an example video showcasing the latest improvements in iPhone capturing, which look quite decent. And luckily I got my hands on a free iPhone (delivery will take some time), so I think I’ll give iClone a try; in this case it is completely free, so why not.

But please feel free to point me at other systems that are available on the market, because, as I mentioned, I can’t really find anything useful.

On your two points…

On the first point: while I’ve not done that much with TrueDepth for facial mocap, I’ve done a fair amount with it for machine learning (trying to train a model to recognize emotion through expression), and based on spending (way, way, way) too much time in TrueDepth-generated data sets, I suspect the “plastic” issue is more an issue with how applications utilize TrueDepth data and less with the data itself.

The second, however, can be a legitimate problem, though it’s gotten a great deal better in more recent revisions of the iPhone. While gathering stuff for the ML training set (where a series of co-workers came through and did the various test expressions), we did have several iPhones and swapped them out periodically due to the thermal issue. (Now, we could have likely gone longer than we did before we strictly needed to do that, but we figured hey, no reason to push it.)

(Though now I’m a little curious myself how well the facial mocap works in practice rather than theory. Maybe tomorrow I’ll do a quick trial run of it! I’ve got both iClone and iPhone…)

Here’s a link to the video from Reallusion showcasing the improvements, if anyone is interested. On the left side is a typical capture, what I call the rubber toy; it simply looks a bit strange. The improvements on the right side look waaaay better and more natural. Even if this is also not AAA, it is absolutely worth a closer look. I hope this is not achieved only through hours of manual corrections …

Improvements in iPhone face capture by reallusion

FWIW, since I have iClone and iPhone both (and an innate curiosity), I just set up Motion LIVE in iClone and grabbed the LIVE Face app on iOS. Once that was done, I did a really quick test of the iPhone facial motion capture via WiFi (which is hands-down the least ideal solution in terms of data throughput).

The out-of-box settings were… not great, I’ll admit.

However, when I turned on the natural smoothing (just on the default settings) and turned expression strength down to around 85-90%, the facial reproduction seemed spot-on for at least a cursory check; no jitter, no weird exaggerations, even over WiFi. And that was literally with checking one box (“Enable Smoothing”) and adjusting one slider (“Global Expression Strength”).

And whoa do they have a lot of controls for very precisely tweaking the way the capture maps to character. If, for instance, your own eyebrow expressiveness is not a good match to the character face, you can adjust the sensitivity of those individual points separate from everything else, among a plethora of other options. I only tinkered with it for about five minutes (because it is getting on towards stupid-o’clock here and I’ve got an early meeting and ought to crash), but even by the end of those five minutes I’d gone from “yikes” to “dang, that animation looks good.”
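(If you’re curious what those knobs conceptually amount to, here’s a toy sketch – Python, my guess at the general idea only, definitely not Reallusion’s actual code, and all function and parameter names are made up: the smoothing is essentially a per-channel low-pass filter on the raw weights, and the strength sliders are just multipliers applied before clamping.)

```python
# Conceptual sketch only (not Reallusion's implementation): roughly what
# smoothing, a global expression strength and per-channel strength
# adjustments do to the raw capture weights of a single frame.

def process_frame(raw, prev_smoothed, alpha=0.5, global_strength=0.85, per_channel=None):
    """raw / prev_smoothed: dicts of blendshape name -> 0..1 weight."""
    per_channel = per_channel or {}
    out = {}
    for name, value in raw.items():
        # Exponential smoothing damps frame-to-frame jitter.
        smoothed = alpha * value + (1 - alpha) * prev_smoothed.get(name, value)
        # Scale by the global strength, then a per-channel factor
        # (e.g. tame overly expressive brows), then clamp to 0..1.
        scaled = smoothed * global_strength * per_channel.get(name, 1.0)
        out[name] = min(max(scaled, 0.0), 1.0)
    return out

prev = {"browInnerUp": 0.70, "jawOpen": 0.30}
raw  = {"browInnerUp": 0.95, "jawOpen": 0.35}   # a slightly jittery new frame
print(process_frame(raw, prev, per_channel={"browInnerUp": 0.6}))
```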

So overall, given even that brief experience, I suspect that getting an ideal capture is less a matter of “hours of manual corrections” and more “I need to find the right settings to map this capture performance to this character face as precisely as I desire, save that preset so I can use it again later, and then we’re good to go.”

If I can find my Lightning-to-Ethernet adapter tomorrow (I don’t want to go conduct archaeology in the Drawer of Rarely-Used Technology tonight, and also as previously mentioned: sleep time now), I’ll try to record an actual sample under the more ideal recommended conditions, since it evidently does slightly throttle the sample rate over WiFi to ensure the performance can remain real-time if needed.

Brief addendum separate from any iClone stuff: here’s one more option that I literally just encountered this morning via an “augmented reality technology” newsletter. It’s called FaceGood (linking a YouTube of the thing) – it’s a dedicated helmet-and-camera rig, and it seems like it might be a pretty solid solution.

However, it’s a Chinese company, and their website does not seem to have much of an English support presence, so you would probably be flying without much of a safety net.

@SimBim - Here’s a really hasty little test of the facial motion capture in iClone, with only a couple of settings changed.

This was not under ideal circumstances – the only place I could find to mount the phone (rather than holding it) was a bit farther away than would be recommended, and that positioning caused there to be light glare on my glasses from a window.

I’m still pretty impressed. With proper positioning and some time/attention to tune the capture parameters to map my face more closely to Alex’s, I suspect I could get something truly seamless. (Alex being my game’s protagonist there, doing freelance work as a test dummy since she happens to be rigged in iClone to start with.)

(I didn’t bother to drive this in Unreal directly, because it felt like that might be more effort than was worth setting up for a quick test; I just rendered Alex right out of iClone with that recorded expression track.)

I marked this as the answer, because that was the hint I was looking for. This video led me to a better understanding of what to search for. The point seems to be to look for the keywords “4D mocap” and/or “HMC”. I hate doing days of research not knowing the terms to search for :grin: This way I also came across DI4D, and I am sure that I will now be able to dig up others as well. Thanks :slightly_smiling_face:

Well, to add to the conversation: I’ve been looking at lip-sync solutions for a while, beyond first impressions, with regard to what actually requires consideration, and more specifically at procedural-type solutions.

My considerations are something easy to use, low cost, and usable with all rigging solutions without bias, as a complete, easy-to-use solution. Most solutions today still require editing and cleaning of the motion capture data, which is typical of mocap data in general, along with the need for a performance artist from whom to derive the capture data if your digital character is required to “act”.

My conclusion is that hardware solutions are still expensive and difficult to set up for the purposes of a video game, which does not have the same requirements for resolution or fidelity; there’s a line between the needs of a video game and crossing the valley into the needs of an animated video production.

This got me thinking that the more elegant solution would be a procedurally driven result that would be accepted in your typical video game. Attempts have been made in AAA (cough cough, Mass Effect Andromeda) with less than desirable results, due more to a lack of effort.

To that end I’ve been looking at audio-driven solutions like the voice device available in MotionBuilder, which makes use of the control-rig-based approach that seems to be becoming popular in Unreal 4. This leads me to the idea that Epic will one day introduce a component that only requires the user to set up the base requirements, plug in an audio clip, and off it goes. This seems the more logical approach, where a closed edit environment like Unreal 4 could and should be responsible for the heavy lifting, in an engine where things should just work.

As proof of concept I did the following using Genesis 3, with the voice device and a relationship constraint driving the keyframes, 100% procedurally, from an audio clip. Less than perfect, but at the time I was more interested in seeing whether the approach would work than in shooting for perfection.
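For what it’s worth, here’s the principle stripped down to a toy (Python, and explicitly not the MotionBuilder voice device – the file name is just an example and it assumes a mono 16-bit PCM WAV): take a loudness envelope from the audio clip and key a single jaw-open weight from it. Real phoneme-to-viseme mapping is obviously far more involved, but it shows why a “plug in an audio clip and off it goes” component is plausible.

```python
# Toy illustration of "audio clip in, facial keys out"; assumes a mono
# 16-bit PCM WAV file. File name and function are made up for the example.

import wave
import numpy as np

def jaw_open_keys(path, fps=30):
    """Return (time, weight) keyframes for a jaw-open morph, normalized to 0..1."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    window = rate // fps                        # audio samples per animation frame
    keys = []
    for i in range(0, len(samples) - window, window):
        chunk = samples[i:i + window].astype(np.float64)
        rms = np.sqrt(np.mean(chunk ** 2))      # loudness of this animation frame
        keys.append((i / rate, rms))
    peak = max((w for _, w in keys), default=1.0) or 1.0
    return [(t, w / peak) for t, w in keys]

for t, w in jaw_open_keys("line_of_dialogue.wav")[:5]:
    print(f"{t:6.3f}s  jawOpen = {w:.2f}")
```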

You can make Live Link Face AAA by using the product below. Of course it takes a little while to ‘train’ it so it matches your face.