I doubt you need ARKit or anything else to generate animations.
You can copy poses from a skeletal setup to morph targets.
The process would involve posing the bones manually into an animation, and then recording the result into each of the 52 morph targets you need to make the face do stuff.
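To make the idea concrete, here is a rough sketch of what "recording" a bone pose into a morph target boils down to: skin the mesh with the pose, subtract the neutral mesh, and keep the per-vertex deltas. Everything here is a toy assumption, not any engine's or DCC tool's API: the function names are made up, bones are reduced to plain translation offsets (real rigs use full transform matrices), and the mesh is two vertices. The target name "jawOpen" is borrowed from the ARKit blend shape naming.

```python
# Hypothetical, simplified sketch: bones are translation offsets only.
# A real rig uses full bone matrices and your engine/DCC tool's own API.

def skin(neutral, weights, bone_offsets):
    """Linear blend skinning with translation-only bones."""
    posed = []
    for vertex, vertex_weights in zip(neutral, weights):
        moved = list(vertex)
        for bone, w in vertex_weights.items():
            offset = bone_offsets.get(bone, (0.0, 0.0, 0.0))
            for axis in range(3):
                moved[axis] += w * offset[axis]
        posed.append(tuple(moved))
    return posed

def bake_morph_target(neutral, weights, bone_offsets):
    """Store the pose as per-vertex deltas from the neutral mesh."""
    posed = skin(neutral, weights, bone_offsets)
    return [tuple(p - n for p, n in zip(pv, nv))
            for pv, nv in zip(posed, neutral)]

def apply_morphs(neutral, targets, values):
    """Blend baked targets: vertex = neutral + sum(value * delta)."""
    result = [list(v) for v in neutral]
    for name, value in values.items():
        for i, delta in enumerate(targets[name]):
            for axis in range(3):
                result[i][axis] += value * delta[axis]
    return [tuple(v) for v in result]

# Toy mouth: an upper-lip vertex (not skinned to the jaw) and a lower-lip
# vertex fully weighted to a single jaw bone.
neutral = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
weights = [{"jaw": 0.0}, {"jaw": 1.0}]
jaw_open_pose = {"jaw": (0.0, -2.0, 0.0)}  # jaw pulled down 2 units

targets = {"jawOpen": bake_morph_target(neutral, weights, jaw_open_pose)}
half_open = apply_morphs(neutral, targets, {"jawOpen": 0.5})
print(half_open)
```

You would repeat the bake once per target pose, and after that the animation is just a curve of values per morph target, no bones needed at runtime.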
Naturally, making facial animations without a mocap setup is hell, but it's been done for ages before we invented better solutions…
As someone who hates Apple, I have to say that the iPhone 11 I specifically got for face mocap is great at it.
Probably better than the rest of the mocap equipment that costs 60 times as much.
Though it's not as good as face markers and multi-camera setups, the fact that you can fire it up and have it working in 10 seconds is where the value is at.
If you want to animate a mesh, you have to learn how to rig and generate morphs either way. In fact, if anyone here doesn't know how, it's probably MetaHuman's fault.
If MetaHumans did not exist, you'd have already learned how to make it happen…