The term of art for this is facial or performance capture, depending on the scope of the captured movement. Most 3D software packages have tooling available that supports this to some degree.
The quality of the capture is largely determined by the types of data being processed. Stereo video with painted tracking markers on the face is still – at least as far as I’m aware – the gold standard. The LiveLink app uses depth information in addition to the video to increase quality over what would be possible with video alone.
In practice, that just means some techniques may require more cleanup than others.
There should be a number of assets on various stores that support Apple’s ARKit blendshapes, which UE has very solid support for. Failing that, ARKit’s blendshapes are based on the standardised Facial Action Coding System (FACS), so almost any asset that adheres to that system should work.
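If it helps to see what that data actually looks like, here’s a rough Swift sketch of pulling the raw ARKit blendshape coefficients out of a face-tracking session each frame. The `FaceCaptureDelegate` class and `applyWeight` helper are just placeholder names for illustration, not anything from the LiveLink app itself:

```swift
import ARKit

// Minimal session delegate that reads ARKit's blendshape coefficients on
// every frame update. Requires a device with a TrueDepth camera and a
// session running ARFaceTrackingConfiguration.
final class FaceCaptureDelegate: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            // blendShapes maps FACS-style keys (e.g. .jawOpen, .eyeBlinkLeft)
            // to a weight between 0.0 and 1.0.
            for (location, weight) in faceAnchor.blendShapes {
                applyWeight(location.rawValue, weight.floatValue)
            }
        }
    }

    // Placeholder hook: this is where you'd drive the matching morph target
    // on your character rig, or stream the values out over the network.
    private func applyWeight(_ name: String, _ value: Float) {
        print("\(name): \(value)")
    }
}
```

Those ~52 named coefficients are exactly what an ARKit-compatible asset’s morph targets are authored against, which is why adhering to that naming (or to FACS more broadly) makes retargeting straightforward.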