Just look into ARKit if you're going to do it with an iPhone.
The data you can transmit is driven by 52 blendshapes.
You create an iPhone app that uses ARKit and transmits the data to something else.
That something interprets the data however you need it to.
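A rough sketch of the capture-and-transmit side, assuming a UDP listener on the other end (the host, port, and JSON payload are all placeholders you'd swap for your own):

```swift
import ARKit
import Network

// Sketch: read the blendshape coefficients each frame and stream them
// as JSON over UDP. The receiver address and payload format are
// assumptions -- whatever listens on the other end defines what you send.
final class FaceStreamer: NSObject, ARSessionDelegate {
    private let session = ARSession()
    // Hypothetical receiver; replace with your own machine's address.
    private let connection = NWConnection(host: "192.168.1.50", port: 9000, using: .udp)

    func start() {
        // Face tracking needs a TrueDepth-capable device.
        guard ARFaceTrackingConfiguration.isSupported else { return }
        connection.start(queue: .global())
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        guard let face = anchors.compactMap({ $0 as? ARFaceAnchor }).first else { return }
        // blendShapes is [BlendShapeLocation: NSNumber]; every value is 0...1.
        var coefficients: [String: Float] = [:]
        for (location, value) in face.blendShapes {
            coefficients[location.rawValue] = value.floatValue
        }
        if let payload = try? JSONEncoder().encode(coefficients) {
            connection.send(content: payload, completion: .contentProcessed { _ in })
        }
    }
}
```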
If you don't want to create the custom ARKit app, then you have to make a 2D character that animates from blendshape data.
South Park animation isn't exactly smooth, and it's also rather custom when it comes to different expressions.
I'm not sure where you want to go overall, but there will likely be some impossible scenarios, like distinguishing an angry face from squinting.
With only the 52 shapes to work from, you're just going to have a hard time.
And because the values come in mixed (you rarely get a full 1.0 on a single shape), you'll often get just plain "bad" results.
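To make the ambiguity concrete, here's roughly what an "angry" check would have to look like. The coefficient names are real `ARFaceAnchor.BlendShapeLocation` keys, but the 0.5 threshold is made up, and squinting into bright light fires the same shapes, so it will misfire:

```swift
import ARKit

// Helper: read one coefficient, treating a missing key as 0.
func coefficient(_ shapes: [ARFaceAnchor.BlendShapeLocation: NSNumber],
                 _ key: ARFaceAnchor.BlendShapeLocation) -> Float {
    shapes[key]?.floatValue ?? 0
}

func looksAngry(_ shapes: [ARFaceAnchor.BlendShapeLocation: NSNumber]) -> Bool {
    let brow = (coefficient(shapes, .browDownLeft) + coefficient(shapes, .browDownRight)) / 2
    let squint = (coefficient(shapes, .eyeSquintLeft) + coefficient(shapes, .eyeSquintRight)) / 2
    // Squinting at the sun drives these same shapes -- there's no
    // dedicated "angry" coefficient to tell the two apart.
    return brow > 0.5 && squint > 0.5
}
```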
You can try to use blendshapes to shift parts of the face up and down to hide/clip and present the different expressions. That too is very hit or miss without some sort of clamp range on the incoming data, something like the sketch below…
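A minimal sketch of that clamp range: treat anything under a floor as noise, anything over a ceiling as fully on, and stretch what's left back to 0...1. The 0.15/0.85 defaults are made-up starting points, not tuned values:

```swift
// Clamp a raw 0...1 coefficient into a usable range, then renormalize.
func remap(_ value: Float, floor: Float = 0.15, ceiling: Float = 0.85) -> Float {
    let clamped = min(max(value, floor), ceiling)
    return (clamped - floor) / (ceiling - floor)
}

let jaw = remap(0.4)  // ~0.36 instead of a jittery raw 0.4
```

In practice you'd probably want per-shape floors and ceilings, since some shapes swing much wider than others.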