Ideal toolset and pipeline for a solo short film and video game dev to digitize original physical BJD characters into performance-capture-ready assets?

My wife hand-sculpts original ball-jointed doll characters of her own conceptualization and design, and then casts their parts to produce resin doll clones of the original sculpt. I want to digitize these characters and bring them to life using Unreal Engine.
Nearly all the learning resources out there cover workflows that start from scratch or from character creation software, where the character is digital from the beginning, not from a high-resolution, high-polycount 3D scan of the parts of a real-world poseable figurine.

I’ve gotten as far as reducing the polycount, fixing scan errors, reassembling the doll, and even successfully rigging its body for skeletal animation.
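
For context, here’s roughly the kind of Blender Python I’ve been running for the cleanup pass (the scan object is assumed to be active; the decimate ratio and merge threshold are just the numbers I happened to use):

```python
import bpy

# Assumes the imported scan is the active object
scan = bpy.context.active_object

# Collapse-decimate the scan down to ~5% of its triangles
dec = scan.modifiers.new(name="ReducePolys", type='DECIMATE')
dec.ratio = 0.05
bpy.ops.object.modifier_apply(modifier=dec.name)

# Merge duplicate vertices left over from scan stitching
bpy.ops.object.mode_set(mode='EDIT')
bpy.ops.mesh.select_all(action='SELECT')
bpy.ops.mesh.remove_doubles(threshold=0.0001)
bpy.ops.object.mode_set(mode='OBJECT')
```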

However, the character has no eyes, no clothes, and no facial animation rig. Yesterday I tried to figure out how to make a hollow space inside the mouth so the character can open it to talk once a facial animation rig is set up.
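
My current idea for the mouth is a Boolean difference: model a rough cavity shape, place it inside the head, and subtract it from the (watertight) scan. A minimal sketch, with made-up object names:

```python
import bpy

# Hypothetical names: a watertight head mesh from the scan, plus a
# rough cavity mesh (e.g. a scaled sphere) positioned inside the mouth
head = bpy.data.objects["Head_Scan"]
cavity = bpy.data.objects["Mouth_Cavity"]

# Carve the oral cavity out of the solid head
bool_mod = head.modifiers.new(name="CarveMouth", type='BOOLEAN')
bool_mod.operation = 'DIFFERENCE'
bool_mod.object = cavity

bpy.context.view_layer.objects.active = head
bpy.ops.object.modifier_apply(modifier=bool_mod.name)

cavity.hide_set(True)  # keep the cutter object around but hidden
```

No idea yet if that’s the right approach for geometry that has to deform with a jaw-open shape, so corrections welcome.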

I have little to no budget for software at the moment, so I’m using Blender and Unreal Engine.

I need guidance on getting this set up so the characters work well both for making short movies and for putting them in games.

I want to make the content creation process as painless as possible, so I hope to do facial performance capture of whoever is voice acting the character and use Unreal’s quick iteration and render times to our advantage. That way I don’t have to hand-animate the lip sync and facial expressions; I capture the actor’s performance instead and have it show up on the doll character immediately.

She has had good results with short, 7-12 second clips using MidJourney to generate AI video from still photos of her dolls. The results are amazing and look photorealistic! But with a rigged 3D character ready to go, we’d have direct control over the results and wouldn’t have to spend all day iterating on AI prompts and stitching the outputs together just to get CLOSE to what we want.
With a rigged character and performance capture, we could quickly create the exact performance we want, and maybe combine it with other AI-generated elements in the final composition (matching lighting when it matters, of course).

I’m a rookie at this stuff, but I understand the concepts. I just anticipate a lot of roadblocks and caveats where Unreal does most of what I need but not everything, so I’ll have to use a DCC app like Blender to set those things up, and then I might want to just render in Blender anyway if getting it to play nicely with Unreal takes too much figuring out. Maybe there are other free/cheap tools that would make a better pipeline than just Blender > Unreal. Anyone have any ideas?
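
In case it matters for answering: my export step is just Blender’s stock FBX exporter, roughly like this (the filepath is a placeholder; as I understand it, use_mesh_modifiers has to be off or shape keys get dropped from the FBX):

```python
import bpy

bpy.ops.export_scene.fbx(
    filepath="/path/to/doll_character.fbx",  # placeholder path
    use_selection=True,        # export just the selected mesh + armature
    use_mesh_modifiers=False,  # keep shape keys (morph targets) intact
    add_leaf_bones=False,      # skip the extra end bones Unreal doesn't want
    bake_anim=False,           # no animation baked into the asset export
)
```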

The issue I’m running into while learning this stuff is that all the live performance capture resources assume you’re using MetaHumans. My characters are not MetaHuman-generated (they’re created the “hard” way instead), so how can I get Unreal’s capture tools to work on them for the purposes I’ve stated? Should I use blend shapes or face bones?
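
From what I’ve read, the Live Link Face app streams the 52 ARKit blendshape curves, and Unreal imports Blender shape keys as morph targets with matching names, so my tentative plan is to author shape keys named after the ARKit curves and drive those. A sketch of setting up the (still empty) keys, assuming a hypothetical object name:

```python
import bpy

# A few of the 52 ARKit blendshape names that Live Link Face streams;
# the full list is in Apple's ARFaceAnchor documentation
ARKIT_SHAPES = [
    "eyeBlinkLeft", "eyeBlinkRight", "jawOpen",
    "mouthSmileLeft", "mouthSmileRight", "browInnerUp",
    # ...the remaining ARKit curve names...
]

head = bpy.data.objects["Head_Scan"]  # hypothetical name

# Make sure a Basis key exists, then add one empty key per curve;
# each key still has to be sculpted by hand in Edit Mode
if head.data.shape_keys is None:
    head.shape_key_add(name="Basis")
for name in ARKIT_SHAPES:
    head.shape_key_add(name=name, from_mix=False)
```

But if bones are the better route for a doll face, I’d rather know before I sculpt 52 shape keys.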

I want to save myself a lot of wasted hours going down rabbit holes that don’t lead to my stated goal. Can you help me?