Facial Portrait Synthesis: can I synthesize a new perspective image from 3-4 input images?

Looking for anyone with experience or technical expertise in photogrammetry, NeRF, or other techniques for novel view synthesis or image stitching.

Project description: I am working on a project to capture facial portraits at close range. We want to build a fixture with 3-4 cameras around its edge, pointing inward toward the face. It would capture simultaneous images from each camera and then synthesize them into a new picture that appears to be taken straight-on from approximately 6 feet away, which is the optimal portrait condition.
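To make the goal concrete, here is a rough Python/OpenCV sketch of the final synthesis step I have in mind, assuming some upstream reconstruction already produced a colored 3D point cloud of the face. The file names, intrinsics, and resolution below are placeholders, not from any real setup:

```python
import numpy as np
import cv2

# Hypothetical inputs: a reconstructed, colored point cloud of the face
# (e.g. from an MVS pipeline), expressed in a world frame centered on the head.
points_3d = np.load("face_points.npy")   # (N, 3) float32, meters (placeholder)
colors = np.load("face_colors.npy")      # (N, 3) uint8 (placeholder)

# Virtual pinhole camera: straight-on, ~6 ft (1.83 m) in front of the face.
# Intrinsics here are assumed values, not from any real camera.
w, h = 1920, 1080
f = 2400.0                                # focal length in pixels (assumed)
K = np.array([[f, 0, w / 2],
              [0, f, h / 2],
              [0, 0, 1]], dtype=np.float64)
rvec = np.zeros(3)                        # camera axis aligned with world +Z
tvec = np.array([0.0, 0.0, 1.83])         # face sits 1.83 m down the optical axis

# Project every 3D point into the virtual view.
pix, _ = cv2.projectPoints(points_3d.astype(np.float64), rvec, tvec, K, None)
pix = pix.reshape(-1, 2)

# Naive splat: paint each projected point. A real renderer would also
# z-buffer and fill holes; this is where reconstruction quality matters most.
image = np.zeros((h, w, 3), dtype=np.uint8)
u = np.round(pix[:, 0]).astype(int)
v = np.round(pix[:, 1]).astype(int)
ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
image[v[ok], u[ok]] = colors[ok]
cv2.imwrite("virtual_portrait.png", image)
```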

Concerns: fine detail quality/texture accuracy, geometric accuracy, and the number of cameras required.

Question: Does photogrammetry/MVS seem like the most promising approach to this problem, or are there other technologies or examples better suited to this application? This is one example I found for perspective synthesis, but it only uses one input image, forcing the algorithm to “make up” a lot of data. There is also a product out there doing something similar, but I haven’t been able to fully understand how they do it (they mention photogrammetry but also mention stereo vision cameras). A sketch of the photogrammetry pipeline I'm considering follows below.
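For reference, the classic photogrammetry/MVS route would look roughly like the standard COLMAP pipeline, sketched here as a thin Python wrapper around the real COLMAP CLI commands. The paths are placeholders, and with only 3-4 views the dense result will likely be sparse or noisy unless the baselines are chosen carefully:

```python
import subprocess

# Minimal sketch of the standard COLMAP SfM + MVS pipeline for the rig images.
def run(*args):
    subprocess.run(["colmap", *args], check=True)

run("feature_extractor", "--database_path", "db.db", "--image_path", "images")
run("exhaustive_matcher", "--database_path", "db.db")
run("mapper", "--database_path", "db.db", "--image_path", "images",
    "--output_path", "sparse")
run("image_undistorter", "--image_path", "images",
    "--input_path", "sparse/0", "--output_path", "dense")
run("patch_match_stereo", "--workspace_path", "dense")   # requires CUDA
run("stereo_fusion", "--workspace_path", "dense",
    "--output_path", "dense/fused.ply")
```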

Thank you in advance for any insights!

This should be possible, but I couldn't say what quality you can expect; that would depend on a lot of factors, like the quality of the cameras, whether you have good priors for the camera poses, etc.
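If it helps, one concrete way to get those camera priors with a fixed rig is to calibrate each camera once offline against a printed checkerboard. A rough OpenCV sketch; the board dimensions, square size, and image folder are all assumptions:

```python
import glob
import cv2
import numpy as np

# One-time offline calibration of a single rig camera; the recovered
# intrinsics become strong priors for the later reconstruction.
cols, rows = 9, 6                 # inner corner counts of the board (assumed)
square = 0.025                    # square size in meters (assumed)

objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib_cam0/*.png"):   # placeholder image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

assert img_points, "no usable calibration images found"
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```

Repeating this per camera, then measuring (or bundle-adjusting) the relative poses of the fixed rig, gives you the "good priors" mentioned above.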