Hello everyone,
I’ve been exploring Unreal Engine’s MetaHuman Speech2Face and Runtime MetaHuman LipSync systems and I’m really impressed by Epic’s work on speech-driven facial animation and lip-sync technology.
While experimenting, I noticed that the current lip-sync models perform well on English audio but are noticeably less accurate on Chinese speech input. This seems to be because the training data is primarily English, which results in less precise mouth movements for Chinese phonemes.
I’d like to ask a few questions regarding the future of these systems:
- Does Epic have any plan to release or open-source the following?
  - The training code or framework (e.g., ONNX / PyTorch implementation)?
  - The training dataset (or a subset/sample)?
  - The original model weights (beyond the inference-optimized version)?
- If there are no plans to open-source the models, would Epic consider sharing:
  - Documentation about the model's input/output format,
  - or the API-level specification for speech-to-lip feature mapping?
Such resources would help developers fine-tune or extend the model for other languages — especially for projects involving Mandarin or multilingual facial animation — while staying compatible with the existing MetaHuman framework.
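To make the request concrete, below is a purely illustrative sketch of the kind of guesswork developers currently have to do and that official I/O documentation would replace. The file name `lipsync_model.onnx` is hypothetical; I don't know the actual format, path, or runtime Epic uses internally, so this only assumes a generic ONNX file inspected with onnxruntime.

```python
# Purely illustrative: inspecting a lip-sync model's inputs/outputs.
# "lipsync_model.onnx" is a hypothetical file name, not the plugin's real asset.
import onnxruntime as ort

session = ort.InferenceSession("lipsync_model.onnx")

# Without official documentation, developers can only guess the semantics
# of these tensors (audio features? visemes? facial curve values?).
for tensor in session.get_inputs():
    print("input :", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)
```

Even a short table mapping each output channel to a viseme or facial animation curve would go a long way toward making language-specific fine-tuning practical without access to the original training code.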
I completely understand that there may be licensing or proprietary restrictions, but even partial documentation or standardized interfaces would be extremely valuable to the community.
Thank you for all the incredible work Epic has done in advancing MetaHuman technology.
Looking forward to your response!