Convincing text to voice in games... Possible?

anonymous_user_3765c580 · November 25, 2014, 9:46pm

So, this is way out of my field of learning, so bear with me here, but… Hypothetically speaking, for use in a role-playing game for example, is it possible to have some sort of plug-in or program that would convert a player’s written dialogue into speech in convincing and interesting ways (i.e. not like Apple’s Siri!)?

Something like that would be cool, especially if you could customise their voice and even alter speech patterns with emotion (switching to angry or pained).

I honestly have no idea about audio engineering, but if someone else does, please do share.

darthviper107 · November 25, 2014, 10:23pm

Not yet, there’s nothing that works well enough out there.

anonymous_user_3765c580 · November 25, 2014, 10:48pm

True? That’s a shame. But does anyone who works in sound have an idea of how difficult something like this would be to produce?

darthviper107 · November 25, 2014, 11:16pm

If it was easy they’d probably have done it by now. At the moment the fully generated systems sound terrible, and the ones that use voice samples sound wrong because they just put words together.

Zeustiak · November 27, 2014, 1:38am

There are plenty of text-to-speech programs. The problem would be getting their source (paying for a license) and meshing them with Unreal. On top of that, you will probably only have a handful of voices to work with, and probably only in English. Then finally you would have a bunch of terrible-sounding robots in your game.

I think it would be an extremely interesting game, but you would literally have to build the theme of your game around how bad the voices sound. You could make a horror game where all the character models fit right in the uncanny valley, with half-human/half-robot text to speech.

Anyway, I think it is possible and I hope we are headed there someday, but right now it is going to be expensive, extremely difficult, and the product is going to be pretty bad for most games.

TechLord · November 27, 2014, 3:53am

I’m interested in Text-to-Speech as well for a Verbal Dialogue System. Perhaps the current tech would be suitable in a Sci-fi setting with Robots, Machines, Aliens, etc in which the synthesized voice is expected/acceptable.

Enter_Reality · November 27, 2014, 7:51am

In UE3 there was a TTS engine which, if you google it, apparently worked pretty well. It is not implemented in UE4, but you can hook up the Microsoft TTS using C++. I’ll stop explaining here, because when it comes to scripting I don’t know what I’m talking about.

Anyway, there are already a couple of solutions that should be available in the near future:
FacePlus for Unity; I think it’s in development for UE4.
Faceshift uses TCP streaming to send data to Maya/MotionBuilder, and since every subscriber has the UE4 source code, I guess it’s only a matter of time until someone starts developing a dedicated plugin.
Faceware is developing dedicated real-time streaming performance capture for UE4…

You’re asking about converting text to speech, but in a role-playing game it would be better to have audio converted to phonemes by analyzing the audio itself, then convert it to text to be displayed on screen…
Consider that there is already a free audio visualization plugin available for UE4, so I guess you could convert the waveform to phonemes using Blueprints or C++.

sameek4 · November 28, 2014, 8:38pm

This will be possible, but it is really time-consuming, as you have to set the individual visemes of your character’s morph targets based on each letter or group of letters in the text.
So you create all the possible letter combinations of human speech. E.g., if you write the word “Phonetics” in your dialogue, then your program (whether Blueprint or C++ code) that receives this text input will break the word down into combinations -> P H OW D Eh D Ih Sh Z, based on the following simple table, which you have to create in Excel and pass to your C++ code for the mapping algorithm.

Once you create the function to break your sentence down into common groups of phonemes, save the groups in an array and use it as an output parameter. Then set your character’s viseme morph targets accordingly, i.e. for every string in the array, you set the respective morph target. This is not as easy as it sounds, but I would be glad if you give it a try.
http://ict.usc.edu/pubs/A%20Practical%20and%20Configurable%20Lip%20Sync%20Method%20for%20Games.pdf
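
As a rough illustration of the mapping step described above, here is a minimal C++ sketch. It assumes a tiny, made-up letter-group table and made-up morph target names (`Viseme_FV`, `Viseme_O`, and so on); none of these come from the post or the linked paper, and the function names `BreakIntoPhonemes` and `PhonemesToVisemes` are hypothetical. A real table would be much larger and would likely be loaded from the spreadsheet export rather than hard-coded.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// ASSUMPTION: illustrative letter-group -> phoneme table, not the poster's table.
const std::unordered_map<std::string, std::string>& GraphemeTable() {
    static const std::unordered_map<std::string, std::string> table = {
        {"ph", "F"},  {"sh", "Sh"}, {"th", "Th"}, {"ch", "Ch"},
        {"a", "Ah"},  {"e", "Eh"},  {"i", "Ih"},  {"o", "Ow"},
        {"u", "Uh"},  {"c", "K"}};
    return table;
}

// ASSUMPTION: illustrative phoneme -> morph target name table.
const std::unordered_map<std::string, std::string>& VisemeTable() {
    static const std::unordered_map<std::string, std::string> table = {
        {"F", "Viseme_FV"}, {"Ah", "Viseme_A"}, {"Eh", "Viseme_E"},
        {"Ih", "Viseme_I"}, {"Ow", "Viseme_O"}, {"Uh", "Viseme_U"}};
    return table;
}

// Breaks text into phoneme strings: tries a two-letter group first, then a
// single letter, and falls back to the uppercase letter if nothing matches.
// Non-letters are skipped, and groups never span across them.
std::vector<std::string> BreakIntoPhonemes(const std::string& text) {
    std::vector<std::string> phonemes;
    const auto& table = GraphemeTable();
    auto isLetter = [&](std::size_t k) {
        return std::isalpha(static_cast<unsigned char>(text[k])) != 0;
    };
    auto lowerAt = [&](std::size_t k) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(text[k])));
    };

    std::size_t i = 0;
    while (i < text.size()) {
        if (!isLetter(i)) { ++i; continue; }
        if (i + 1 < text.size() && isLetter(i + 1)) {
            const std::string pair{lowerAt(i), lowerAt(i + 1)};
            auto it = table.find(pair);
            if (it != table.end()) {
                phonemes.push_back(it->second);
                i += 2;
                continue;
            }
        }
        const std::string single(1, lowerAt(i));
        auto it = table.find(single);
        if (it != table.end()) {
            phonemes.push_back(it->second);
        } else {
            const char upper = static_cast<char>(
                std::toupper(static_cast<unsigned char>(text[i])));
            phonemes.push_back(std::string(1, upper));
        }
        ++i;
    }
    return phonemes;
}

// Maps each phoneme to a morph target name; unknown phonemes get a neutral pose.
std::vector<std::string> PhonemesToVisemes(const std::vector<std::string>& phonemes) {
    std::vector<std::string> visemes;
    const auto& table = VisemeTable();
    for (const auto& p : phonemes) {
        auto it = table.find(p);
        visemes.push_back(it != table.end() ? it->second : "Viseme_Neutral");
    }
    return visemes;
}
```

Calling `PhonemesToVisemes(BreakIntoPhonemes("Phonetics"))` yields one morph target name per phoneme. In UE4 each name could then presumably be applied to the character with something like `USkeletalMeshComponent::SetMorphTarget`, with timing and blending between visemes handled separately. Plain letter matching like this ignores English pronunciation rules, which is part of why the approach is harder than it sounds.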

anonymous_user_3765c580 · December 2, 2014, 4:11am

Thanks for your replies, everyone. I’ll maybe come back to this line of thought a little later down the road.