
Convincing text to voice in games... Possible?

So, this is way out of my field of learning, so bear with me here, but… hypothetically speaking, for use in a role-playing game for example, is it possible to have some sort of plug-in or program that would convert a player’s written dialogue into speech in convincing and interesting ways (i.e. not like Apple’s Siri!)?

Something like that would be cool, especially if you could customise their voice and even alter speech patterns with emotion (switching to angry or pained).

I honestly have no idea about audio engineering, but if someone else does, please do share.

Not yet, there’s nothing that works well enough out there.

True? That’s a shame. But does anyone who works in sound have an idea of how difficult something like this would be to produce?

If it was easy they’d probably have done it by now. At the moment the fully generated systems sound terrible, and the ones that use voice samples sound wrong because they just put words together.

There are plenty of text-to-speech programs. The problem would be getting their source (paying for a license) and meshing them with Unreal. Then there’s the fact that you will probably only have a handful of voices to work with, and probably only in English. And finally, you would have a bunch of terrible-sounding robots in your game.

I think it would be an extremely interesting game, but you would literally have to build the theme of your game around how bad the voices sound. You could make a horror game where all the character models sit right in the uncanny valley, with half-human/half-robot text-to-speech.

Anyway, I think it is possible and I hope we are headed there someday, but right now it is going to be expensive, extremely difficult, and the product is going to be pretty bad for most games.

I’m interested in text-to-speech as well, for a verbal dialogue system. Perhaps the current tech would be suitable in a sci-fi setting with robots, machines, aliens, etc., where a synthesized voice is expected or acceptable.

In UE3 there was a TTS engine which (if you google it) worked pretty well; however, it is not implemented in UE4. You can implement the Microsoft TTS using C++, but I’ll stop explaining here because when it comes to scripting I don’t know what I’m talking about.

Anyway, there are already a couple of solutions which will be available in the near future:
FacePlus for Unity; I think it’s in development for UE4.
Faceshift uses TCP streaming to send data to Maya/MotionBuilder, and since every subscriber has the source code for UE4, I guess it’s only a matter of time until someone starts developing a dedicated plugin.
Faceware is developing a dedicated real-time streaming performance capture for UE4…

You’re asking about converting text to speech, but in a role-playing game it’ll be better to have audio converted to phonemes by analyzing the audio itself, then convert it to text to be displayed on screen…
Consider that there is already a (free) audio visualization plugin available for UE4, so I guess you could convert the waveform to phonemes using Blueprints or C++.
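Real waveform-to-phoneme recognition is hard. A much simpler technique that is often used for basic "lip flap" is to drive a single mouth-open morph target from the loudness of the audio. The sketch below does that, and to be clear it is amplitude-based, not phoneme recognition. It assumes mono float samples in [-1, 1]; the function name and the `gain` value are hypothetical.

```python
import math


def mouth_open_curve(samples, sample_rate, fps=30, gain=4.0):
    """Return one mouth-open weight in [0, 1] per animation frame.

    Hypothetical helper: splits mono samples into frame-sized windows,
    measures each window's RMS loudness, and scales it into a weight.
    """
    window = max(1, sample_rate // fps)  # audio samples per animation frame
    weights = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        weights.append(min(1.0, rms * gain))  # clamp so loud peaks don't overshoot
    return weights
```

For example, 10 silent samples followed by 10 samples at 0.5 (at 300 Hz and 30 fps) give two frames: closed, then fully open. It only opens and closes the jaw, so it won't form an "M" or "F" shape the way a phoneme-based approach would.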

This is possible but really time-consuming, as you have to set the individual visemes of your character’s morph targets based on each letter or group of letters in a text.
So you create all possible letter combinations of human speech. For example, if you write the word “Phonetics” in your dialogue, your program (whether Blueprint or C++ code) that receives this text input will break the word down into the combinations P H OW D Eh D Ih Sh Z, based on the following simple table, which you have to create in Excel and pass to your C++ mapping algorithm:

[Image: table mapping letters and letter groups to phonemes]
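The mapping step can be sketched as a greedy longest-match lookup: at each position, try the longest letter group in the table first (so "ph" wins over "p"), then fall back to shorter ones. The table below is a small hypothetical stand-in, since the contents of the posted table image aren’t reproduced here, so its output for "Phonetics" differs from the breakdown above. `VISEME_TABLE` and `text_to_visemes` are illustrative names only.

```python
# Hypothetical letter-group -> viseme table; a real one would be loaded
# from the spreadsheet described above.
VISEME_TABLE = {
    # two-letter groups (tried before single letters)
    "ph": "F", "th": "TH", "sh": "SH", "ch": "SH", "ee": "IY", "oo": "UW",
    # single letters ("h" is deliberately absent, so it gets skipped)
    "a": "AA", "e": "EH", "i": "IH", "o": "OW", "u": "UH", "y": "IY",
    "b": "MBP", "m": "MBP", "p": "MBP", "f": "F", "v": "F",
    "t": "DT", "d": "DT", "n": "DT", "l": "L", "r": "R", "w": "WQ",
    "s": "SZ", "z": "SZ", "c": "K", "k": "K", "g": "K", "q": "K", "x": "K",
    "j": "SH",
}


def text_to_visemes(text, table=VISEME_TABLE):
    """Break a sentence into an array of viseme names, word by word."""
    max_len = max(len(key) for key in table)
    visemes = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Longest match first, so multi-letter groups beat single letters.
            for size in range(min(max_len, len(word) - i), 0, -1):
                group = word[i:i + size]
                if group in table:
                    visemes.append(table[group])
                    i += size
                    break
            else:
                i += 1  # character not in the table (punctuation, "h", ...): skip it
        visemes.append("rest")  # close the mouth between words
    return visemes
```

With this table, `text_to_visemes("Phonetics")` returns `["F", "OW", "DT", "EH", "DT", "IH", "K", "SZ", "rest"]`. Spelling-based rules like these will always mispronounce some English words; a pronunciation dictionary lookup is the usual improvement.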

Once you’ve created the function to break your sentence down into common groups of phonemes, save the groups in an array and use it as an output parameter. Then set your character’s viseme morph targets accordingly, i.e. for every string in the array, you set the respective morph target. This is not as easy as it sounds, but I would be glad if you gave it a try.
http://ict.usc.edu/pubs/A%20Practical%20and%20Configurable%20Lip%20Sync%20Method%20for%20Games.pdf
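A rough sketch of the "for every string of the array, set the morph target" step: turn the viseme array into timed keyframes of (time, morph target, weight). It assumes, hypothetically, a fixed duration per viseme and one morph target per viseme named `Viseme_<name>`. In UE4 each keyframe would end up as something like a `SetMorphTarget` call on the skeletal mesh component, but the naming and timing here are illustrative only.

```python
def visemes_to_keyframes(visemes, seconds_per_viseme=0.1, prefix="Viseme_"):
    """Convert a viseme array into (time, morph_target_name, weight) keys.

    Hypothetical scheme: each viseme lasts seconds_per_viseme, the previous
    shape drops to 0 when the next one switches to 1, and "rest" means no
    viseme morph target is active.
    """
    keyframes = []
    previous = "rest"
    for index, viseme in enumerate(list(visemes) + ["rest"]):  # trailing rest closes the mouth
        if viseme == previous:
            continue  # same shape held; no new key needed
        time = round(index * seconds_per_viseme, 3)
        if previous != "rest":
            keyframes.append((time, prefix + previous, 0.0))  # switch off old shape
        if viseme != "rest":
            keyframes.append((time, prefix + viseme, 1.0))  # switch on new shape
        previous = viseme
    return keyframes
```

For example, `["F", "OW", "rest"]` gives `Viseme_F` on at 0.0 s, off at 0.1 s when `Viseme_OW` switches on, and `Viseme_OW` off at 0.2 s. These are hard cuts; for believable speech you would blend between shapes and let neighbouring visemes influence each other (coarticulation), which is the kind of problem the linked paper deals with.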

Thanks for your replies, everyone. I’ll maybe come back to this line of thought a little later down the road.