Google TextToSpeech (WaveNet) Plugin

The Google TextToSpeech integration plugin has just been released on the Marketplace! You can find it here.

This plugin integrates with Google’s TextToSpeech API, letting you play synthetic voices inside your games using groundbreaking research in speech synthesis and Google’s powerful neural networks to deliver high-fidelity audio.

You need to subscribe to Google’s TextToSpeech API to use this plugin. You can do that here:

Supports Unreal Engine 4.20-4.22. Can be tested on more versions upon request.

Example Project:
(Please note: there’s a reverb volume in the demo, so the echo you hear is optional and goes away if you remove the volume.)


  • Create TextToSpeech files. You specify the text and the properties of the voice in the editor, and then you can play this synthetic voice just like a regular sound wave. Perfect for prototyping characters and interactions!
  • Make Blueprint calls to the API, playing voices dynamically. NOTE: This type of playback requires the API key to be present in the build, and the client has to be online. I recommend against using this feature for public builds, for safety reasons. If your game is online and you control the servers, then you could theoretically use the plugin on the server only and send the voices to the clients. But this use case is not supported out of the box in the plugin.

@Stefan Lundmark This is awesome and I’ve been waiting for a plugin like this. I really, really want to invest in this. But…

I can see the cost of using the Google voice services getting expensive in games. I believe these costs could be minimized by caching TextToSpeech requests and responses to local or remote storage for reuse. Is this what is meant by Feature #1?

Cloud Text-to-Speech pricing: High-Fidelity Speech Synthesis
Cloud Text-to-Speech is priced per 1 million characters of text processed after a 1 million character free tier. For details, please see our pricing guide.

Standard (non-WaveNet) voices: 0 to 4 million characters free, then $4.00 USD / 1 million characters
WaveNet voices: 0 to 1 million characters free, then $16.00 USD / 1 million characters
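
To make the pricing concrete, here is a rough estimate in standalone C++. It is only a sketch: it assumes the character ranges quoted above are free tiers and that everything beyond them is billed pro rata per million characters. The function name is made up for illustration and is not part of the plugin or of any Google SDK.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// Assumption: characters inside the free tier cost nothing and the rest are
// billed pro rata at the given price per 1 million characters.
double EstimateCostUsd(std::uint64_t Characters,
                       std::uint64_t FreeCharacters,
                       double UsdPerMillionCharacters)
{
    if (Characters <= FreeCharacters)
    {
        return 0.0;
    }
    const std::uint64_t Billable = Characters - FreeCharacters;
    return static_cast<double>(Billable) / 1000000.0 * UsdPerMillionCharacters;
}
```

Under those assumptions, `EstimateCostUsd(3000000, 1000000, 16.0)` comes to 32.0 USD for 3 million WaveNet characters, while the same line played from a cached copy adds nothing to the count.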

Hi TechLord,

Exactly! I mention it briefly in the documentation as an advantage of assets. When you save the asset for the first time, the PCM (audio) data is fetched from Google. That’s one request. No request will ever be made again for this asset as long as you don’t change its properties. It will remain like a SoundWave on disk, no difference!
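
The "one request per asset unless its properties change" idea can be sketched in standalone C++ as a cache keyed on the text plus the voice properties. This is not the plugin's implementation (the plugin stores the audio in the asset on disk); every type and function name below is hypothetical, and the synthesize callback stands in for whatever actually calls Google.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical settings struct; the real asset properties may differ.
struct FVoiceSettings
{
    std::string Text;
    std::string VoiceName;
    double SpeakingRate = 1.0;
    double Pitch = 0.0;

    // Any change to the text or a voice property produces a different key.
    bool operator<(const FVoiceSettings& Other) const
    {
        return std::tie(Text, VoiceName, SpeakingRate, Pitch)
             < std::tie(Other.Text, Other.VoiceName, Other.SpeakingRate, Other.Pitch);
    }
};

using FPcmData = std::vector<std::uint8_t>;
using FSynthesizeFn = std::function<FPcmData(const FVoiceSettings&)>;

class FSpeechCache
{
public:
    explicit FSpeechCache(FSynthesizeFn InSynthesize)
        : Synthesize(std::move(InSynthesize))
    {
    }

    // Returns the stored audio if these exact settings were seen before;
    // otherwise calls the (billable) synthesize callback once and keeps the result.
    const FPcmData& GetOrSynthesize(const FVoiceSettings& Settings)
    {
        auto It = Cache.find(Settings);
        if (It == Cache.end())
        {
            ++RequestCount;
            It = Cache.emplace(Settings, Synthesize(Settings)).first;
        }
        return It->second;
    }

    int GetRequestCount() const { return RequestCount; }

private:
    FSynthesizeFn Synthesize;
    std::map<FVoiceSettings, FPcmData> Cache;
    int RequestCount = 0;
};
```

Asking twice for the same text and voice triggers the callback once; changing the speaking rate, for example, counts as a new entry and a new request.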

Hello @Stefan Lundmark,

I appreciate the prompt response. That’s fantastic news. I noticed a question was asked about Replication. Replication should not be difficult for a user to implement. It could use the same network design pattern, ‘Replicate to All’, that is common for replicating any Event. However, if you included the blueprints to achieve this with the package, you could list Replication as a feature. :)

Hi again @TechLord

Where was this question asked? I can’t see it anywhere.
Thanks for the suggestion and for taking the time, I really appreciate that! :slight_smile:

Replication is different from RPCs (which your image is showing), so that’s why I didn’t want to claim to support that.

I see it this way:

  • I consider TextToSpeech assets as the main feature of the plugin. They’re cost-effective, fast and easy to use. You use them just like the regular SoundWaves.
  • I don’t recommend the use case which your image shows. It would potentially be very unsafe, as the ApiKey would have to be on the client. This is only recommended for internal builds.
  • Generating the voice on the server and sending it to the client is an option, but UE4 is known to be a pain when sending large amounts of data across a network connection.

Sorry, that question was asked on the marketplace.

My intention is to use the plugin in multiplayer games, as I’m developing a multiplayer RPG. What is the recommended use case? How can I put the ApiKey on the client safely?

No worries. It appears questions can be deleted, because some that I’ve answered are now gone. :slight_smile:

I’m afraid you can’t put it safely on a client, ever. But do you really need dynamic voices? Can’t you keep them as TextToSpeech assets?

Yes. It is highly desired. In our RPG, the Game Masters can create new campaigns and narratives for NPCs in real time or near real time as they roleplay multiple characters. We were just going to use text for this, but AI voice will push this concept to a futuristic level. I want this feature badly and am willing to go the unsafe route if necessary. Are there any options to provide this functionality safely?

Cool game and I agree, that would be awesome!

As for options, you could:

  • Keep the ApiKey on the server, since your game is an MMO where you presumably control the servers.
  • Use the plugin on the server, and then send the PCM data to the clients with RPCs.

I considered adding this, but it gets a little complicated with UE4’s networking when dialogs are longer and the PCM data gets bigger, and I didn’t want to increase complexity.
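
One common way around the size problem is to split the PCM buffer into small numbered chunks on the server and put them back together on the client. The sketch below is standalone C++ and only shows the splitting and reassembly; the RPC calls themselves are engine-specific and left out. The struct, the function names and the chunk size are all hypothetical, and the plugin does not ship anything like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical chunk type; in a real project this would be sent through an RPC.
struct FPcmChunk
{
    std::uint32_t Index = 0;        // position of this chunk in the stream
    std::uint32_t TotalChunks = 0;  // tells the receiver how many chunks to expect
    std::vector<std::uint8_t> Bytes;
};

// Splits a PCM buffer into chunks of at most MaxChunkBytes bytes.
// Returns no chunks if MaxChunkBytes is 0 or the buffer is empty.
std::vector<FPcmChunk> SplitPcm(const std::vector<std::uint8_t>& Pcm, std::size_t MaxChunkBytes)
{
    std::vector<FPcmChunk> Chunks;
    if (MaxChunkBytes == 0)
    {
        return Chunks;
    }
    const std::size_t Total = (Pcm.size() + MaxChunkBytes - 1) / MaxChunkBytes;
    for (std::size_t i = 0; i < Total; ++i)
    {
        const std::size_t Begin = i * MaxChunkBytes;
        const std::size_t End = std::min(Begin + MaxChunkBytes, Pcm.size());
        FPcmChunk Chunk;
        Chunk.Index = static_cast<std::uint32_t>(i);
        Chunk.TotalChunks = static_cast<std::uint32_t>(Total);
        Chunk.Bytes.assign(Pcm.begin() + static_cast<std::ptrdiff_t>(Begin),
                           Pcm.begin() + static_cast<std::ptrdiff_t>(End));
        Chunks.push_back(std::move(Chunk));
    }
    return Chunks;
}

// Rebuilds the buffer from chunks that may have arrived in any order.
// Returns an empty buffer if a chunk is missing or duplicated.
std::vector<std::uint8_t> JoinPcm(std::vector<FPcmChunk> Chunks)
{
    std::vector<std::uint8_t> Pcm;
    if (Chunks.empty() || Chunks.size() != Chunks.front().TotalChunks)
    {
        return Pcm;
    }
    std::sort(Chunks.begin(), Chunks.end(),
              [](const FPcmChunk& A, const FPcmChunk& B) { return A.Index < B.Index; });
    for (std::size_t i = 0; i < Chunks.size(); ++i)
    {
        if (Chunks[i].Index != i)
        {
            return std::vector<std::uint8_t>();
        }
        Pcm.insert(Pcm.end(), Chunks[i].Bytes.begin(), Chunks[i].Bytes.end());
    }
    return Pcm;
}
```

To get a feel for the sizes involved: 16-bit mono audio at 24 kHz (used here only as an example rate) is 48,000 bytes per second, so a ten-second line is roughly 480 KB before any compression.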

Please share the list of languages available in the plugin.

@Stefan Lundmark Would it be possible at all to make your plugin work with Oculus LipSync plugin…ipsync-unreal/ ?

@motorsep The plugin doesn’t support this out of the box, cool feature though! I haven’t seen it before. Is it connected to the Oculus headset in any way or can I use it without? I might add support for it in the plugin if it’s simple enough to try.

I believe it can be used without an Oculus HMD, since the plugin can take audio files and use them to drive the lips of NPCs. I just don’t know whether it can work with the launcher version of UE4 or if it only works with the Oculus fork of UE4 (which I’ve been using exclusively since around 4.19).

Okay, cool. Lipsync is an interesting topic and it’s a problem I would like to solve eventually.

If I get some free time next week I might be able to look into it. No promises though. I’ll post an update here if I do.

Much appreciated!

@motorsep Sorry man, haven’t had the time to look at this yet. I thought about it though: if all the LipSync stuff needs is the PCM data, then this can be done with the plugin if you know some C++. You just need to expose the buffer.
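
As a rough idea of what consuming such a buffer could look like, the standalone C++ sketch below turns raw PCM bytes into 16-bit samples. It rests on assumptions the thread does not confirm: that the exposed buffer is headerless 16-bit signed little-endian PCM (the layout of Google's LINEAR16 encoding), and that a lip sync library would want samples in that form. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes 16-bit signed little-endian PCM bytes into samples.
// Assumption: the buffer has no header. A trailing odd byte is ignored.
std::vector<std::int16_t> DecodePcm16LE(const std::vector<std::uint8_t>& Bytes)
{
    std::vector<std::int16_t> Samples;
    Samples.reserve(Bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < Bytes.size(); i += 2)
    {
        // Low byte first, then high byte: a value in the range 0..65535.
        const int Raw = Bytes[i] | (Bytes[i + 1] << 8);
        // Map the upper half of that range onto negative sample values.
        Samples.push_back(static_cast<std::int16_t>(Raw >= 0x8000 ? Raw - 0x10000 : Raw));
    }
    return Samples;
}
```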

Let me know if you have any example levels or anything you can send my way. As long as there’s an example of when LipSync is working, then I can figure out the rest and include it in the plugin.

I’ll see if I can make a small test project with some “head” that “talks” using the plugin and a WAV file.

That would be awesome!

I finally had a chance to open the OVRLipSync project Oculus provides and it’s all there, including C++, phonemes, BPs, sounds. It would make no sense for me to make yet another test project. The only issue with 4.25 is that it crashes when you open the project (do a convert in place) and tells you to compile the project in VS. After you compile it in VS, it opens and works in 4.25 just fine.

Note that UE 4.25.x is from the Oculus repo.

Thanks man, I appreciate your help. I’ll check this out on Friday! :cool: