I created this plugin for a personal project and decided to share it with the community for free. : )
It integrates Azure Speech Cognitive Services into Unreal Engine through simple functions that perform these asynchronous tasks:
- Voice-To-Text (Convert speech into a string)
- Text-To-Voice (Convert a string into speech)
- Text-To-Wav (Convert a string into a .wav audio file)
- Wav-To-Text (Convert a .wav audio file into a string)
- Text-To-Stream (Convert a string into an audio data stream)
And helper functions:
- Runtime USoundWave importer via Audio File
- Runtime USoundWave importer via Audio Data Stream
Marketplace: UE Marketplace
Microsoft Documentation: Speech Service - Microsoft Documentations
Support me: Sponsor @lucoiso on GitHub Sponsors
Any tutorial or even a detailed document on Speech to Text using this plugin would be helpful and appreciated! Thanks for the contribution!
Thanks for the feedback, I’ll get to work on it! : )
Added a new function: Text to WAV!
This new function converts a text to speech and saves it as a .wav audio file at the specified location/path.
Look at the demo video:
Already on UE Marketplace! : )
Is there a possibility to play the audio async inside the game?
Also, can we get a sound .wav file out of the node?
Hey! : )
What’s going on?
This function automatically uses the default microphone from Windows settings.
Are you using the function via blueprint or C++?
And check your API Access Key from your Azure portal:
Hey! : )
Not with this plugin, but Unreal Engine provides some classes that allow you to get a file from your computer.
Only in Editor
I created an example via C++ where I used a Sound Factory to create a USoundWave at runtime:
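// Editor-only: USoundFactory comes from the "AudioEditor" module.
bool bOperationCanceled = false; // Output flag required by FactoryCreateFile
USoundFactory* NewFactory = NewObject<USoundFactory>(this, TEXT("Sound Factory"));
// Import the .wav file from disk as a new USoundWave object
UObject* MyRuntimeObject = NewFactory->FactoryCreateFile(USoundWave::StaticClass(), this, TEXT("RuntimeAudio"),
    EObjectFlags::RF_NoFlags, TEXT("D:\\PC\\Music\\Captured Audios\\Test.wav"),
    TEXT(""), nullptr, bOperationCanceled);
// Cast returns nullptr if the import failed
USoundWave* MyRuntimeAudio = Cast<USoundWave>(MyRuntimeObject);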
Note: AudioCompEx is a UAudioComponent.
Note 2: Add the “AudioEditor” module to your Build.cs to enable the USoundFactory class.
With this, you can load a file and play it at a specific location in the level, apply attenuation, etc., via your Audio Component/Sound Wave.
The “AudioEditor” module only works in the Editor. : (
There is a plugin that does this job: gtreshchev/RuntimeAudioImporter: Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime. (github.com)
I’m sorry… that was my fault. I tested Voice-To-Text:
with the ‘LanguageID = ko-KR’ setting it doesn’t work, but with the ‘LanguageID = en-US’ setting it works very well.
Working with official UE5 released version!
During the Speech to Text process, I would like to set up a push-to-talk button and receive the recognized string only after releasing the key, ensuring that my full phrase was parsed, rather than after a period of time without audio. Is there a way to do this?
Also, with my language (pt-BR), the recognized string comes back with the wrong character encoding. It returns “Ã©” instead of “é”, “Ã¡” instead of “á”, etc. Is there a way to change this?
Released a fix for this:
Release AzSpeech v2.1.1 · lucoiso/UEAzSpeech (github.com)
This update will be on the Marketplace soon
- Changed the string conversion from wchar_t* to UTF8_TO_TCHAR(STRING.c_str())
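To illustrate why the characters came out as “Ã©”: the sketch below is plain standard C++, not the plugin's actual code, and both function names are made up for this example. It assumes the recognized text arrives as UTF-8 bytes (which is what UTF8_TO_TCHAR expects) and shows that widening each byte into its own character produces “Ã©”, while decoding the bytes as UTF-8 produces “é”.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the WRONG conversion. Every UTF-8 byte is widened
// into its own character, so the two bytes of "é" (0xC3 0xA9) become "Ã©".
std::u32string WidenBytewise(const std::string& Utf8)
{
    std::u32string Out;
    for (unsigned char Byte : Utf8)
    {
        Out.push_back(static_cast<char32_t>(Byte));
    }
    return Out;
}

// Hypothetical helper: a simplified UTF-8 decoder. It assumes well-formed
// input and does not validate continuation bytes or overlong encodings.
std::u32string DecodeUtf8(const std::string& Utf8)
{
    std::u32string Out;
    std::size_t Index = 0;
    while (Index < Utf8.size())
    {
        const unsigned char Lead = static_cast<unsigned char>(Utf8[Index]);
        std::size_t Length = 1;
        char32_t CodePoint = Lead;

        // The lead byte tells us how many bytes the sequence has.
        if ((Lead & 0xE0) == 0xC0)      { Length = 2; CodePoint = Lead & 0x1F; }
        else if ((Lead & 0xF0) == 0xE0) { Length = 3; CodePoint = Lead & 0x0F; }
        else if ((Lead & 0xF8) == 0xF0) { Length = 4; CodePoint = Lead & 0x07; }

        // Truncated sequence: emit the replacement character and stop.
        if (Index + Length > Utf8.size())
        {
            Out.push_back(U'\uFFFD');
            break;
        }

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t Offset = 1; Offset < Length; ++Offset)
        {
            const unsigned char Next = static_cast<unsigned char>(Utf8[Index + Offset]);
            CodePoint = (CodePoint << 6) | (Next & 0x3F);
        }

        Out.push_back(CodePoint);
        Index += Length;
    }
    return Out;
}
```

Calling WidenBytewise("\xC3\xA9") gives the two characters U+00C3 (Ã) and U+00A9 (©), which is exactly the garbled output reported for pt-BR, while DecodeUtf8("\xC3\xA9") gives the single character U+00E9 (é).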
Great plugin. Would be nice to have SSML supported and I am trying to figure out how to abort some sound. Can’t figure out the location of the ‘default audio component’ so far. Is it at the Game Instance? Or at the Level Blueprint? Or at the character that speaks audio?
This plugin uses the default output/input audio devices from the Windows (OS) settings; at the moment it is not possible to change this with the plugin.
But with the Text-To-WAV function, you can save speech into a .WAV file and stream it to a USoundWave to use inside an Audio Component. : )
Maybe this plugin could help you to stream a .WAV into a UAudioComponent/USoundWave: gtreshchev/RuntimeAudioImporter: Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime. (github.com)
This plugin uses the default output/input audio devices from the Windows (OS) settings
That’s what I was afraid of indeed.
Maybe this plugin could help you to stream a .WAV into a UAudioComponent/USoundWave
Aah, that’s great indeed. I was already trying to figure out if there was a way to do that. Thanks for the link!
Some upcoming features:
I’ve committed this update to a separate branch (upcoming-test) and will release it when I think it’s fully functional and bug free.
But if you want to test these features, feel free to use it and give me feedback! : )
Please note that these features are still in testing, so some things may change before the release.
Does this technology work with voice commands? I want to use it like Siri: “Hey Unreal, set the weather to thunder and rain.”
It doesn’t support voice commands by itself, but there are functions that convert your speech into a string (VoiceToTextAsync) and convert a string into speech (TextToVoiceAsync and the upcoming TextToStreamAsync) that you can use to construct this logic.
For example: the VoiceToTextAsync returns a string of your recognized speech that you can use inside a comparing logic, etc.
A short example:
Voice to Text blueprint:
When the M key is pressed, Azure will listen to your default microphone (from the Windows OS settings) and print the recognized speech as a string on your screen.
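As for the comparing logic mentioned above, here is a minimal sketch in plain standard C++ of how a recognized string could be mapped to a command. Everything in it is an assumption made for illustration: the EWeatherCommand enum, the function names, and the keywords are not part of the plugin. In a project, the input would be the string returned by VoiceToTextAsync.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical set of commands the game understands.
enum class EWeatherCommand { None, Thunderstorm, Clear };

// Lowercases ASCII letters so the comparison ignores capitalization.
std::string ToLower(std::string Text)
{
    std::transform(Text.begin(), Text.end(), Text.begin(),
        [](unsigned char Character) { return static_cast<char>(std::tolower(Character)); });
    return Text;
}

// Returns true if Keyword appears anywhere inside Text.
bool Contains(const std::string& Text, const std::string& Keyword)
{
    return Text.find(Keyword) != std::string::npos;
}

// Maps the recognized speech to a command using simple keyword checks.
// The wake word "unreal" must be present, otherwise the phrase is ignored.
EWeatherCommand ParseWeatherCommand(const std::string& RecognizedSpeech)
{
    const std::string Text = ToLower(RecognizedSpeech);

    if (!Contains(Text, "unreal") || !Contains(Text, "weather"))
    {
        return EWeatherCommand::None;
    }
    if (Contains(Text, "thunder"))
    {
        return EWeatherCommand::Thunderstorm;
    }
    if (Contains(Text, "clear"))
    {
        return EWeatherCommand::Clear;
    }
    return EWeatherCommand::None;
}
```

With this, the phrase “Hey Unreal, set the weather to thunder and rain.” maps to EWeatherCommand::Thunderstorm, and the same phrase without the wake word maps to EWeatherCommand::None. Keyword matching like this is deliberately simple; the recognizer may change punctuation or wording, so checking for single keywords is more tolerant than comparing whole sentences.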
Getting APIAccess Key and Region: lucoiso/UEAzSpeech (github.com)
More information: UEAzSpeech/README.md at main · lucoiso/UEAzSpeech (github.com)