speech recognition options for VR (Quest)

Hi. I’m building a small app for the Quest.
I’m looking for speech recognition options for the app.
Basically, to begin with I just want the user of the app to say “yes” or “no”, have that converted to text, and then use some code to decide what to do based on whether the answer is yes or no.
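Just to illustrate what I mean, once I have the recognised text back I imagine the branching logic being as simple as something like this (only a sketch in C++; the function name is made up, and it would be called from whatever recognition plugin/callback I end up using):

```cpp
#include "CoreMinimal.h"

// Rough sketch of the decision logic: branch on whatever text the
// speech recognition solution hands back.
static void HandleRecognizedText(const FString& RecognizedText)
{
    const FString Lower = RecognizedText.ToLower();

    if (Lower.Contains(TEXT("yes")))
    {
        // ... take the "yes" path
    }
    else if (Lower.Contains(TEXT("no")))
    {
        // ... take the "no" path
    }
    else
    {
        // ... no clear answer, prompt the user again
    }
}
```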

I’ve gone through the process of setting up the Voice SDK that is based on wit.ai. The problem I have with this type of NLP is that it seems to be built on the idea that the user will pose a question, e.g. “What is the time?”, “What is the weather?”, and the NLP tries to make sense of the question and then provides a response. But I kind of want to do the reverse: I’m asking the question and getting a response from the user.

There are a few plugins on the Marketplace, but they aren’t free and I’m not sure about their quality. I was wondering whether Unreal has a native speech-to-text module, or if anyone has other suggestions.

Thanks

You can take a look at the RuntimeSpeechRecognizer plugin, which is based on OpenAI’s Whisper, specifically whisper.cpp. It’s cross-platform, works offline, and is completely free and open source.

GitHub link: GitHub - gtreshchev/RuntimeSpeechRecognizer: Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI technology, whisper.cpp.

Marketplace link (But please prefer downloading from GitHub as we’ve noticed some issues with the Marketplace language model staging): Runtime Speech Recognizer in Code Plugins - UE Marketplace

Thanks @gtreshchev. I’m using UE 5.2 for VR with a Quest 2 and set up the blueprint (BP_Speech) exactly the same way as you set it up in this image. I downloaded the plugin directly from GitHub and unzipped it into the Plugins folder.

I got a couple of errors - I think they are probably related to each other.

The first error is that the language model failed to load, even though it was definitely downloaded via the Project Settings menu.

The second error that comes up is "The audio data could not be processed to the recognizer since the thread is stopped", but I suspect that is probably happening because the language model couldn’t be loaded.

LogRuntimeSpeechRecognizer: Error: Language model loading failed: Failed to load the language model asset '/RuntimeSpeechRecognizer/LanguageModel.LanguageModel'

LogRuntimeSpeechRecognizer: Error: Audio processing failed: The audio data could not be processed to the recognizer since the thread is stopped

The language model is located in:

It looks like it is included in release packages, but I’m wondering if it’s not included in the package when deploying to the Quest for debugging?

I could do with some advice about the model and why it seems to have downloaded correctly but Unreal can’t find its location. (When it is played in the viewport the language model is recognised; it’s only when deploying to the Quest that it isn’t.)
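One thing I’m wondering (purely a guess on my part) is whether explicitly forcing the plugin’s content folder to be cooked would help, e.g. via “Additional Asset Directories to Cook” in Project Settings → Packaging, which I believe corresponds to something like the following in Config/DefaultGame.ini (the path is the one from the error above; the exact setting name is my assumption):

```ini
[/Script/UnrealEd.ProjectPackagingSettings]
+DirectoriesToAlwaysCook=(Path="/RuntimeSpeechRecognizer")
```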

I had another couple of questions: in the image I didn’t quite understand the need for the “Create Capturable Sound” node that comes after “Start Speech Recogniser”. Is the “Start Speech Recogniser” node not enough by itself for the Quest 2 to convert speech to text? Does the audio first need to be captured and processed separately?

Thanks for the help.

I might have solved the issue by selecting Platforms → Project Launcher and making sure the settings are “Development” and “By the book”. Cooking “by the book” seemed to include all the required content in the package. It took 2.5 hours to package the first time, but it’s cached, so subsequent launches take about 2-3 minutes.
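(For reference, I believe the rough command-line equivalent of that launcher profile is something like the command below; the flags are from memory and the project path is just a placeholder, so double-check before relying on it.)

```bat
RunUAT.bat BuildCookRun -project="C:/Projects/MyVRApp/MyVRApp.uproject" ^
  -platform=Android -cookflavor=ASTC -clientconfig=Development ^
  -build -cook -stage -pak -package
```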

The “Can’t find language model” error disappears. The issue I have now is that after launch, even when I speak, the finished event is triggered within about 10 seconds (which prints “finished” to the screen), and my speech doesn’t seem to be recognised.

For anyone stumbling across this in the future, Meta has a Voice SDK for the Quest platform and provides it as a plugin on the Marketplace (here’s their guide to setting it up: https://developers.meta.com/horizon/documentation/unreal/vsdk-integrate-voice).

The OP is referring to its “Voice Commands” modality (NLU), but you can also just use its Dictation feature for plain speech-to-text: https://developers.meta.com/horizon/documentation/unreal/vsdk-dictation

Good luck, and happy developing!