Unreal Engine + Pixel Streaming + AZSpeech / Runtime Speech Recognizer: Mic Input Woes!

GeneralAdolphus · December 29, 2023, 7:51am

Hey Devs! I’m building a project with voice control, using the AZSpeech and Runtime Speech Recognizer plugins for runtime speech recognition. It works like a charm in desktop builds, but I’m hitting a wall with Pixel Streaming.

Here’s the crux:

I want to use the browser’s mic instead of the local PC’s mic for voice input. Pixel Streaming delivers that audio to the Unreal application just fine, but…

I can’t figure out how to feed that audio data directly into AZSpeech for speech-to-text magic. Right now, the received audio just plays through the speakers.

I’ve searched the forums with no luck, so I’m hoping someone here has tackled this before! Ideally, I’d love to:

Bypass the speaker playback and convert the received audio into a compatible format to send straight to AZSpeech.

Optimize the data flow for smooth, low-latency voice control.
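For anyone attempting the conversion step described above, here is a minimal plain C++ sketch of turning a received buffer into 16 kHz, 16-bit, mono PCM, which is the default input format of Microsoft’s Azure Speech SDK that AZSpeech wraps. It assumes the received audio arrives as interleaved float samples in [-1, 1] at a sample rate that is an integer multiple of 16 kHz (for example 48 kHz stereo); `ConvertToMono16` is a hypothetical helper, not part of AZSpeech or Unreal. Averaging groups of frames is only a crude anti-aliasing filter, so a proper resampler would give cleaner results.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (assumption: interleaved float PCM in [-1, 1],
// inRate an integer multiple of outRate, e.g. 48000 -> 16000).
// Downmixes to mono, decimates, and converts to 16-bit PCM.
std::vector<int16_t> ConvertToMono16(const std::vector<float>& interleaved,
                                     int channels, int inRate, int outRate) {
    assert(channels > 0 && outRate > 0 && inRate % outRate == 0);
    const std::size_t factor = static_cast<std::size_t>(inRate / outRate);
    const std::size_t ch = static_cast<std::size_t>(channels);
    const std::size_t frames = interleaved.size() / ch;

    std::vector<int16_t> out;
    out.reserve(frames / factor);

    // Average each group of `factor` frames across all channels
    // (a simple box filter that also downmixes), then clamp and scale.
    for (std::size_t f = 0; f + factor <= frames; f += factor) {
        float sum = 0.0f;
        for (std::size_t k = 0; k < factor; ++k)
            for (std::size_t c = 0; c < ch; ++c)
                sum += interleaved[(f + k) * ch + c];
        float mono = sum / static_cast<float>(factor * ch);
        mono = std::clamp(mono, -1.0f, 1.0f);
        out.push_back(static_cast<int16_t>(std::lround(mono * 32767.0f)));
    }
    return out;
}
```

For example, 2400 stereo frames at 48 kHz (50 ms) come out as 800 mono samples at 16 kHz.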

Any tips, tricks, or even workarounds would be a lifesaver! Thanks in advance!

I’ve attached a screenshot of my usual approach for speech-to-text using the computer’s audio input device.

I can provide more information if needed. Please help!

Ben_Blau · January 15, 2025, 10:05pm

Hello! I’ve been using RSR (Runtime Speech Recognizer) for more than a year on various projects, and only recently did I start having problems. After starting from scratch in three different UE versions on two separate computers, I almost gave up: no matter what I did, the Print String output was either “You” or “[BLANK AUDIO]”.

Then it occurred to me that I had recently installed NVIDIA Broadcast on all my computers and given it access to my microphones (built-in and external). So I made a simple integer variable, plugged it into the Device Id pin of the Start Capture node, compiled, and changed its default value from 0 to 1. Bingo! Now that I know this, I’d better build a UI that lets players select their mic input source. In case it matters, I also toggled VAD on just before Start Capture. I hope this helps anyone experiencing similar frustration!
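Because device indices can shift whenever a tool such as NVIDIA Broadcast adds a virtual microphone, a mic-selection UI could store the device’s name rather than its index and resolve it to an index at startup. The sketch below assumes you already have the capture device names in the same order the Device Id pin expects; how you obtain that list from the plugin is not shown, and `PickCaptureDeviceId` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first capture device whose
// name contains `preferred`, or `fallback` if none matches. The returned
// index is what you would feed into a "Device Id" style input.
int PickCaptureDeviceId(const std::vector<std::string>& deviceNames,
                        const std::string& preferred, int fallback = 0) {
    assert(fallback >= 0);
    for (std::size_t i = 0; i < deviceNames.size(); ++i) {
        if (deviceNames[i].find(preferred) != std::string::npos)
            return static_cast<int>(i);
    }
    return fallback;
}
```

Saving the player’s choice as a name keeps working after a new virtual device is inserted ahead of it in the list, whereas a saved index would silently point at the wrong mic.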

gtreshchev · February 27, 2025, 11:56am

Hi, you can recognize speech in a Pixel Streaming context by replacing the capturable sound wave with the synth-based sound wave, which is also used in the Runtime Audio Importer plugin. This sound wave is specifically designed for Pixel Streaming and allows you to capture audio on the client side and recognize speech on the server side.

For more information, please visit: Pixel streaming audio capture | Georgy Dev Docs
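On the server side, whichever component delivers the streamed audio typically hands it over in chunks whose size you don’t control, while a recognition loop is often simpler to write against fixed-size frames. The sketch below is a generic, hypothetical `PcmFrameBuffer` (not part of Runtime Audio Importer or Runtime Speech Recognizer, whose internals are not described here) that an audio callback could push into and a recognition thread could drain; it assumes 16-bit mono PCM. Frame size is the latency knob: at 16 kHz, 1600 samples is 100 ms, and smaller frames reduce delay at the cost of more frequent calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical buffer: an audio-receive callback pushes variable-size
// PCM chunks; a recognition thread pulls fixed-size frames.
class PcmFrameBuffer {
public:
    explicit PcmFrameBuffer(std::size_t frameSamples) : frameSamples_(frameSamples) {
        assert(frameSamples_ > 0);
    }

    // Called from the (assumed) audio-receive callback.
    void Push(const int16_t* data, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.insert(samples_.end(), data, data + count);
    }

    // Fills `frame` and returns true only when a full frame is available.
    bool PopFrame(std::vector<int16_t>& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (samples_.size() < frameSamples_) return false;
        auto end = samples_.begin() + static_cast<std::ptrdiff_t>(frameSamples_);
        frame.assign(samples_.begin(), end);
        samples_.erase(samples_.begin(), end);
        return true;
    }

private:
    const std::size_t frameSamples_;
    std::deque<int16_t> samples_;
    std::mutex mutex_;
};
```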