Unreal Engine + Pixel Streaming + AZSpeech / Runtime Speech Recognizer: Mic Input Woes!

Hey Devs! I’m building a project with voice control, using the AZSpeech and Runtime Speech Recognizer plugins for runtime speech recognition. It works like a charm in desktop builds, but I’m hitting a wall with Pixel Streaming.

Here’s the crux:

I want to use the browser mic instead of the local PC mic for voice input. Pixel Streaming sends that audio back just fine, but…

I can’t figure out how to feed that audio data directly into AZSpeech for speech-to-text magic. Right now, the received audio just plays through the speakers.

Tried searching forums, but no luck. Hoping someone here has tackled this issue before! Ideally, I’d love to:

Bypass the speaker playback and convert the received audio into a compatible format I can send straight to AZSpeech (see the sketch after this list).

Optimize the data flow for smooth, low-latency voice control.
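
For reference, here’s the direction I’ve been looking at on the C++ side. This is only a rough sketch: the type and accessor names (IPixelStreamingModule, IPixelStreamingStreamer, IPixelStreamingAudioSink, IPixelStreamingAudioConsumer, GetUnlistenedAudioSink, FindStreamer, and the "DefaultStreamer" id) are my reading of the UE 5.x Pixel Streaming headers, and they change between engine versions, so please treat them as assumptions to verify. The idea is to register an audio consumer on the streamer’s audio sink so the browser-mic PCM lands in a buffer I control instead of only being played back; the actual hand-off to AZSpeech isn’t shown.

```cpp
// Rough sketch: tap the PCM that Pixel Streaming receives from the browser mic.
// All Pixel Streaming type/function names here are assumptions based on the
// UE 5.x PixelStreaming headers and may differ in your engine version.
#include "Modules/ModuleManager.h"
#include "IPixelStreamingModule.h"
#include "IPixelStreamingStreamer.h"
#include "IPixelStreamingAudioSink.h"
#include "IPixelStreamingAudioConsumer.h"

// Receives interleaved 16-bit PCM from the remote (browser) peer.
class FBrowserMicTap : public IPixelStreamingAudioConsumer
{
public:
	virtual void ConsumeRawPCM(const int16_t* AudioData, int InSampleRate, size_t NChannels, size_t NFrames) override
	{
		SampleRate = InSampleRate;
		NumChannels = static_cast<int32>(NChannels);
		const int32 NumBytes = static_cast<int32>(NFrames * NChannels * sizeof(int16_t));
		PcmBuffer.Append(reinterpret_cast<const uint8*>(AudioData), NumBytes);
		// TODO: once enough audio has accumulated (or on end-of-utterance),
		// hand PcmBuffer + SampleRate + NumChannels to an AZSpeech task that
		// accepts raw audio data (not shown here -- see the AZSpeech docs).
	}

	virtual void OnConsumerAdded() override {}
	virtual void OnConsumerRemoved() override { PcmBuffer.Reset(); }

	TArray<uint8> PcmBuffer;   // interleaved int16 little-endian PCM
	int32 SampleRate = 48000;
	int32 NumChannels = 1;
};

static FBrowserMicTap GBrowserMicTap;

// Call once a peer has connected. "DefaultStreamer" is an assumed streamer id --
// adjust to however your project obtains its streamer.
void RegisterBrowserMicTap()
{
	IPixelStreamingModule& Module = FModuleManager::LoadModuleChecked<IPixelStreamingModule>("PixelStreaming");
	if (TSharedPtr<IPixelStreamingStreamer> Streamer = Module.FindStreamer(TEXT("DefaultStreamer")))
	{
		// GetUnlistenedAudioSink() hands back a peer audio sink that nothing else
		// is consuming; GetPeerAudioSink(PlayerId) targets a specific peer instead.
		if (IPixelStreamingAudioSink* Sink = Streamer->GetUnlistenedAudioSink())
		{
			Sink->AddAudioConsumer(&GBrowserMicTap);
		}
	}
}
```

If there’s a cleaner or officially supported way to suppress the playback and push that buffer into AZSpeech, that’s exactly the part I’m stuck on.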

Any tips, tricks, or even workarounds would be a lifesaver! Thanks in advance!

I’ve attached a screenshot of my usual approach for speech-to-text using the computer’s audio input device.

I can provide more information if needed. Please help!


Hello! I’ve been using RSR for more than a year on various projects, and only recently did I start having problems. After starting from scratch in three different UE versions on two separate computers, I almost gave up. No matter what I did, I’d either get “You” or “[BLANK AUDIO]” print strings.

Then it occurred to me that I had recently installed NVIDIA Broadcast on all my computers and given it access to my microphones (built-in and external). So I made a simple integer variable and plugged it into the Device Id pin of the Start Capture node, compiled, and changed the default value from 0 to 1. Bingo!

Now that I’m aware of this, I think I’d better come up with a UI that lets players select their mic input source. In case it matters, I also toggled VAD on just before Start Capture. I hope this helps anyone experiencing similar frustration!
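
For anyone who prefers C++ over Blueprints, the fix amounts to something like the sketch below. I’m assuming Runtime Audio Importer’s UCapturableSoundWave API here (CreateCapturableSoundWave and StartCapture with a device index); the header path and exact signatures may differ between plugin versions, so check them against your copy of the plugin.

```cpp
// Minimal sketch, assuming Runtime Audio Importer's UCapturableSoundWave API.
// The header path and signatures below are assumptions -- verify against the
// plugin source for your version.
#include "Sound/CapturableSoundWave.h"

// Mirrors the Blueprint "Start Capture" node and its Device Id pin.
UCapturableSoundWave* StartMicCapture(int32 DeviceId)
{
	// On my machines, device 0 was NVIDIA Broadcast's virtual mic and device 1
	// was the physical microphone. Expose this index in a settings UI so
	// players can pick their own input source.
	UCapturableSoundWave* CaptureWave = UCapturableSoundWave::CreateCapturableSoundWave();
	if (CaptureWave)
	{
		CaptureWave->StartCapture(DeviceId);
	}
	return CaptureWave;
}
```

If I remember right, the plugin also exposes a way to list the available audio input devices, which is what I’d use to populate that mic-selection UI; I still toggle VAD on right before starting capture, as mentioned above.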