Hey Devs! I'm building a project with voice control, using the AZSpeech and Runtime Speech Recognizer plugins for runtime speech recognition. It works like a charm in desktop builds, but I'm hitting a wall with Pixel Streaming.
Here’s the crux:
I want to use the browser mic instead of the local PC mic for voice input. Pixel Streaming sends that audio back just fine, but…
I can’t figure out how to feed that audio data directly into AZSpeech for speech-to-text magic. Right now, the received audio just plays through the speakers.
Tried searching forums, but no luck. Hoping someone here has tackled this issue before! Ideally, I’d love to:
Bypass the speaker playback and convert the received audio into a format I can send straight to AZSpeech (rough sketch of what I'm imagining after this list).
Optimize the data flow for smooth, low-latency voice control.
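For the first point, this is roughly what I have in mind (untested sketch, not working code): route the returned browser audio to its own Sound Submix, tap it with a submix buffer listener, convert the float samples to 16-bit PCM, and hand that off to AZSpeech. The SendBufferToAzSpeech part is a placeholder; that's exactly the piece I can't figure out. Also note the register/unregister calls changed signature in recent engine versions, so treat this as pseudocode-ish:

```cpp
// Rough sketch only. Assumes the Pixel Streaming "browser mic" audio is routed
// to a dedicated USoundSubmix so it can be tapped with a submix buffer listener.
// SendBufferToAzSpeech is a placeholder for whatever AZSpeech call (if any)
// accepts raw PCM / WAV bytes - that's the part I'm asking about.

#include "AudioDevice.h"
#include "Sound/SoundSubmix.h"
#include "Engine/World.h"

class FPixelStreamingMicTap : public ISubmixBufferListener
{
public:
    void Start(UWorld* World, USoundSubmix* BrowserMicSubmix)
    {
        if (FAudioDevice* AudioDevice = World->GetAudioDeviceRaw())
        {
            // Tap the submix the Pixel Streaming audio component outputs to.
            // (Signature differs in newer engine versions.)
            AudioDevice->RegisterSubmixBufferListener(this, BrowserMicSubmix);
        }
    }

    void Stop(UWorld* World, USoundSubmix* BrowserMicSubmix)
    {
        if (FAudioDevice* AudioDevice = World->GetAudioDeviceRaw())
        {
            AudioDevice->UnregisterSubmixBufferListener(this, BrowserMicSubmix);
        }
    }

    // Called on the audio render thread with float PCM from the submix.
    virtual void OnNewSubmixBuffer(const USoundSubmix* OwningSubmix, float* AudioData,
                                   int32 NumSamples, int32 NumChannels,
                                   const int32 SampleRate, double AudioClock) override
    {
        // Convert float [-1, 1] samples to 16-bit PCM, which is what most
        // speech SDKs (Azure included) expect.
        TArray<int16> Pcm16;
        Pcm16.Reserve(NumSamples);
        for (int32 i = 0; i < NumSamples; ++i)
        {
            Pcm16.Add(static_cast<int16>(FMath::Clamp(AudioData[i], -1.0f, 1.0f) * 32767.0f));
        }

        // Hand off to the game thread / a queue, then push into AZSpeech.
        SendBufferToAzSpeech(Pcm16, SampleRate, NumChannels);
    }

private:
    void SendBufferToAzSpeech(const TArray<int16>& Pcm16, int32 SampleRate, int32 NumChannels)
    {
        // TODO (the missing piece): wrap the samples in a WAV header or a push
        // audio stream and feed them to an AZSpeech speech-to-text task.
    }
};
```

For the "don't play it through the speakers" half, I'm guessing I could set that submix's output volume to 0 (Set Submix Output Volume), but I'm not sure whether that also stops the buffer listener from getting data. Corrections very welcome.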
Any tips, tricks, or even workarounds would be a lifesaver! Thanks in advance!
I’ve attached a screenshot of my usual approach for Speech to Text using the computer’s audio input device.
I can provide more information if needed. Please help!