VoskSpeechRecognition — UE5 Offline Speech Recognition (Blueprint Ready)
Bring voice commands, NPC dialogue, and hands-free interaction into Unreal as a truly plug‑and‑play Blueprint capability.
No cloud services, no subscriptions — supports streaming recognition, real‑time partial results, and final result callbacks, ideal for games and enterprise‑grade offline scenarios. 🎮🎧
What You Get ✨
Blueprint-first speech recognition
Use UVoskSpeechSubsystem to drive the entire flow via nodes:
CreateSession — create a recognition session (returns SessionId)
StartSession / StopSession / DestroySession / ResetSession — lifecycle control
GetDefaultSessionId — query the current default session ID anytime (the session that receives the main Blueprint events)
ValidateSessionEnvironment / GetSessionState / GetSessionLastError* / GetSessionStats — introspection and tuning
Event-driven recognition results (designed for Blueprint)
OnPartialResult — partial text, live as the user speaks — great for UI feedback or subtitles
OnFinalResult — final text, ideal for triggering commands / progressing dialogue
OnError — error code + message, for fast debugging
OnStateChanged — state machine events (Idle / Connecting / Running / …)
OnStatsUpdated — queue & throughput stats to help you tune performance and latency
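Under the hood, the stock vosk-server reports interim hypotheses as `{"partial": "..."}` JSON messages and finalized utterances as `{"text": "..."}`, which is the split the OnPartialResult / OnFinalResult events surface. The sketch below (Python for illustration; the plugin itself does this in C++) shows how one incoming message routes to the matching callback. The dispatcher function and its return values are hypothetical, not plugin API:

```python
import json

def dispatch_vosk_message(raw: str, on_partial, on_final) -> str:
    """Route one vosk-server JSON message to the matching callback.

    The stock vosk-server emits {"partial": "..."} while the user is
    speaking and {"text": "..."} when an utterance is finalized; the
    plugin's OnPartialResult / OnFinalResult events mirror this split.
    """
    msg = json.loads(raw)
    if "text" in msg:        # final result for the utterance
        on_final(msg["text"])
        return "final"
    if "partial" in msg:     # live interim hypothesis
        on_partial(msg["partial"])
        return "partial"
    return "ignored"         # anything else (e.g. word-timing-only payloads)

# Usage: collect what each callback would receive
partials, finals = [], []
dispatch_vosk_message('{"partial": "open the"}', partials.append, finals.append)
dispatch_vosk_message('{"text": "open the door"}', partials.append, finals.append)
```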
Gameplay-ready text fields
NormalizedText / NormalizedPartialText — automatically strips extra whitespace and common punctuation noise
Built-in keyword layer (v1.1+) — not just raw strings:
Vosk Keyword Match Processor — OnKeywordHit, cooldown, partial dedupe, optional suppress-after-final
Vosk Keyword Library — EvaluateVoskKeywordHits for stateless checks
ProcessRecognitionResult — feed each OnPartialResult / OnFinalResult Result into the processor (wire the full struct; required for hits)
Same normalization rules for rules and recognition text — consistent command / keyword matching
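The point of sharing normalization between rules and recognition text is that a rule written as "Open the Door!" still matches a transcript of "open the door". A minimal Python sketch of that idea follows; the exact normalization rules and the real `EvaluateVoskKeywordHits` signature belong to the plugin, so treat both functions here as illustrative approximations:

```python
import re

def normalize(text: str) -> str:
    """Approximate the plugin's normalization: lowercase, strip
    punctuation noise, collapse whitespace. The plugin's exact rules
    may differ; this only illustrates the shared-normalization idea."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def evaluate_keyword_hits(text: str, rules: list[str]) -> list[str]:
    """Stateless check in the spirit of EvaluateVoskKeywordHits:
    rules and recognized text pass through the SAME normalization
    before substring matching, so casing and punctuation never
    break a command."""
    norm = normalize(text)
    return [rule for rule in rules if normalize(rule) in norm]

hits = evaluate_keyword_hits("Open the door!", ["open the door", "Pause Game"])
```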
Extensible architecture
Clear extension points at transport and protocol layers
Easy to integrate with different WebSocket backends / custom protocols 🔌
Why Us 💡
Offline, subscription-free, privacy-friendly
No paid cloud API required
Ideal for disconnected environments, on‑prem / intranet setups, privacy-sensitive projects, and cost-controlled deployments
Truly UE-oriented experience
Not just “it recognizes speech”: state events, error details, performance stats, and a first-class keyword pipeline for commands
Partial output is throttled and de-duplicated so Blueprints aren’t spammed — easier to debug and tune
Up and running in a few steps ✅
Run the local Language Server (or compatible WebSocket ASR) and point ServerUrl at it
In Blueprint: Get Subsystem (VoskSpeechSubsystem) → CreateSession → StartSession
Bind OnPartialResult / OnFinalResult
(Keywords) Create VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit → on both partial and final events call ProcessRecognitionResult with the event Result
Bind OnFinalResult (and/or OnKeywordHit) to drive gameplay
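For anyone pointing the plugin at a custom WebSocket ASR backend instead of the stock server: the vosk-server protocol opens with a JSON config message declaring the sample rate, streams raw audio as binary frames, and closes the utterance with an EOF message. The plugin handles this internally; the sketch below only shows the message shapes so you can check that a compatible server speaks the same dialect (helper names are hypothetical):

```python
import json

def make_config_message(sample_rate: int) -> str:
    """First text message the stock vosk-server expects on a new
    WebSocket connection; it declares the incoming audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})

def make_eof_message() -> str:
    """Sent after the last audio chunk to flush the final result."""
    return json.dumps({"eof": 1})

# After the config message, raw PCM chunks travel as binary frames;
# partial/final JSON replies come back as text frames.
cfg = make_config_message(16000)
eof = make_eof_message()
```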
Setup in Minutes ⚙️
You only need two external pieces:
Vosk Language Server — download and run (or any WebSocket server compatible with your workflow)
Recognition model — download a Vosk model and configure its path per server docs
Recommended resources
Vosk open-source project: https://github.com/alphacep/vosk-api
Official model downloads: https://alphacephei.com/vosk/models
Default plugin endpoint
ServerUrl default in FVoskRecognitionOptions is ws://127.0.0.1:8080 — change it to match your server (e.g. ws://127.0.0.1:2700 if your server listens on another port such as 2700).
Typical Use Cases 🎯
Voice commands — "open the door", "switch weapon", "start mission", "pause game"…
NPC dialogue triggers — voice to drive dialogue trees, quests, story branches
Keyword / command gates — OnKeywordHit + rules for stable, tunable command detection on top of streaming text
Accessibility / hands-free control
Enterprise / exhibition / training — kiosks, exhibits, simulators, training terminals, intranet apps
Highlights 🚀
Streaming recognition — partials while speaking, finals at utterance end
Partial throttling & de-duplication — fewer noisy partial callbacks
Keyword processor — cooldown, per-rule partial dedupe, optional post-final partial suppress
Configurable queue overflow policy — DropOldest / DropNewest
Debug-friendly — error codes, states, stats, and ProcessRecognitionResult return value for quick sanity checks
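The two overflow policies differ in which side of a full queue loses data: DropOldest evicts the head to make room for fresh audio (keeping recognition current), while DropNewest rejects the incoming item (keeping the oldest buffered data intact). A minimal Python model, with class and field names of my own invention:

```python
from collections import deque

class BoundedQueue:
    """Sketch of the two overflow policies: when full, DropOldest
    evicts the head to admit the new item; DropNewest discards the
    incoming item instead. `dropped` counts overflow events either way."""
    def __init__(self, capacity: int, policy: str = "DropOldest"):
        self.capacity, self.policy = capacity, policy
        self.items: deque = deque()
        self.dropped = 0

    def push(self, item) -> bool:
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        self.dropped += 1
        if self.policy == "DropOldest":
            self.items.popleft()       # evict head, admit newcomer
            self.items.append(item)
            return True
        return False                   # DropNewest: newcomer discarded

q = BoundedQueue(capacity=2, policy="DropOldest")
for chunk in ["a", "b", "c"]:
    q.push(chunk)
```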
Compatibility 🧩
Engine — Unreal Engine 5.x (plugin metadata targets 5.7; verify against your project)
Dependencies — AudioCapture, WebSockets
Platforms — primarily Windows (Win64) per the current module allowlist; the choice of server, model, and target platform is yours
Blueprint Workflow 🛠️
Typical flow:
Get Subsystem (VoskSpeechSubsystem)
CreateSession(FVoskRecognitionOptions) → keep SessionId (or GetDefaultSessionId for the default session)
StartSession(SessionId, …)
Bind OnFinalResult / OnPartialResult for subtitles or raw logic
Optional keywords — Construct Object → VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit; then, from the subsystem's On Partial Result and On Final Result events, call Process Recognition Result on the same processor with the event's Result pin connected
Use NormalizedText / NormalizedPartialText (or OnKeywordHit) for command parsing
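To clarify why both partial and final events feed the same processor: the processor is stateful, so a rule that already fired on a partial is deduped within the utterance, and cooldown spans utterances. The Python sketch below models that behavior only; the real ProcessRecognitionResult takes the full result struct, and suppress-after-final is omitted here for brevity. All names and defaults are illustrative:

```python
class KeywordProcessor:
    """Behavioral sketch of a stateful keyword processor: per-rule
    cooldown across utterances, plus partial dedupe so a rule fires at
    most once per utterance from partial results. Not the plugin's
    actual implementation or API."""
    def __init__(self, rules, cooldown_s: float = 1.0):
        self.rules = [r.lower() for r in rules]
        self.cooldown_s = cooldown_s
        self._last_hit = {}          # rule -> timestamp of last hit
        self._seen_partial = set()   # rules already fired this utterance

    def process(self, text: str, is_final: bool, now: float) -> list[str]:
        text, hits = text.lower(), []
        for rule in self.rules:
            if rule not in text:
                continue
            if not is_final and rule in self._seen_partial:
                continue  # partial dedupe within the utterance
            if now - self._last_hit.get(rule, float("-inf")) < self.cooldown_s:
                continue  # rule still cooling down
            self._last_hit[rule] = now
            self._seen_partial.add(rule)
            hits.append(rule)
        if is_final:
            self._seen_partial.clear()  # next utterance starts fresh
        return hits

p = KeywordProcessor(["open the door"], cooldown_s=1.0)
a = p.process("open the", is_final=False, now=0.0)       # no match yet
b = p.process("open the door", is_final=False, now=0.2)  # first hit
c = p.process("open the door", is_final=False, now=0.4)  # deduped partial
d = p.process("open the door", is_final=True,  now=0.5)  # blocked by cooldown
e = p.process("open the door", is_final=True,  now=2.0)  # cooldown elapsed, fires
```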