DGOne - VoskSpeechRecognition

📕Doc

🎞Trailer

VoskSpeechRecognition — UE5 Offline Speech Recognition (Blueprint Ready)

Bring voice commands, NPC dialogue, and hands-free interaction into your project as a truly plug-and-play Blueprint capability inside Unreal.

No cloud services, no subscriptions — supports streaming recognition, real‑time partial results, and final result callbacks, ideal for games and enterprise‑grade offline scenarios. 🎮🎧

What You Get ✨

Blueprint-first speech recognition

Use UVoskSpeechSubsystem to drive the entire flow via nodes (a C++ sketch follows this list):

  • CreateSession — create a recognition session (returns SessionId)

  • StartSession / StopSession / DestroySession / ResetSession — lifecycle control

  • GetDefaultSessionId — query the current default session ID anytime (the session that receives the main Blueprint events)

  • ValidateSessionEnvironment / GetSessionState / GetSessionLastError* / GetSessionStats — introspection and tuning
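
The same lifecycle can be driven from C++. A minimal sketch, assuming UVoskSpeechSubsystem is resolved as a game-instance subsystem and that the nodes above map to methods with the signatures shown; the listing only names the Blueprint nodes, so AMyVoiceActor, ActiveSessionId, the header name, and the exact parameter lists are illustrative assumptions:

```cpp
// Hedged sketch: session lifecycle from a gameplay actor.
// AMyVoiceActor, ActiveSessionId, and the exact method signatures are illustrative.
#include "VoskSpeechSubsystem.h"   // header name assumed from the plugin module

void AMyVoiceActor::BeginListening()
{
    UVoskSpeechSubsystem* Vosk = GetGameInstance()->GetSubsystem<UVoskSpeechSubsystem>();
    if (!Vosk)
    {
        return;
    }

    FVoskRecognitionOptions Options;                 // ServerUrl defaults to ws://127.0.0.1:8080
    ActiveSessionId = Vosk->CreateSession(Options);  // CreateSession returns the SessionId; keep it
    Vosk->StartSession(ActiveSessionId);
}

void AMyVoiceActor::EndListening()
{
    if (UVoskSpeechSubsystem* Vosk = GetGameInstance()->GetSubsystem<UVoskSpeechSubsystem>())
    {
        Vosk->StopSession(ActiveSessionId);
        Vosk->DestroySession(ActiveSessionId);       // or ResetSession to reuse the session later
    }
}
```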

Event-driven recognition results (designed for Blueprint)

  • OnPartialResult — partial text, live as the user speaks — great for UI feedback or subtitles

  • OnFinalResult — final text, ideal for triggering commands / progressing dialogue

  • OnError — error code + message, for fast debugging

  • OnStateChanged — state machine events (Idle / Connecting / Running / …)

  • OnStatsUpdated — queue & throughput stats to help you tune performance and latency
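
From C++ these events can be bound like any dynamic multicast delegate. A hedged sketch, assuming the delegates live on the subsystem and the result payload is a struct exposing the text fields described in the next section; the handler names, FVoskRecognitionResult, and the OnError parameter list are illustrative, not taken from the plugin source:

```cpp
// Hedged sketch: binding the recognition events (handlers must be UFUNCTIONs on a UCLASS).
// FVoskRecognitionResult and the OnError parameter list are assumptions for illustration.
void AMyVoiceActor::BindRecognitionEvents(UVoskSpeechSubsystem* Vosk)
{
    Vosk->OnPartialResult.AddDynamic(this, &AMyVoiceActor::HandlePartial);
    Vosk->OnFinalResult.AddDynamic(this, &AMyVoiceActor::HandleFinal);
    Vosk->OnError.AddDynamic(this, &AMyVoiceActor::HandleError);
}

void AMyVoiceActor::HandlePartial(const FVoskRecognitionResult& Result)
{
    // Live feedback while the user is still speaking, e.g. push to a subtitle widget.
    UE_LOG(LogTemp, Verbose, TEXT("Partial: %s"), *Result.NormalizedPartialText);
}

void AMyVoiceActor::HandleFinal(const FVoskRecognitionResult& Result)
{
    // Utterance finished: a safe point to trigger commands or advance dialogue.
    UE_LOG(LogTemp, Log, TEXT("Final: %s"), *Result.NormalizedText);
}

void AMyVoiceActor::HandleError(int32 ErrorCode, const FString& Message)
{
    UE_LOG(LogTemp, Warning, TEXT("Vosk error %d: %s"), ErrorCode, *Message);
}
```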

Gameplay-ready text fields

  • NormalizedText / NormalizedPartialText — automatically strips extra whitespace and common punctuation noise

  • Built-in keyword layer (v1.1+) — not just raw strings (sketched in C++ after these items):

  • Vosk Keyword Match Processor — OnKeywordHit, cooldown, partial dedupe, optional suppress-after-final

  • Vosk Keyword Library — EvaluateVoskKeywordHits for stateless checks

  • ProcessRecognitionResult — feed the Result from each OnPartialResult / OnFinalResult event into the processor (wire the full struct; required for hits)

  • The same normalization rules are applied to keyword rules and to recognition text — consistent command / keyword matching
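
A hedged C++ sketch of that keyword pipeline, using the class and node names from this listing; FVoskKeywordRule, FVoskKeywordHit, and the exact SetRules / delegate signatures are assumptions for illustration:

```cpp
// Hedged sketch: one keyword rule plus a hit handler.
// FVoskKeywordRule, FVoskKeywordHit, and the SetRules signature are assumptions for illustration.
void AMyVoiceActor::SetupKeywords()
{
    KeywordProcessor = NewObject<UVoskKeywordMatchProcessor>(this);

    FVoskKeywordRule OpenDoorRule;
    OpenDoorRule.Keyword = TEXT("open the door");
    KeywordProcessor->SetRules({ OpenDoorRule });

    KeywordProcessor->OnKeywordHit.AddDynamic(this, &AMyVoiceActor::HandleKeywordHit);
}

void AMyVoiceActor::HandleKeywordHit(const FVoskKeywordHit& Hit)
{
    if (Hit.Keyword == TEXT("open the door"))
    {
        OpenNearestDoor();   // placeholder gameplay reaction
    }
}

// Inside BOTH the OnPartialResult and OnFinalResult handlers, forward the full event struct:
//     KeywordProcessor->ProcessRecognitionResult(Result);
```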

Extensible architecture

  • Clear extension points at transport and protocol layers

  • Easy to integrate with different WebSocket backends / custom protocols 🔌

Why Us 💡

Offline, subscription-free, privacy-friendly

  • No paid cloud API required

  • Ideal for disconnected environments, on‑prem / intranet setups, privacy-sensitive projects, and cost-controlled deployments

Truly UE-oriented experience

  • Not just “it recognizes speech”: state events, error details, performance stats, and a first-class keyword pipeline for commands

  • Partial output is throttled and de-duplicated so Blueprints aren’t spammed — easier to debug and tune

Up and running in a few steps ✅

  1. Run the local Language Server (or compatible WebSocket ASR) and point ServerUrl at it

  2. In Blueprint: Get Subsystem (VoskSpeechSubsystem) → CreateSession → StartSession

  3. Bind OnPartialResult / OnFinalResult

  4. (Keywords) Create VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit → on both partial and final events call ProcessRecognitionResult with the event Result

  5. Bind OnFinalResult (and/or OnKeywordHit) to drive gameplay

Setup in Minutes ⚙️

You only need two external pieces:

  • Vosk Language Server — download and run (or any WebSocket server compatible with your workflow)

  • Recognition model — download a Vosk model and configure its path per server docs

Recommended resources

Default plugin endpoint

  • ServerUrl default in FVoskRecognitionOptions is ws://127.0.0.1:8080 — change it to match your server (e.g. if you use another port such as 2700).
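
For example, overriding the endpoint before the session is created; ServerUrl and FVoskRecognitionOptions are named by the plugin, while the rest of this snippet is illustrative:

```cpp
// Hedged sketch: point the session at a server on port 2700 instead of the default 8080.
FVoskRecognitionOptions Options;
Options.ServerUrl = TEXT("ws://127.0.0.1:2700");
const auto SessionId = Vosk->CreateSession(Options);   // Vosk = the UVoskSpeechSubsystem pointer
Vosk->StartSession(SessionId);
```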

Typical Use Cases 🎯

  • Voice commands — “Open the door”, “switch weapon”, “start mission”, “pause game”…

  • NPC dialogue triggers — voice to drive dialogue trees, quests, story branches

  • Keyword / command gates — OnKeywordHit + rules for stable, tunable command detection on top of streaming text

  • Accessibility / hands-free control

  • Enterprise / exhibition / training — kiosks, exhibits, simulators, training terminals, intranet apps

Highlights 🚀

  • Streaming recognition — partials while speaking, finals at utterance end

  • Partial throttling & de-duplication — fewer noisy partial callbacks

  • Keyword processor — cooldown, per-rule partial dedupe, optional post-final partial suppress

  • Configurable queue overflow policy — DropOldest / DropNewest

  • Debug-friendly — error codes, states, stats, and ProcessRecognitionResult return value for quick sanity checks

Compatibility 🧩

  • Engine — Unreal Engine 5.x (plugin metadata targets 5.7; verify against your project)

  • Dependencies — AudioCapture, WebSockets

  • Platforms — primarily Windows (Win64) per current module allowlist; server/model/platform combo is your choice

Blueprint Workflow 🛠️

Typical flow:

  1. Get Subsystem (VoskSpeechSubsystem)

  2. CreateSession(FVoskRecognitionOptions) → keep SessionId (or GetDefaultSessionId for the default session)

  3. StartSession(SessionId, …)

  4. Bind OnFinalResult / OnPartialResult for subtitles or raw logic

  5. Optional keywords — Construct Object → VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit → from subsystem On Partial Result and On Final Result, call Process Recognition Result with the same processor and the event Result pin connected

  6. Use NormalizedText / NormalizedPartialText (or OnKeywordHit) for command parsing
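
To illustrate step 6, a hedged sketch that parses commands straight from the normalized final text; the field names come from this listing, while the handler signature and gameplay calls are placeholders:

```cpp
// Hedged sketch: raw command parsing on NormalizedText, without the keyword processor.
#include "Kismet/GameplayStatics.h"

void AMyVoiceActor::HandleFinal(const FVoskRecognitionResult& Result)
{
    const FString& Text = Result.NormalizedText;   // whitespace / punctuation noise already stripped

    if (Text.Contains(TEXT("pause game")))
    {
        UGameplayStatics::SetGamePaused(this, true);
    }
    else if (Text.Contains(TEXT("switch weapon")))
    {
        CycleWeapon();                             // placeholder gameplay call
    }
}
```

For more than a couple of phrases, the keyword processor above is the better fit, since it adds cooldown and partial dedupe on top of the same normalized text.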