DGOne - VoskSpeechRecognition

📕Doc

🎞Trailer

VoskSpeechRecognition — UE5 Offline Speech Recognition (Blueprint Ready)

Bring voice commands, NPC dialogue, and hands-free interaction into your project as a truly plug-and-play Blueprint capability inside Unreal.

No cloud services, no subscriptions — supports streaming recognition, real‑time partial results, and final result callbacks, ideal for games and enterprise‑grade offline scenarios. 🎮🎧

What You Get ✨

Blueprint-first speech recognition

Use UVoskSpeechSubsystem to drive the entire flow via nodes (a C++ sketch follows this list):

  • CreateSession — create a recognition session (returns SessionId)

  • StartSession / StopSession / DestroySession / ResetSession — lifecycle control

  • GetDefaultSessionId — query the current default session ID anytime (the session that receives the main Blueprint events)

  • ValidateSessionEnvironment / GetSessionState / GetSessionLastError* / GetSessionStats — introspection and tuning
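
The same lifecycle can be driven from C++. A minimal sketch, assuming UVoskSpeechSubsystem is resolved as a game-instance subsystem and that the nodes above map to methods with the signatures shown; the listing only names the Blueprint nodes, so AMyVoiceActor, ActiveSessionId, the header name, and the exact parameter lists are illustrative assumptions:

```cpp
// Hedged sketch: session lifecycle from a gameplay actor.
// AMyVoiceActor, ActiveSessionId, and the exact method signatures are illustrative.
#include "VoskSpeechSubsystem.h"   // header name assumed from the plugin module

void AMyVoiceActor::BeginListening()
{
    UVoskSpeechSubsystem* Vosk = GetGameInstance()->GetSubsystem<UVoskSpeechSubsystem>();
    if (!Vosk)
    {
        return;
    }

    FVoskRecognitionOptions Options;                 // ServerUrl defaults to ws://127.0.0.1:8080
    ActiveSessionId = Vosk->CreateSession(Options);  // CreateSession returns the SessionId; keep it
    Vosk->StartSession(ActiveSessionId);
}

void AMyVoiceActor::EndListening()
{
    if (UVoskSpeechSubsystem* Vosk = GetGameInstance()->GetSubsystem<UVoskSpeechSubsystem>())
    {
        Vosk->StopSession(ActiveSessionId);
        Vosk->DestroySession(ActiveSessionId);       // or ResetSession to reuse the session later
    }
}
```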

Event-driven recognition results (designed for Blueprint)

  • OnPartialResult — partial text, live as the user speaks — great for UI feedback or subtitles

  • OnFinalResult — final text, ideal for triggering commands / progressing dialogue

  • OnError — error code + message, for fast debugging

  • OnStateChanged — state machine events (Idle / Connecting / Running / …)

  • OnStatsUpdated — queue & throughput stats to help you tune performance and latency
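
From C++ these events can be bound like any dynamic multicast delegate. A hedged sketch, assuming the delegates live on the subsystem and the result payload is a struct exposing the text fields described in the next section; the handler names, FVoskRecognitionResult, and the OnError parameter list are illustrative, not taken from the plugin source:

```cpp
// Hedged sketch: binding the recognition events (handlers must be UFUNCTIONs on a UCLASS).
// FVoskRecognitionResult and the OnError parameter list are assumptions for illustration.
void AMyVoiceActor::BindRecognitionEvents(UVoskSpeechSubsystem* Vosk)
{
    Vosk->OnPartialResult.AddDynamic(this, &AMyVoiceActor::HandlePartial);
    Vosk->OnFinalResult.AddDynamic(this, &AMyVoiceActor::HandleFinal);
    Vosk->OnError.AddDynamic(this, &AMyVoiceActor::HandleError);
}

void AMyVoiceActor::HandlePartial(const FVoskRecognitionResult& Result)
{
    // Live feedback while the user is still speaking, e.g. push to a subtitle widget.
    UE_LOG(LogTemp, Verbose, TEXT("Partial: %s"), *Result.NormalizedPartialText);
}

void AMyVoiceActor::HandleFinal(const FVoskRecognitionResult& Result)
{
    // Utterance finished: a safe point to trigger commands or advance dialogue.
    UE_LOG(LogTemp, Log, TEXT("Final: %s"), *Result.NormalizedText);
}

void AMyVoiceActor::HandleError(int32 ErrorCode, const FString& Message)
{
    UE_LOG(LogTemp, Warning, TEXT("Vosk error %d: %s"), ErrorCode, *Message);
}
```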

Gameplay-ready text fields

  • NormalizedText / NormalizedPartialText — automatically strips extra whitespace and common punctuation noise

  • Built-in keyword layer (v1.1+) — not just raw strings (sketched in C++ after these items):

  • Vosk Keyword Match Processor — OnKeywordHit, cooldown, partial dedupe, optional suppress-after-final

  • Vosk Keyword Library — EvaluateVoskKeywordHits for stateless checks

  • ProcessRecognitionResult — feed the Result from each OnPartialResult / OnFinalResult event into the processor (wire the full struct; required for hits)

  • The same normalization rules are applied to keyword rules and to recognition text — consistent command / keyword matching
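
A hedged C++ sketch of that keyword pipeline, using the class and node names from this listing; FVoskKeywordRule, FVoskKeywordHit, and the exact SetRules / delegate signatures are assumptions for illustration:

```cpp
// Hedged sketch: one keyword rule plus a hit handler.
// FVoskKeywordRule, FVoskKeywordHit, and the SetRules signature are assumptions for illustration.
void AMyVoiceActor::SetupKeywords()
{
    KeywordProcessor = NewObject<UVoskKeywordMatchProcessor>(this);

    FVoskKeywordRule OpenDoorRule;
    OpenDoorRule.Keyword = TEXT("open the door");
    KeywordProcessor->SetRules({ OpenDoorRule });

    KeywordProcessor->OnKeywordHit.AddDynamic(this, &AMyVoiceActor::HandleKeywordHit);
}

void AMyVoiceActor::HandleKeywordHit(const FVoskKeywordHit& Hit)
{
    if (Hit.Keyword == TEXT("open the door"))
    {
        OpenNearestDoor();   // placeholder gameplay reaction
    }
}

// Inside BOTH the OnPartialResult and OnFinalResult handlers, forward the full event struct:
//     KeywordProcessor->ProcessRecognitionResult(Result);
```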

Extensible architecture

  • Clear extension points at transport and protocol layers

  • Easy to integrate with different WebSocket backends / custom protocols 🔌

Why Us 💡

Offline, subscription-free, privacy-friendly

  • No paid cloud API required

  • Ideal for disconnected environments, on‑prem / intranet setups, privacy-sensitive projects, and cost-controlled deployments

Truly UE-oriented experience

  • Not just “it recognizes speech”: state events, error details, performance stats, and a first-class keyword pipeline for commands

  • Partial output is throttled and de-duplicated so Blueprints aren’t spammed — easier to debug and tune

Up and running in a few steps ✅

  1. Run the local Language Server (or compatible WebSocket ASR) and point ServerUrl at it

  2. In Blueprint: Get Subsystem (VoskSpeechSubsystem) → CreateSession → StartSession

  3. Bind OnPartialResult / OnFinalResult

  4. (Keywords) Create VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit → on both partial and final events call ProcessRecognitionResult with the event Result

  5. Bind OnFinalResult (and/or OnKeywordHit) to drive gameplay

Setup in Minutes ⚙️

You only need two external pieces:

  • Vosk Language Server — download and run (or any WebSocket server compatible with your workflow)

  • Recognition model — download a Vosk model and configure its path per server docs

Recommended resources

Default plugin endpoint

  • ServerUrl default in FVoskRecognitionOptions is ws://127.0.0.1:8080 — change it to match your server (e.g. if you use another port such as 2700).
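
For example, overriding the endpoint before the session is created; ServerUrl and FVoskRecognitionOptions are named by the plugin, while the rest of this snippet is illustrative:

```cpp
// Hedged sketch: point the session at a server on port 2700 instead of the default 8080.
FVoskRecognitionOptions Options;
Options.ServerUrl = TEXT("ws://127.0.0.1:2700");
const auto SessionId = Vosk->CreateSession(Options);   // Vosk = the UVoskSpeechSubsystem pointer
Vosk->StartSession(SessionId);
```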

Typical Use Cases 🎯

  • Voice commands — “Open the door”, “switch weapon”, “start mission”, “pause game”…

  • NPC dialogue triggers — voice to drive dialogue trees, quests, story branches

  • Keyword / command gates — OnKeywordHit + rules for stable, tunable command detection on top of streaming text

  • Accessibility / hands-free control

  • Enterprise / exhibition / training — kiosks, exhibits, simulators, training terminals, intranet apps

Highlights 🚀

  • Streaming recognition — partials while speaking, finals at utterance end

  • Partial throttling & de-duplication — fewer noisy partial callbacks

  • Keyword processor — cooldown, per-rule partial dedupe, optional post-final partial suppress

  • Configurable queue overflow policy — DropOldest / DropNewest

  • Debug-friendly — error codes, states, stats, and ProcessRecognitionResult return value for quick sanity checks

Compatibility 🧩

  • Engine — Unreal Engine 5.x (plugin metadata targets 5.7; verify against your project)

  • Dependencies — AudioCapture, WebSockets

  • Platforms — primarily Windows (Win64) per current module allowlist; server/model/platform combo is your choice

Blueprint Workflow 🛠️

Typical flow:

  1. Get Subsystem (VoskSpeechSubsystem)

  2. CreateSession(FVoskRecognitionOptions) → keep SessionId (or GetDefaultSessionId for the default session)

  3. StartSession(SessionId, …)

  4. Bind OnFinalResult / OnPartialResult for subtitles or raw logic

  5. Optional keywords — Construct Object → VoskKeywordMatchProcessor → Set Rules → Bind OnKeywordHit → from subsystem On Partial Result and On Final Result, call Process Recognition Result with the same processor and the event Result pin connected

  6. Use NormalizedText / NormalizedPartialText (or OnKeywordHit) for command parsing
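
To illustrate step 6, a hedged sketch that parses commands straight from the normalized final text; the field names come from this listing, while the handler signature and gameplay calls are placeholders:

```cpp
// Hedged sketch: raw command parsing on NormalizedText, without the keyword processor.
#include "Kismet/GameplayStatics.h"

void AMyVoiceActor::HandleFinal(const FVoskRecognitionResult& Result)
{
    const FString& Text = Result.NormalizedText;   // whitespace / punctuation noise already stripped

    if (Text.Contains(TEXT("pause game")))
    {
        UGameplayStatics::SetGamePaused(this, true);
    }
    else if (Text.Contains(TEXT("switch weapon")))
    {
        CycleWeapon();                             // placeholder gameplay call
    }
}
```

For more than a couple of phrases, the keyword processor above is the better fit, since it adds cooldown and partial dedupe on top of the same normalized text.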