Tiny Boulder Studios - Aboleth STT (Speech to Text)

Documentation: Full Documentation

Aboleth STT brings real-time, fully offline speech recognition to Unreal Engine. Powered by OpenAI's Whisper and GPU-accelerated through CUDA and Vulkan, it transcribes player speech locally with no cloud services, no API keys, and no per-minute billing. Everything runs on the player's hardware.

Unlike basic Whisper wrappers, Aboleth STT is built for real-time game audio. A neural voice activity detector (Silero VAD) listens continuously and triggers transcription only when speech is detected. Streaming mode shows text word-by-word as the player speaks, using a Local Agreement algorithm that confirms words across multiple inference passes before displaying them. The result is responsive, accurate transcription with no hallucinated output during silence.
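The Local Agreement idea described above can be sketched in a few lines of plain C++. This is an illustrative sketch, not the plugin's actual code or API: a word is treated as confirmed once two consecutive inference passes agree on it, so only the stable prefix is ever displayed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sketch of Local Agreement word confirmation.
// Compares the previous and current transcription passes and returns
// the longest shared prefix of words -- the text safe to display.
std::vector<std::string> ConfirmedPrefix(
    const std::vector<std::string>& PrevPass,
    const std::vector<std::string>& CurrPass)
{
    std::vector<std::string> Confirmed;
    const size_t N = std::min(PrevPass.size(), CurrPass.size());
    for (size_t i = 0; i < N; ++i)
    {
        if (PrevPass[i] != CurrPass[i])
        {
            break; // first disagreement ends the stable prefix
        }
        Confirmed.push_back(CurrPass[i]);
    }
    return Confirmed;
}
```

Because unconfirmed tail words are withheld until a later pass agrees with them, the displayed text never flickers or retracts, at the cost of trailing the speaker by roughly one inference interval.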

FEATURES

  • GPU-Accelerated — CUDA (NVIDIA) and Vulkan (AMD, Intel, NVIDIA) backends. Automatically selects the best available backend at startup.

  • Silero Neural VAD — Streaming LSTM voice activity detection. No threshold hacks or RMS gating. Detects speech with high precision across noise conditions.

  • Streaming Transcription — See confirmed text appear word-by-word in real time.

  • Push-to-Talk — Switch between automatic VAD detection and manual capture at runtime. Zero-latency start since the mic buffer is always warm.

  • 99 Languages — Auto-detect language or force a specific one. Optional translation to English from any supported language.

  • Beam Search with Adaptive Gate — Higher accuracy final transcription. Automatically drops to greedy decoding if inference exceeds a time budget, then retries beam search on the next pass.

  • Runtime Tunable — Every setting (VAD threshold, language, streaming interval, beam search, mic gain, capture mode) is adjustable from C++ or Blueprint at runtime without reloading.

  • In-Editor Model Downloader — Download Whisper models directly from HuggingFace in Project Settings. No manual file management.

  • Three Integration Paths — Use the Listener Actor (drop-in), Listener Component (attach to any actor), or the STT Subsystem directly. All expose the same full API.

  • Pipeline State Machine — Clean state transitions (Idle, Accumulating, Processing) with full delegate coverage. No polling required.

  • Waveform Analyzer — Interactive browser-based tool for visualizing VAD probability logs. Inspect speech detection, committed words, and transcription timing.

    NOTE: VRAM usage is approximately equal to the model file's size on disk. A mid-range GPU (RTX 3060+) handles the recommended Q5 model comfortably.
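The pipeline state machine described above can be sketched generically as follows. The types and names here are illustrative assumptions, not the plugin's actual classes; the point is the shape: three states, explicit transitions, and a callback fired on every change so consumers never need to poll.

```cpp
#include <functional>

// Illustrative sketch of a three-state capture pipeline with a
// callback "delegate" notified on every transition (no polling).
enum class EPipelineState { Idle, Accumulating, Processing };

class FPipelineSketch
{
public:
    // Subscribers assign a callback here, analogous to binding a delegate.
    std::function<void(EPipelineState)> OnStateChanged;

    void SpeechDetected()  { Transition(EPipelineState::Accumulating); } // VAD fires
    void SpeechEnded()     { Transition(EPipelineState::Processing); }   // run inference
    void TranscriptReady() { Transition(EPipelineState::Idle); }         // back to listening

    EPipelineState State() const { return CurrentState; }

private:
    void Transition(EPipelineState Next)
    {
        CurrentState = Next;
        if (OnStateChanged) { OnStateChanged(Next); } // notify subscribers
    }

    EPipelineState CurrentState = EPipelineState::Idle;
};
```

In the plugin itself the equivalent notifications arrive as Blueprint-bindable delegates, so UI widgets and gameplay code react to state changes rather than ticking against the subsystem.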

WHAT'S INCLUDED

- Full C++ source code

- Blueprint-exposed API with async nodes

- CUDA and Vulkan prebuilt backends

- Silero VAD

- In-editor model downloader

- Comprehensive documentation site