Tiny Boulder Studios - Aboleth STT (Speech to Text)

Documentation: Full Documentation

Aboleth STT brings real-time, fully offline speech recognition to Unreal Engine. Powered by OpenAI's Whisper and GPU-accelerated through CUDA and Vulkan, it transcribes player speech locally with no cloud services, no API keys, and no per-minute billing. Everything runs on the player's hardware.

Unlike basic Whisper wrappers, Aboleth STT is built for real-time game audio. A neural voice activity detector (Silero VAD) listens continuously and triggers transcription only when speech is detected. Streaming mode shows text word-by-word as the player speaks, using a Local Agreement algorithm that confirms words across multiple inference passes before displaying them. The result is responsive, accurate transcription with no hallucinated output during silence.
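The Local Agreement idea described above can be sketched in a few lines of plain C++. This is an illustrative sketch, not the plugin's actual code or API: a word is treated as confirmed once two consecutive inference passes agree on it, so only the stable prefix is ever displayed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sketch of Local Agreement word confirmation.
// Compares the previous and current transcription passes and returns
// the longest shared prefix of words -- the text safe to display.
std::vector<std::string> ConfirmedPrefix(
    const std::vector<std::string>& PrevPass,
    const std::vector<std::string>& CurrPass)
{
    std::vector<std::string> Confirmed;
    const size_t N = std::min(PrevPass.size(), CurrPass.size());
    for (size_t i = 0; i < N; ++i)
    {
        if (PrevPass[i] != CurrPass[i])
        {
            break; // first disagreement ends the stable prefix
        }
        Confirmed.push_back(CurrPass[i]);
    }
    return Confirmed;
}
```

Because unconfirmed tail words are withheld until a later pass agrees with them, the displayed text never flickers or retracts, at the cost of trailing the speaker by roughly one inference interval.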

FEATURES

  • GPU-Accelerated — CUDA (NVIDIA) and Vulkan (AMD, Intel, NVIDIA) backends. Automatically selects the best available backend at startup.

  • Silero Neural VAD — Streaming LSTM voice activity detection. No threshold hacks or RMS gating. Detects speech with high precision across noise conditions.

  • Streaming Transcription — See confirmed text appear word-by-word in real time.

  • Push-to-Talk — Switch between automatic VAD detection and manual capture at runtime. Zero-latency start since the mic buffer is always warm.

  • 99 Languages — Auto-detect language or force a specific one. Optional translation to English from any supported language.

  • Beam Search with Adaptive Gate — Higher accuracy final transcription. Automatically drops to greedy decoding if inference exceeds a time budget, then retries beam search on the next pass.

  • Runtime Tunable — Every setting (VAD threshold, language, streaming interval, beam search, mic gain, capture mode) is adjustable from C++ or Blueprint at runtime without reloading.

  • In-Editor Model Downloader — Download Whisper models directly from HuggingFace in Project Settings. No manual file management.

  • Three Integration Paths — Use the Listener Actor (drop-in), Listener Component (attach to any actor), or the STT Subsystem directly. All expose the same full API.

  • Pipeline State Machine — Clean state transitions (Idle, Accumulating, Processing) with full delegate coverage. No polling required.

  • Waveform Analyzer — Interactive browser-based tool for visualizing VAD probability logs. Inspect speech detection, committed words, and transcription timing.

    NOTE: VRAM usage is approximately equal to the model file's size on disk. A mid-range GPU (RTX 3060+) handles the recommended Q5 model comfortably.
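The pipeline state machine described above can be sketched generically as follows. The types and names here are illustrative assumptions, not the plugin's actual classes; the point is the shape: three states, explicit transitions, and a callback fired on every change so consumers never need to poll.

```cpp
#include <functional>

// Illustrative sketch of a three-state capture pipeline with a
// callback "delegate" notified on every transition (no polling).
enum class EPipelineState { Idle, Accumulating, Processing };

class FPipelineSketch
{
public:
    // Subscribers assign a callback here, analogous to binding a delegate.
    std::function<void(EPipelineState)> OnStateChanged;

    void SpeechDetected()  { Transition(EPipelineState::Accumulating); } // VAD fires
    void SpeechEnded()     { Transition(EPipelineState::Processing); }   // run inference
    void TranscriptReady() { Transition(EPipelineState::Idle); }         // back to listening

    EPipelineState State() const { return CurrentState; }

private:
    void Transition(EPipelineState Next)
    {
        CurrentState = Next;
        if (OnStateChanged) { OnStateChanged(Next); } // notify subscribers
    }

    EPipelineState CurrentState = EPipelineState::Idle;
};
```

In the plugin itself the equivalent notifications arrive as Blueprint-bindable delegates, so UI widgets and gameplay code react to state changes rather than ticking against the subsystem.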

WHAT'S INCLUDED

- Full C++ source code

- Blueprint-exposed API with async nodes

- CUDA and Vulkan prebuilt backends

- Silero VAD

- In-editor model downloader

- Comprehensive documentation site