Runtime Text To Speech - offline, cross-platform TTS, over 35 languages and 900 voices

gtreshchev · January 17, 2025, 10:43pm

Transform your game with real-time, offline, cross-platform text-to-speech synthesis!

Add offline text-to-speech to your project, with more than 35 languages and 900 voices across more than 120 voice qualities. Speech is synthesized in real time with no internet connection required, powered by Piper and ONNX Runtime.

Key features:

Core Capabilities:

Complete offline text-to-speech synthesis
35+ languages supported
900+ unique voices available
120+ voice qualities
Cross-platform support: Windows, Linux, Mac, Android (including Oculus/Meta Quest), iOS
Experimental support for Meta Quest and Apple Vision Pro

Voice System:

One-click voice model downloads through editor interface
In-editor voice preview and testing
Runtime voice model selection
Raw PCM float audio output
Flexible integration with any audio playback solution
Built-in compatibility with Runtime Audio Importer
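
Because the output is raw float PCM, it can be handed to whatever playback or export path a project already uses. As a rough illustration only, here is a minimal standalone C++ sketch of converting float samples to 16-bit PCM, a format many audio APIs and file writers expect. It assumes samples are normalized to [-1.0, 1.0]. The function names (`FloatToInt16`, `ConvertToInt16`, `DurationSeconds`) are hypothetical helpers written for this example and are not part of the plugin's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert one float sample to signed 16-bit PCM.
// Assumes the input is normalized to [-1.0, 1.0]; values outside that
// range are clamped so they cannot wrap around, and NaN maps to silence.
inline std::int16_t FloatToInt16(float sample)
{
    if (std::isnan(sample))
    {
        return 0;
    }
    const float clamped = std::clamp(sample, -1.0f, 1.0f);
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Hypothetical helper: convert a whole buffer of float PCM samples.
inline std::vector<std::int16_t> ConvertToInt16(const std::vector<float>& samples)
{
    std::vector<std::int16_t> out;
    out.reserve(samples.size());
    for (float s : samples)
    {
        out.push_back(FloatToInt16(s));
    }
    return out;
}

// Hypothetical helper: duration in seconds of an interleaved buffer,
// given the total sample count, sample rate, and channel count.
inline double DurationSeconds(std::size_t sampleCount, int sampleRate, int numChannels)
{
    assert(sampleRate > 0 && numChannels > 0);
    return static_cast<double>(sampleCount) /
           (static_cast<double>(sampleRate) * numChannels);
}
```

The sample rate and channel count to pass in depend on the voice model that produced the audio, so they should be read from the synthesis result rather than hard-coded. The sketch needs C++17 for `std::clamp`.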

Development Features:

Full Blueprint and C++ API support
Easy voice model management and packaging
Comprehensive voice metadata access
Simple voice model selection via dropdown
Automated voice model packaging with projects

Supported Languages:

English (United States) – 18 voice models, 23 qualities
English (British) – 9 voice models, 11 qualities
Simplified Chinese (简体中文) – 1 voice model, 2 qualities
Spanish (Mexican / Español Mexicano) – 2 voice models, 2 qualities
Spanish (European / Español Europeo) – 5 voice models, 5 qualities
Russian (Русский) – 4 voice models, 4 qualities
Portuguese (Brazil / Português do Brasil) – 2 voice models, 2 qualities
Portuguese (Portugal / Português de Portugal) – 1 voice model, 1 quality
German (Deutsch) – 8 voice models, 10 qualities
French (Français) – 6 voice models, 7 qualities
Turkish (Türkçe) – 3 voice models, 3 qualities
Polish (Polski) – 4 voice models, 4 qualities
Italian (Italiano) – 2 voice models, 2 qualities
Ukrainian (Украї́нська мо́ва) – 2 voice models, 2 qualities
Catalan (Català) – 2 voice models, 3 qualities
Czech (Čeština) – 1 voice model, 2 qualities
Welsh (Cymraeg) – 1 voice model, 1 quality
Danish (Dansk) – 1 voice model, 1 quality
Greek (Ελληνικά) – 1 voice model, 1 quality
Farsi (فارسی) – 2 voice models, 2 qualities
Finnish (Suomi) – 1 voice model, 1 quality
Hungarian (Magyar) – 3 voice models, 3 qualities
Icelandic (Íslenska) – 4 voice models, 4 qualities
Georgian (ქართული ენა) – 1 voice model, 1 quality
Kazakh (Қазақша) – 3 voice models, 3 qualities
Luxembourgish (Lëtzebuergesch) – 1 voice model, 1 quality
Latvian (Latviešu) – 1 voice model, 1 quality
Nepali (नेपाली) – 1 voice model, 2 qualities
Dutch (Belgium / Vlaams) – 2 voice models, 4 qualities
Dutch (Netherlands / Nederlands) – 3 voice models, 3 qualities
Norwegian (Bokmål / Nynorsk) – 1 voice model, 1 quality
Romanian (Română) – 1 voice model, 1 quality
Slovak (Slovenčina) – 1 voice model, 1 quality
Slovenian (Slovenščina) – 1 voice model, 1 quality
Serbian (Srpski) – 1 voice model, 1 quality
Swedish (Svenska) – 1 voice model, 1 quality
Swahili (Kiswahili) – 1 voice model, 1 quality
Vietnamese (Tiếng Việt) – 3 voice models, 3 qualities

Perfect for:

Accessible game interfaces
Dynamic NPC conversations
Voice-driven tutorials and hints
Procedurally generated content
Localization solutions
Assistive technologies
Interactive storytelling
Educational applications