Runtime Local LLM (llama.cpp)

:brain: Run large language models entirely on-device in Unreal Engine - offline, cross-platform, powered by llama.cpp

Run GGUF-format LLMs (Llama, Mistral, Phi, Gemma, Qwen, TinyLlama, and more) directly within your Unreal Engine project, with no internet connection, no API keys, and no cloud dependencies at runtime. The plugin wraps llama.cpp with a full Blueprint and C++ API: load models, send messages, and receive token-by-token streamed responses, all on a background thread with game-thread callbacks.
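
A minimal C++ sketch of that workflow is shown below. Every identifier in it (the `ULocalLLM` class, `LoadModel`, `SendMessage`, and the callback signatures) is a hypothetical placeholder rather than the plugin's actual API; it only illustrates the load, send, and token-streaming pattern, so check the plugin documentation for the real Blueprint nodes and C++ classes.

```cpp
// Hypothetical usage sketch - these class and function names are placeholders,
// not the plugin's real API. It illustrates the described pattern only:
// load a GGUF model, send a message, and receive streamed tokens via a
// game-thread callback while inference runs on a background thread.

#include "CoreMinimal.h"

void RunLocalLLMExample(UObject* Outer)
{
    // Placeholder runtime object that owns the llama.cpp context.
    ULocalLLM* LLM = NewObject<ULocalLLM>(Outer);

    // Load a GGUF model from disk (assumed asynchronous, background thread).
    LLM->LoadModel(TEXT("Models/mistral-7b-instruct.Q4_K_M.gguf"),
        [LLM](bool bLoaded, const FString& Error)
        {
            if (!bLoaded)
            {
                UE_LOG(LogTemp, Error, TEXT("Model load failed: %s"), *Error);
                return;
            }

            // Send a chat message; each generated token arrives on the game thread.
            LLM->SendMessage(TEXT("Write a one-line greeting for the blacksmith NPC."),
                [](const FString& Token, bool bDone)
                {
                    UE_LOG(LogTemp, Log, TEXT("%s"), *Token);
                });
        });
}
```

In Blueprint, the same flow is exposed through async nodes with delegate-based callbacks, as noted in the Development Features list below.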

Quick links:

Key features:

:bullseye: Core Capabilities:

  • Complete offline inference: no cloud services or subscriptions required
  • GGUF model support: load any GGUF-format model (Llama, Mistral, Phi, Gemma, Qwen, TinyLlama, etc.)
  • Up-to-date llama.cpp: updated regularly on Fab to keep pace with llama.cpp releases, so the latest GGUF model formats are always supported
  • GPU acceleration: Vulkan on Windows and Linux, Metal on Mac and iOS, and optimized CPU inference with SIMD intrinsics on Android and Meta Quest
  • Cross-platform: Windows, Mac, Linux, Android (including Meta Quest), iOS

:high_voltage: Model Loading & Management:

:speaking_head: Inference & Conversation:

:hammer_and_wrench: Development Features:

  • Full Blueprint and C++ API with async nodes and delegate-based callbacks
  • Model library functions for querying available models, checking disk presence, and retrieving metadata (see the sketch after this list)
  • Automatic packaging: models ship with your project via NonUFS staging with no manual configuration
  • Comprehensive error handling with descriptive error codes
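
As a rough illustration of the model-library and error-handling bullets above, here is another hedged C++ sketch; again, every function and type name (`GetAvailableModels`, `IsModelOnDisk`, `GetModelInfo`, `FLocalLLMModelInfo`) is an assumption made for illustration, not the plugin's documented interface.

```cpp
// Hypothetical sketch of querying the model library and reacting to missing
// models. All identifiers below are placeholders, not the plugin's real API.

void InspectModelLibrary(ULocalLLM* LLM)
{
    // Enumerate the models the project knows about (placeholder call).
    const TArray<FString> Models = LLM->GetAvailableModels();

    for (const FString& ModelName : Models)
    {
        // Check whether the model file is actually present on disk.
        if (!LLM->IsModelOnDisk(ModelName))
        {
            UE_LOG(LogTemp, Warning, TEXT("%s is listed but not found on disk"), *ModelName);
            continue;
        }

        // Retrieve metadata such as quantization or file size (placeholder struct).
        const FLocalLLMModelInfo Info = LLM->GetModelInfo(ModelName);
        UE_LOG(LogTemp, Log, TEXT("%s: quantization %s"), *ModelName, *Info.Quantization);
    }
}
```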

:video_game: Perfect for:

  • NPC dialogue and dynamic conversations
  • In-game AI assistants and companions
  • Procedural content generation (quests, lore, item descriptions)
  • Voice-driven gameplay workflows (paired with Runtime Speech Recognizer and Runtime Text To Speech)
  • Offline chatbot interfaces
  • Educational and training applications
  • Privacy-sensitive deployments with no data leaving the device

:glowing_star: Compatible plugins: