Run powerful open-source language models entirely on the player's machine — no cloud API keys, no internet required, no per-token costs. GenAI Llama brings local AI inference to Unreal Engine 5.1 – 5.7 with a single, consistent Blueprint and C++ API!
Product Website | Documentation | Example Project | Support/Discord
Two Ways to Run:
Embedded Inference (llama.cpp): Run GGUF models directly inside your game with no server, no extra process, fully offline. Pinned to llama.cpp b8802 (April 2026). Headers ship with the plugin; drop in the prebuilt binaries (or compile from source) and rebuild.
HTTP Providers: Connect to any local inference server. One plugin, seven providers, one dropdown to switch between them. Supported providers include Ollama (native API) and OpenAI-compatible servers such as LM Studio and vLLM.
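Because most of these servers speak the OpenAI-compatible chat completions protocol, switching providers usually means changing only the base URL. As a rough sketch (this shows the request body such a server expects on the wire, not the plugin's own API; the model name is a placeholder for whatever the local server has loaded):

```python
import json

# Minimal OpenAI-compatible chat request body, as accepted by servers
# like LM Studio and vLLM (and Ollama's OpenAI-compatible endpoint).
payload = {
    "model": "llama3.2",  # placeholder: whichever model the local server serves
    "messages": [
        {"role": "system", "content": "You are a helpful NPC."},
        {"role": "user", "content": "Greet the player."},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```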
Features
Async Blueprint nodes for everything: Chat completion, streaming chat completion, model list, health check, embedded model load. Every async node returns a handle with Cancel() so it's safe to tear down mid-generation when the player closes a UI.
Streaming: Token-by-token deltas via Blueprint delegate, ideal for typewriter UIs.
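Under the hood, OpenAI-compatible servers stream as server-sent events, each `data:` line carrying a JSON chunk with a content delta; the plugin surfaces those deltas through its Blueprint delegate. A sketch of how such deltas accumulate into the full reply (the sample lines are illustrative, not captured output):

```python
import json

# Illustrative SSE lines as emitted by an OpenAI-compatible streaming
# endpoint; real chunks carry more fields.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

def accumulate(lines):
    """Concatenate content deltas in arrival order, as a typewriter UI would."""
    text = ""
    for line in lines:
        data = line.removeprefix("data: ")
        if data == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(data)["choices"][0]["delta"]
        text += delta.get("content", "")
    return text

print(accumulate(sse_lines))  # -> Hello!
```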
Multimodal vision (HTTP): Pass UTexture2D references on a chat message; the plugin auto-encodes to PNG Base64. Works with llava, llama3.2-vision, and other vision models. Per-provider format handling (Ollama images array vs. OpenAI image_url parts) is automatic.
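The two provider formats differ only in where the Base64 image lands. A sketch of both message shapes, using a stand-in byte string rather than real PNG data (the plugin does this encoding for you; this only illustrates the wire formats):

```python
import base64

png_bytes = b"\x89PNG\r\n\x1a\n"  # stand-in; in the plugin a UTexture2D is encoded
b64 = base64.b64encode(png_bytes).decode("ascii")

# Ollama native format: Base64 strings in an "images" array on the message.
ollama_msg = {
    "role": "user",
    "content": "What is in this image?",
    "images": [b64],
}

# OpenAI-compatible format: an image_url content part holding a data URI.
openai_msg = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ],
}
```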
In-editor status panel: Project Settings → Plugins → GenAI Llama shows whether embedded inference is loaded and scans every platform folder for installed libraries. One-click button to open the pinned llama.cpp release page.
Full C++ API: Production-ready snippets in the docs for chat, streaming, embedded loading.
Example Project: Blueprint examples covering every feature, available for each engine version on the documentation site.
Compatibility
Platforms at launch: Windows (x64), macOS (Apple Silicon and Intel), Linux (x64) — both HTTP and Embedded modes
Embedded backends: CUDA / Vulkan / CPU on Windows & Linux, Metal / CPU on macOS
Unreal Engine: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7
How to set up
See the plugin documentation for per-platform setup instructions.