Muddy Terrain Games - GenAI Llama (Local LLM Model in Game, Runtime, GPT-OSS, Llama.cpp, Hugging Face)

Run powerful open-source language models entirely on the player's machine — no cloud API keys, no internet required, no per-token costs. GenAI Llama brings local AI inference to Unreal Engine 5.1 – 5.7 with a single, consistent Blueprint and C++ API!

Product Website | Documentation | Example Project | Support/Discord

Two Ways to Run:

  1. Embedded Inference (llama.cpp): Run GGUF models directly inside your game with no server, no extra process, fully offline. Pinned to llama.cpp b8802 (April 2026). Headers ship with the plugin; drop in the prebuilt binaries (or compile from source) and rebuild.

  2. HTTP Providers: Connect to any local inference server. One plugin, seven providers, one dropdown to switch. Supported providers include Ollama (native API), LM Studio (OpenAI-compatible), and vLLM (OpenAI-compatible), among others.
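Most of these servers speak the standard OpenAI-style chat completions format, so the request body sent over HTTP looks roughly like the sketch below. This illustrates the wire format only; the exact fields the plugin emits, and the model name used here, are assumptions.

```python
import json

def build_chat_request(model: str, user_text: str, stream: bool = False) -> str:
    """Build a minimal OpenAI-compatible /v1/chat/completions request body.

    Servers such as LM Studio (default http://localhost:1234/v1) and vLLM
    accept this shape; model names are server-specific.
    """
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are an in-game NPC."},
            {"role": "user", "content": user_text},
        ],
        "stream": stream,
    }
    return json.dumps(body)

# Hypothetical model name for illustration.
payload = build_chat_request("llama3.2", "Greet the player.")
```

Because all OpenAI-compatible providers accept this same shape, switching providers is mostly a matter of changing the base URL, which is what the one-dropdown switch relies on.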

Features

  • Async Blueprint nodes for everything: Chat completion, streaming chat completion, model list, health check, embedded model load. Every async node returns a handle with Cancel() so it's safe to tear down mid-generation when the player closes a UI.

  • Streaming: Token-by-token deltas via Blueprint delegate, ideal for typewriter UIs.
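With OpenAI-compatible servers, streaming responses arrive as Server-Sent Events where each `data:` line carries a JSON chunk exposing the next text fragment under `choices[0].delta.content`. The sketch below parses that format; it describes the server wire protocol, not the plugin's internal delegate plumbing.

```python
import json

def extract_deltas(sse_body: str):
    """Yield text fragments from an OpenAI-style SSE stream body."""
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# A sample stream of two token deltas followed by the terminator.
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
text = "".join(extract_deltas(sample))  # "Hello"
```

Feeding each yielded fragment to a UI widget as it arrives is exactly the pattern a typewriter effect needs.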

  • Multimodal vision (HTTP): Pass UTexture2D references on a chat message; the plugin auto-encodes to PNG Base64. Works with llava, llama3.2-vision, and other vision models. Per-provider format handling (Ollama images array vs. OpenAI image_url parts) is automatic.
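The per-provider difference mentioned above comes down to two documented message shapes: Ollama's native API takes raw Base64 strings in an `images` array, while OpenAI-compatible servers take typed content parts with an `image_url` data URI. A sketch of both, using stand-in bytes rather than a real PNG; how the plugin builds these internally is an assumption.

```python
import base64

def ollama_message(text: str, png_bytes: bytes) -> dict:
    # Ollama native /api/chat: raw Base64 strings in an "images" array.
    return {
        "role": "user",
        "content": text,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
    }

def openai_message(text: str, png_bytes: bytes) -> dict:
    # OpenAI-compatible: content is a list of typed parts; the image goes
    # in an "image_url" part as a data URI.
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

png = b"\x89PNG\r\n\x1a\n"  # stand-in PNG header bytes, not a real image
```

The same encoded texture feeds both builders, which is why switching providers needs no changes on the caller's side.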

  • In-editor status panel: Project Settings → Plugins → GenAI Llama shows whether embedded inference is loaded and scans every platform folder for installed libraries. One-click button to open the pinned llama.cpp release page.

  • Full C++ API: Production-ready snippets in the docs for chat, streaming, and embedded model loading.

  • Example Project: Blueprint examples covering different features, available per engine version on the documentation site.

Compatibility

  • Platforms at launch: Windows (x64), macOS (Apple Silicon and Intel), Linux (x64) — both HTTP and Embedded modes

  • Embedded backends: CUDA / Vulkan / CPU on Windows & Linux, Metal / CPU on macOS

  • Unreal Engine: 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7

How to Set Up

See the plugin documentation for per-platform setup instructions.