Community Tutorial: Runtime Local LLM: Offline On-Device AI with Streaming Responses in Unreal Engine

A video tutorial on setting up and running a local large language model (LLM) entirely on-device in Unreal Engine with the Runtime Local LLM plugin. Powered by llama.cpp under the hood, the plugin runs LLMs fully offline: no internet connection, no API keys, and no cloud services are required at runtime. You'll learn how to:

- import or download GGUF models from the editor settings,
- test them with the built-in LLM Interface Test menu,
- set up a simple chat workflow in Blueprints with token-by-token streaming (see the C++ sketch below), and
- download, load, and chat with a model in a single runtime flow, without bundling the model file in your build.

The plugin is compatible with UE 4.27 through 5.7 and supports Windows, Mac, Linux, Android (including Meta Quest and other Android-based platforms), and iOS. It works with a wide range of model families, including Llama, Mistral, Phi, Gemma, Qwen, TinyLlama, and many other GGUF models.
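For readers who prefer C++ to Blueprints, the load-then-stream flow the tutorial builds maps onto Unreal's usual dynamic-delegate idiom. The sketch below is hypothetical: `ULocalLLMComponent`, `LoadModelFromFile`, `SendPrompt`, `OnTokenStreamed`, and `OnResponseComplete` are illustrative stand-ins, not the plugin's actual API, and the model path is made up; follow the video for the real Blueprint nodes and function names.

```cpp
// ChatDemoActor.cpp -- a minimal sketch of the streaming chat flow.
// HYPOTHETICAL: ULocalLLMComponent, LoadModelFromFile, SendPrompt, and
// the OnTokenStreamed / OnResponseComplete delegates are illustrative
// stand-ins, NOT the plugin's documented API. The matching header is
// assumed to declare LLM (a UPROPERTY component pointer), StreamedText
// (an FString), and the two UFUNCTION handlers below.
#include "ChatDemoActor.h"
#include "LocalLLMComponent.h" // hypothetical plugin header

void AChatDemoActor::BeginPlay()
{
    Super::BeginPlay();

    // Bind the streaming delegates before kicking off generation so no
    // early tokens are missed.
    LLM->OnTokenStreamed.AddDynamic(this, &AChatDemoActor::HandleToken);
    LLM->OnResponseComplete.AddDynamic(this, &AChatDemoActor::HandleComplete);

    // Load a GGUF model from disk (path is illustrative), then chat.
    if (LLM->LoadModelFromFile(TEXT("Models/tinyllama-q4.gguf")))
    {
        LLM->SendPrompt(TEXT("Introduce yourself in one sentence."));
    }
}

void AChatDemoActor::HandleToken(const FString& Token)
{
    // Each token arrives as it is generated; appending it here is what
    // drives the typewriter-style chat UI.
    StreamedText += Token;
}

void AChatDemoActor::HandleComplete(const FString& FullResponse)
{
    UE_LOG(LogTemp, Log, TEXT("LLM response complete: %s"), *FullResponse);
}
```

The same ordering matters in Blueprints: bind the streaming events before sending the prompt, then append each token to your chat widget as it arrives.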

https://dev.epicgames.com/community/learning/tutorials/PeKV/fab-runtime-local-llm-offline-on-device-ai-with-streaming-responses-in-unreal-engine