Learn how to run large language models entirely on-device in Unreal Engine with the Runtime Local LLM plugin. This tutorial walks you through downloading and managing GGUF models in the editor, loading models at runtime in Blueprints or C++, streaming token-by-token responses, and configuring inference parameters such as temperature, context size, and GPU layer offloading. Powered by llama.cpp, the plugin supports offline inference on Windows, macOS, Linux, Android, iOS, and Meta Quest, with full Blueprint and C++ APIs for chat systems, NPC dialogue, dynamic content generation, and more. No cloud services or API keys are required at runtime; everything runs locally on the player’s device.
https://dev.epicgames.com/community/learning/tutorials/M45X/fab-running-local-llms-offline-in-unreal-engine-runtime-local-llm-plugin-tutorial
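The tutorial itself documents the plugin's actual Blueprint and C++ surface; purely as orientation, here is a minimal C++ sketch of the runtime flow the summary describes: load a GGUF model, set the inference parameters named above (temperature, context size, GPU layer offloading), and receive the response token by token. Every type and member name in this sketch (`ULocalLLMModel`, `FLLMInferenceParams`, `LoadFromFile`, `OnTokenGenerated`, `GenerateAsync`) is a hypothetical placeholder, not the plugin's real API, and the model filename is illustrative.

```cpp
// Hypothetical sketch only: all plugin-facing names below are placeholders,
// not the Runtime Local LLM plugin's actual API. It illustrates the flow the
// tutorial covers: load a GGUF file, configure inference, stream tokens.
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "Misc/Paths.h"
#include "NPCDialogueActor.generated.h"

UCLASS()
class ANPCDialogueActor : public AActor
{
    GENERATED_BODY()

public:
    virtual void BeginPlay() override
    {
        Super::BeginPlay();

        // 1) Load a GGUF model shipped with the game (path and file are illustrative).
        const FString ModelPath =
            FPaths::Combine(FPaths::ProjectContentDir(), TEXT("Models/example-q4.gguf"));
        Model = ULocalLLMModel::LoadFromFile(ModelPath); // hypothetical loader
        if (!Model)
        {
            return; // model file missing or failed to load
        }

        // 2) Configure the inference parameters the tutorial mentions.
        FLLMInferenceParams Params;          // hypothetical settings struct
        Params.Temperature = 0.7f;           // higher = more varied replies
        Params.ContextSize = 4096;           // token window for prompt + history
        Params.GPULayers   = 32;             // layers offloaded to GPU; 0 = CPU only

        // 3) Subscribe for streamed tokens, then kick off async generation.
        Model->OnTokenGenerated.AddUObject(this, &ANPCDialogueActor::HandleToken);
        Model->GenerateAsync(TEXT("Greet the player in one sentence."), Params);
    }

private:
    void HandleToken(const FString& Token)
    {
        // Each token arrives as it is generated; append it to the dialogue UI.
        UE_LOG(LogTemp, Log, TEXT("%s"), *Token);
    }

    UPROPERTY()
    TObjectPtr<ULocalLLMModel> Model = nullptr; // hypothetical model object
};
```

The delegate-driven streaming is the point of the pattern: dialogue text can update word by word instead of blocking until the full completion is ready, which matters for NPC chat running on the player's own hardware.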