You can’t expect much speed with complex personality prompts from local libraries running on end-user hardware. We’ve all gotten used to Gemini, ChatGPT, and Claude, but those run on billions of dollars’ worth of hardware.
Of course there are ways to make local LLMs faster, but there is nothing to do on Unreal’s side. The Cactus team can do it if it fits their scope; for example, they could make heavier use of hardware-specific instruction sets or more advanced hardware-accelerated computation paths.
But those things are not Cactus Framework’s job. It is a simple, easy-to-use, easy-to-integrate library solution, aimed especially at edge devices where privacy is required or there is no network connection. More dependencies would mean more weight and complexity.
Also, UE5 is very problematic with third-party libraries: the ones already bundled with the engine can clash with yours, and we can’t update them without engine modifications. So you wouldn’t be able to use such a library anyway.
And I will be honest with you: if I had needed to integrate that many libraries into UE5 (such as cuBLAS) and solve all the resulting problems, I wouldn’t have shared this as a free & open-source project. Because my goodness has its limits.
You can increase the thread count, though, but you have to be careful with it. Unreal already uses the game thread, the render thread, the network thread (if your project is multiplayer), and the audio thread, and you may allocate your own threads (such as FRunnables) later. So on an 8-core / 16-thread machine, don’t give the LLM more than half of the threads; see the sketch below.
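If it helps, here is a minimal sketch of that rule of thumb. `ChooseLLMThreadCount` is a hypothetical helper of mine, not part of Cactus; wire its result into whatever thread-count setting the plugin actually exposes.

```cpp
// A minimal sketch (not Cactus API): pick a conservative thread count
// for the local LLM, leaving headroom for Unreal's own threads.
#include "CoreMinimal.h"
#include "HAL/PlatformMisc.h"

int32 ChooseLLMThreadCount()
{
    // Logical cores, e.g. 16 on an 8-core / 16-thread CPU.
    const int32 LogicalCores = FPlatformMisc::NumberOfCoresIncludingHyperthreads();

    // Never hand the LLM every core; keep at least one for the engine.
    const int32 MaxForLLM = FMath::Max(1, LogicalCores - 1);

    // Give it at most half of the logical cores.
    return FMath::Clamp(LogicalCores / 2, 1, MaxForLLM);
}
```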
Also, you can use larger, more capable models, but they need more RAM.
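As a rough rule of thumb (my own illustration, not a Cactus figure): the weights alone take roughly parameter count × bytes per weight, before you add the KV cache and runtime overhead.

```cpp
// Rough rule of thumb (illustration only): weight memory in GB is about
// billions of parameters * bytes per weight. Real usage is higher once
// you add the KV cache, context buffers, and runtime overhead.
#include "CoreMinimal.h"

double EstimateWeightMemoryGB(double ParamsInBillions, double BytesPerWeight)
{
    // 1e9 params * N bytes each = N GB per billion parameters.
    return ParamsInBillions * BytesPerWeight;
}

// Example: a 7B model at ~4-bit quantization (0.5 bytes/weight) needs
// about 7 * 0.5 = 3.5 GB for the weights alone.
```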
If your game requires clever, philosophical conversations, local LLMs might not be a good fit (or at least not on every piece of hardware). Design your project so that the weirdness of AI hallucinations works in your favor. Consume and use that… madness.