Course: Neural Network Engine (NNE)

Hello, I’m new to UE and currently using version 5.4.4.

After reviewing much of the discussions here, I have a few questions:

  1. I noticed there are tutorials for the Neural Network Engine (NNE) in version 5.3, but I haven’t been able to find any for version 5.4. Have these not been released yet?

  2. Does UE’s NNE offer any tools for handling large language models (LLMs), such as a “tokenizer” for converting text to token IDs?
    Specifically, LLMs go through an auto-regressive process, where the model generates tokens iteratively until an end-of-sequence token is produced, feeding previously generated tokens back into the model. Is this process supported?

The reason I ask is that when testing an ONNX-converted LLM model in UE 5.3,
I could only identify the following input and output tensors:

Inputs: input-ids, attention-mask, position-ids
Output: logits

However, I wasn’t able to observe the entire auto-regressive generation process that produces a final long text sequence as output.

  3. If support for the auto-regressive generation process and tokenizers is not yet available,
    would I need to implement these features myself at the C++ level?

Thanks. I am fairly far along, but I am getting a Windows exception deep in the optimization pass on my model. Is there an ONNX model somewhere I can use to test that is known to work with NNE in 5.4?

This is the exception:

Exception 0xe06d7363 encountered at address 0x7fff8d46fabc
<unknown> 0x00007fff8d46fabc
UE::NNEUtilities::Internal::FOnnxRuntimeModelOptimizerPass::ApplyPass(FNNEModelRaw &, const UE::NNE::FAttributeMap &) NNEUtilitiesModelOptimizerONNX.cpp:67
UE::NNEUtilities::Internal::FModelOptimizerBase::ApplyAllPassesAndValidations(FNNEModelRaw &, const UE::NNE::FAttributeMap &) NNEUtilitiesModelOptimizerBase.cpp:87
UNNERuntimeORTCpu::CreateModelData(const FString &, TArrayView<…>, const TMap<…> &, const FGuid &, const ITargetPlatform *) NNERuntimeORT.cpp:82
UE::NNE::ModelData::CreateModelData(const FString &, const FString &, const TArray<…> &, const TMap<…> &, const FGuid &, const ITargetPlatform *) NNEModelData.cpp:142
UNNEModelData::GetModelData(const FString &) NNEModelData.cpp:582
UNNERuntimeORTCpu::CanCreateModelCPU(TObjectPtr<…>) NNERuntimeORT.cpp:134
UNNERuntimeORTCpu::CreateModelCPU(TObjectPtr<…>) NNERuntimeORT.cpp:159
UNNEBlueprintInterfaceBPLibrary::CreateModelInstance(FNNDataModel, FNNModelInstance &, bool &) NNEBlueprintInterfaceBPLibrary.cpp:75
UObject::execCallMathFunction(UObject *, FFrame &, void *const) ScriptCore.cpp:1075
[Inlined] FFrame::Step(UObject *, void *const) ScriptCore.cpp:478
[Blueprint] test: Inputs  Event Graph
ProcessLocalScriptFunction(UObject *, FFrame &, void *const) ScriptCore.cpp:1233
ProcessScriptFunction<…>(UObject *, UFunction *, FFrame &, void *const, void (*)(UObject *, FFrame &, void *)) ScriptCore.cpp:1036
ProcessLocalFunction(UObject *, UFunction *, FFrame &, void *const) ScriptCore.cpp:1276
[Inlined] FFrame::Step(UObject *, void *const) ScriptCore.cpp:478
[Blueprint] test: Event BeginPlay  Function /Game/test.test_C:ReceiveBeginPlay
ProcessLocalScriptFunction(UObject *, FFrame &, void *const) ScriptCore.cpp:1233
UObject::ProcessInternal(UObject *, FFrame &, void *const) ScriptCore.cpp:1303
UFunction::Invoke(UObject *, FFrame &, void *const) Class.cpp:6847
UObject::ProcessEvent(UFunction *, void *) ScriptCore.cpp:2142
AActor::ProcessEvent(UFunction *, void *) Actor.cpp:1092
AActor::BeginPlay() Actor.cpp:4251
AActor::DispatchBeginPlay(bool) Actor.cpp:4191
AWorldSettings::NotifyBeginPlay() WorldSettings.cpp:305
AGameStateBase::HandleBeginPlay() GameStateBase.cpp:227
AGameModeBase::StartPlay() GameModeBase.cpp:205
UWorld::BeginPlay() World.cpp:5329
UGameInstance::StartPlayInEditorGameInstance(ULocalPlayer *, const FGameInstancePIEParameters &) GameInstance.cpp:566
UEditorEngine::CreateInnerProcessPIEGameInstance(FRequestPlaySessionParams &, const FGameInstancePIEParameters &, int) PlayLevel.cpp:3141
UEditorEngine::OnLoginPIEComplete_Deferred(int, bool, FString, FPieLoginStruct) PlayLevel.cpp:1589
UEditorEngine::CreateNewPlayInEditorInstance(FRequestPlaySessionParams &, const bool, EPlayNetMode) PlayLevel.cpp:1853
UEditorEngine::StartPlayInEditorSession(FRequestPlaySessionParams &) PlayLevel.cpp:2868
UEditorEngine::StartQueuedPlaySessionRequestImpl() PlayLevel.cpp:1167
UEditorEngine::StartQueuedPlaySessionRequest() PlayLevel.cpp:1064
UEditorEngine::Tick(float, bool) EditorEngine.cpp:1901
UUnrealEdEngine::Tick(float, bool) UnrealEdEngine.cpp:547
FEngineLoop::Tick() LaunchEngineLoop.cpp:5915
[Inlined] EngineTick() Launch.cpp:61
GuardedMain(const wchar_t *) Launch.cpp:182
LaunchWindowsStartup(HINSTANCE__ *, HINSTANCE__ *, char *, int, const wchar_t *) LaunchWindows.cpp:247
WinMain(HINSTANCE__ *, HINSTANCE__ *, char *, int) LaunchWindows.cpp:298

This is the code generating it (last line is where the exception happens)
Note that I am passing in the loaded model wrapped in a Blueprint struct, but it seems alright…

TWeakInterfacePtr<INNERuntimeCPU> Runtime = UE::NNE::GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));

	if (!Runtime.IsValid())
	{
		GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Red,
			TEXT("Could not fetch runtime NNERuntimeORTCpu " ));
		success = false;
		return;
	}
	
	const TObjectPtr<UNNEModelData> ModelData = modelData.ModelData;
	if (ModelData.IsNull())
	{
		GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Red,
			TEXT("Could not create ModelData " ));
		success = false;
		return;
	}

	if (!IsValid(ModelData))
	{
		GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Red,
			TEXT("Invalid ModelData " ));
		success = false;
		return;
	}
	
	const TSharedPtr<IModelCPU> Model = Runtime->CreateModelCPU(ModelData);

I tried it with mnist-8.onnx (the link in the tutorial is broken, I had to search for it). It’s not throwing the exception, but I am getting these warnings and Model.IsValid() is returning false…

LogNNE: Warning: Input model is invalid : ConvAddFusion_Add_B_Parameter88 in initializer but not in graph input.
LogNNE: Warning: Model validator 'ONNX Model validator' detected an error.
LogNNE: Warning: Model validation failed after optimisation pass 'Onnx runtime model optimization'.
LogNNE: Warning: UNNERuntimeORTCpu cannot create a model from the model data with id E0C91DF54A5BE2EA89F2CDBED899883F

Hey @user_9618397a5d25a4979c35d35e03cad1f6cec5b3c61f5358356d0910 ,

  1. The changes between 5.3 and 5.4 are minor and have been posted above in this thread.
  2. NNE focuses on neural networks in general and leaves application-specific parts like tokenizers for LLMs to the corresponding features. Explaining LLMs here would be beyond the scope, but as you describe, you turn text into tokens (input-ids) and run the network iteratively.
  3. Yes, all the boilerplate code around the neural network (tokenization, embedding, the strategy to select the best logit, beam search, maintaining key/value caches, etc.) is up to you, as it also heavily depends on which LLM you are using (a rough greedy-decoding sketch follows below).
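
To make point 3 a bit more concrete, here is a minimal sketch of such a greedy decoding loop on top of the CPU interface. It assumes the 5.3-style IModelInstanceCPU API where RunSync returns 0 on success (newer versions return a status enum instead), a model exported with input-ids/attention-mask/position-ids inputs and a [1, sequence length, vocabulary size] float logits output, and placeholder Tokenize/Detokenize helpers and EosTokenId/VocabSize constants that you would have to provide yourself. For simplicity it re-runs the full sequence each step instead of maintaining a key/value cache.

#include "CoreMinimal.h"
#include "NNE.h"
#include "NNERuntimeCPU.h"
#include "NNETypes.h"

// Placeholder helpers for an external tokenizer (e.g. SentencePiece); NNE does not provide these.
TArray<int64> Tokenize(const FString& Text);
FString Detokenize(const TArray<int64>& TokenIds);

FString RunGreedyDecoding(UE::NNE::IModelInstanceCPU& ModelInstance, const FString& Prompt)
{
	TArray<int64> TokenIds = Tokenize(Prompt);
	const int64 EosTokenId = 2;      // model dependent, taken from the tokenizer config
	const int32 VocabSize = 32000;   // model dependent
	const int32 MaxNewTokens = 128;

	for (int32 Step = 0; Step < MaxNewTokens; ++Step)
	{
		const uint32 SeqLen = (uint32)TokenIds.Num();

		// The exported model expects input-ids, attention-mask and position-ids, all shaped [1, SeqLen].
		TArray<int64> AttentionMask;
		AttentionMask.Init(1, SeqLen);
		TArray<int64> PositionIds;
		PositionIds.SetNum(SeqLen);
		for (uint32 i = 0; i < SeqLen; ++i)
		{
			PositionIds[i] = i;
		}

		TArray<UE::NNE::FTensorShape> InputShapes;
		InputShapes.Init(UE::NNE::FTensorShape::Make({ 1, SeqLen }), 3);
		ModelInstance.SetInputTensorShapes(InputShapes);

		// Bindings must be given in the same order as the model declares its tensors.
		TArray<UE::NNE::FTensorBindingCPU> Inputs;
		Inputs.SetNum(3);
		Inputs[0].Data = TokenIds.GetData();
		Inputs[0].SizeInBytes = TokenIds.Num() * sizeof(int64);
		Inputs[1].Data = AttentionMask.GetData();
		Inputs[1].SizeInBytes = AttentionMask.Num() * sizeof(int64);
		Inputs[2].Data = PositionIds.GetData();
		Inputs[2].SizeInBytes = PositionIds.Num() * sizeof(int64);

		// Logits come back as [1, SeqLen, VocabSize]; only the last position matters for the next token.
		TArray<float> Logits;
		Logits.SetNumUninitialized((int32)(SeqLen * VocabSize));
		TArray<UE::NNE::FTensorBindingCPU> Outputs;
		Outputs.SetNum(1);
		Outputs[0].Data = Logits.GetData();
		Outputs[0].SizeInBytes = Logits.Num() * sizeof(float);

		if (ModelInstance.RunSync(Inputs, Outputs) != 0)
		{
			break; // inference failed
		}

		// Greedy strategy: pick the token with the highest logit at the last position.
		const float* LastRow = Logits.GetData() + (SeqLen - 1) * VocabSize;
		int64 NextToken = 0;
		for (int32 v = 1; v < VocabSize; ++v)
		{
			if (LastRow[v] > LastRow[NextToken])
			{
				NextToken = v;
			}
		}

		if (NextToken == EosTokenId)
		{
			break; // end-of-sequence token produced
		}
		TokenIds.Add(NextToken); // feed the generated token back in the next iteration
	}

	return Detokenize(TokenIds);
}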

Hey @Cyberqat, this sounds like an issue with the specific model you are using. Can you try downloading a different one, maybe also with a later opset? I cannot reproduce this with mnist from the ONNX model zoo on GitHub.

Thanks. A model that is known good will definitely help me debug!

I think I remember getting the same error with mnist-8; I tried mnist-12 afterwards and it works fine.
mnist-12.onnx (25.5 KB)

3 Likes

Hello, I am interested in using the NNE API.

I’m not sure if I understand correctly: the NNE implements model inference using the onnxruntime inference framework. I currently have a question: at the inference step, which is ModelInstance->RunSync(Inputs, Outputs), is there any way to replace it with another inference framework, such as MNN, which is a lightweight framework? Or, besides the ONNX model format, what other model file formats does the NNE API support?

Hi @SODIRIDEYAN ,

NNE itself is a common API to access many different inference frameworks. These frameworks reside in plugins. So, for example, if you want to use ONNX Runtime, you enable the plugin NNERuntimeORT and will find the inference backends NNERuntimeORTCpu and NNERuntimeORTDml for CPU and GPU inference. If you want to use IREE, you need to enable the plugin NNERuntimeIREE, and so on.

Each runtime itself defines which file formats it supports (ONNX Runtime uses ONNX, IREE uses MLIR, BasicCpu a custom format). But typically you can export from frameworks like TF or Torch to many different formats or convert between formats later.

However, I am not aware of any MNN integration into NNE; someone would have to do this first. If you are interested in lightweight inference, I recommend looking into IREE, which compiles models directly to game code, or BasicCpu, which is based on ISPC and optimized for small MLPs.
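
To illustrate how the runtimes are discovered, here is a minimal sketch (the available names depend on which runtime plugins are enabled in the project) that lists the registered runtimes and fetches one by name:

#include "CoreMinimal.h"
#include "NNE.h"
#include "NNERuntimeCPU.h"

void ListAndPickRuntime()
{
	// Every enabled runtime plugin registers itself with NNE on startup.
	for (const FString& Name : UE::NNE::GetAllRuntimeNames())
	{
		UE_LOG(LogTemp, Display, TEXT("Registered NNE runtime: %s"), *Name);
	}

	// Ask for a specific runtime through the interface it implements (CPU here).
	TWeakInterfacePtr<INNERuntimeCPU> CpuRuntime = UE::NNE::GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
	if (!CpuRuntime.IsValid())
	{
		UE_LOG(LogTemp, Warning, TEXT("NNERuntimeORTCpu not found, is the NNERuntimeORT plugin enabled?"));
	}
}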

1 Like

I can now load from an ONNX file and make a model instance!

HOWEVER I need to be able to modify weights in memory. It seems that, in order to do this, I have to create my ONNX file in memory as there is no direct way to get and set weights in NNE?

As a precursor, I am trying to just load a known-good ONNX file from disk and make model data out of it. When I try to make an instance from this model data, it says it can’t. Any help would be appreciated.
The ModelData-generating code is below. I included the one that uses the asset directly (and works) for comparison, as well as the one that loads it as binary data (and doesn’t produce a usable ModelData).

FNNDataModel UNNEBlueprintInterfaceBPLibrary::FromONNXFile(FString filePath, bool& success)
{
	// Loads an already-imported UNNEModelData asset by object path.
	TObjectPtr<UNNEModelData> ModelData =
		LoadObject<UNNEModelData>(NULL, *filePath);
	if (ModelData.IsNull())
	{
		GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Red,
			TEXT("Failed to load model data from file: " + filePath));
		success = false;
	}
	else
	{
		success = true;
	}
	return FNNDataModel(ModelData);
}

FNNDataModel UNNEBlueprintInterfaceBPLibrary::FromONNXBytes(TArray<uint8> byteArray, bool& success)
{
	// Expects the raw bytes of an .onnx file, not the serialized bytes of an imported UE asset.
	TObjectPtr<UNNEModelData> ModelData = NewObject<UNNEModelData>();
	ModelData->Init("onnx", MakeArrayView(byteArray.GetData(), byteArray.Num()));
	if (ModelData.IsNull())
	{
		GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Red,
			TEXT("Failed to load model data from byte array"));
		success = false;
	}
	else
	{
		success = true;
	}
	return FNNDataModel(ModelData);
}

Okay, I figured out that its not stored as an ONNX file when its a asset but as a serialized UE object.
Now my test works.

My NNE Blueprint Library is now public on Github
profK/NNEBlueprintInterface: A plugin that provides access to the UE5 Neural Network Engine from Blueprints (github.com)

5 Likes

Loading the raw ONNX file worked! So never mind and thanks!
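
For anyone following along, here is a minimal sketch of that raw-file path (the wrapper name, the assumed header name, the example usage and the default-constructed FNNDataModel are only for illustration):

#include "Misc/FileHelper.h"
#include "NNEBlueprintInterfaceBPLibrary.h" // header name assumed from the library above

// Hypothetical wrapper: read the raw bytes of an .onnx file from disk and hand
// them to the byte-array path of the library instead of loading a UE asset.
FNNDataModel LoadModelDataFromDisk(const FString& OnnxPath, bool& bSuccess)
{
	TArray<uint8> FileBytes;
	if (!FFileHelper::LoadFileToArray(FileBytes, *OnnxPath))
	{
		bSuccess = false;
		return FNNDataModel();
	}
	return UNNEBlueprintInterfaceBPLibrary::FromONNXBytes(FileBytes, bSuccess);
}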

1 Like

Thanks for your reply. I have another two questions regarding NNE.

  1. Whether or not an inference framework can be utilized depends on the availability of related plugins such as NNERuntimeORT or NNERuntimeIREE. So if the inference framework I want to use has not been integrated into UE so far, does that mean I can develop a plugin similar to NNERuntimeORT myself? Of course, this will involve quite a lot of effort.
  2. Currently I am trying to use NNERuntimeORTCuda in UE 5.3 to integrate a super-sampling model into the game. My understanding is as follows: INNERuntimeGPU is meant for editor use and is not suitable for real-time rendering, because it involves transferring data between CPU and GPU even when the input data is generated on the GPU in the first place. INNERuntimeRDG would be optimal if I want to do frame super-sampling in real-time rendering.

I created an interesting AI model application a few months ago and have developed it into a plugin. This plugin uses an ONNX model file and supports packaging in version 5.3, but in version 5.4, it only runs in the Editor. I was hesitant about making this work public for others to reference, but after an unsuccessful attempt to list it on the Marketplace, I am now determined to share it openly. I hope this will help more developers build their own creative applications and gain valuable development experience.

Link to this plugin: UnrealPlugin_SketchRecognition

5 Likes

This repository includes example code for a multi-input model, so you can see the code changes made from version 5.3 to 5.4.
The model file is very lightweight (only 10 MB, but it contains 344 identifiable categories), making it ideal for testing and reference. If you encounter any issues, feel free to discuss them here so we can build experience together.

5 Likes

@SODIRIDEYAN

  1. Yes, exactly! You basically need to create your own plugin and have classes that implement the runtime, model and model instance interfaces. On plugin start, you register your runtime with NNE, and everyone will be able to use it (when your plugin is enabled); a rough registration sketch follows below.
  2. Yes, the GPU interface runs inference independently of the frame rendering but still competes for resources and is thus not suitable for in-game inference. If you are interested in in-game/in-frame inference, you should consider the INNERuntimeRDG interface. For super sampling specifically, you can create your own view extension, which will be called when you have to enqueue your neural network to the RDG builder. Please note that we removed the CUDA backend in 5.4, as it required a specific CUDA version to be installed manually on the client device.
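
As a rough sketch only (UMyRuntime, its header, the module name and the Runtime member are placeholders; only RegisterRuntime/UnregisterRuntime come from NNE), the registration step on plugin startup could look like this:

#include "NNE.h"
#include "MyRuntime.h" // hypothetical: a UCLASS implementing INNERuntime and INNERuntimeCPU

void FMyRuntimeModule::StartupModule()
{
	// Runtime is a UMyRuntime* member of the module class (not shown).
	Runtime = NewObject<UMyRuntime>();
	Runtime->AddToRoot(); // keep the runtime alive for the lifetime of the module

	// After this call, UE::NNE::GetRuntime<INNERuntimeCPU>(TEXT("MyRuntime")) can find it.
	UE::NNE::RegisterRuntime(TWeakInterfacePtr<INNERuntime>(Runtime));
}

void FMyRuntimeModule::ShutdownModule()
{
	UE::NNE::UnregisterRuntime(TWeakInterfacePtr<INNERuntime>(Runtime));
	Runtime->RemoveFromRoot();
	Runtime = nullptr;
}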

@Cyberqat and @HelloJXY : Pretty awesome, thanks for sharing your work with the public! This will surely inspire people to bring ML into their applications :star_struck:

2 Likes

I am adapting the example to Unreal Engine 5.4.4. Besides a few minor changes, I am able to compile the example, but when I try to load the model I am getting:

LogTemp: Display: ManuallyLoadedModelData loaded mnist-8
LogTemp: Display: PreLoadedModelData loaded mnist-8
LogNNE: Display: OnnxRuntimeModelOptimizerPass runned in 0.0 seconds.
LogNNE: Warning: Input model is invalid : ConvAddFusion_Add_B_Parameter88 in initializer but not in graph input.
LogNNE: Warning: Model validator 'ONNX Model validator' detected an error.
LogNNE: Warning: Model validation failed after optimisation pass 'Onnx runtime model optimization'.
LogNNE: Warning: UNNERuntimeORTCpu cannot create a model from the model data with id 8EA8DE782F4D9BA07DD436AD20ECC2AE
LogTemp: Error: Failed to create the model

The differences with original example are:

  1. It seems the path to the mnist-8 model is outdated. I used models/validated/vision/classification/mnist/model/mnist-8.onnx at main · onnx/models · GitHub

  2. A few API changes:
    Use TSharedPtr and CreateModelCPU instead of CreateModel

TSharedPtr<UE::NNE::IModelCPU> Model = Runtime->CreateModelCPU(ManuallyLoadedModelData);
  3. CreateModelInstanceCPU instead of CreateModelInstance (see the sketch below)

The rest is exactly taken from NNE - Quick Start Guide - 5.3 | Tutorial
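
For reference, the adapted creation chain looks roughly like this (a sketch only, assuming ManuallyLoadedModelData is a valid UNNEModelData pointer as in the guide):

TWeakInterfacePtr<INNERuntimeCPU> Runtime = UE::NNE::GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
if (Runtime.IsValid())
{
	// 5.4 returns shared pointers and renames the creation functions.
	TSharedPtr<UE::NNE::IModelCPU> Model = Runtime->CreateModelCPU(ManuallyLoadedModelData);
	if (Model.IsValid())
	{
		TSharedPtr<UE::NNE::IModelInstanceCPU> ModelInstance = Model->CreateModelInstanceCPU();
	}
}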

Any suggestions?

It seems I was able to load mnist-12 as mentioned in one of the previous posts, but it would still be nice to understand why. We are loading this model with no special configuration, so this could affect other models I am planning to play with soon.

@ranierin Unreal Engine 5.5 seems to bring some exciting updates to NNE according to the release notes (Unreal Engine 5.5 Release Notes | Unreal Engine 5.5 Documentation | Epic Developer Community). Can you confirm a few questions and comment on this?

[NNE] Extended NNERuntimeORTDml to implement NPU runtime interface.

Does this mean NNE now supports inference on the NPUs of Snapdragon chips and Apple M-series chips?

NNERuntimeORT upgrade ONNX Runtime to version 1.17.1 and its required dependency DirectML to version 1.13.1.

definitely a welcome improvement for more advanced models

Our plugin NNERuntimeORT already includes a DirectML runtime that implements the GPU interface for neural network inference outside the rendering of a frame. We implement the necessary infrastructure to let this backend also implement the RDG interface for in-frame inference, providing a fast and powerful execution provider on DX12 based desktop systems.

Is this what I think it is? Does this mean we don’t have to worry about the complexity of running models using RDG and can use the same interface as the other runtimes?

@umenokin I am not 100% sure, but it could be an issue with the model. Glad you were able to run it with a more recent opset version.