Course: Neural Network Engine (NNE)

@jvin1011 Yes, there are different runtimes with different platform support and also different format support. Unfortunately, we do not have any runtime yet that supports the tflite file format. However, there are a number of model converters out there that can convert tflite to e.g. onnx (see here for an example), and then you could use the existing runtimes.

Of course, it is always possible to use any other inference engine directly (including tflite) without going through NNE. The downside is that you have to integrate the libraries yourself, and if it does not run on all the target platforms you are interested in, you need to do the same for other runtimes, creating a lot of fragmentation in your code. Or, e.g., if you want to use NPUs, you will need to add a runtime for each hardware vendor you plan to run on as well.

That is why we started NNE: to provide you with a single API that lets you access all platforms the same way and that is extensible to future runtimes as well.
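
For anyone curious, here is a minimal sketch of what that single entry point looks like in code. This is not from the thread; it assumes the NNE plugin plus at least one runtime plugin (e.g. NNERuntimeORT) is enabled, and the log category is just illustrative.

#include "NNE.h"

// Minimal sketch: log every runtime currently registered with NNE.
void LogAvailableNNERuntimes()
{
	for (const FString& RuntimeName : UE::NNE::GetAllRuntimeNames())
	{
		UE_LOG(LogTemp, Display, TEXT("Registered NNE runtime: %s"), *RuntimeName);
	}
}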

So, long story short: I would try to export or convert your model to onnx and save yourself the pain of adding your own runtimes :wink:

1 Like

Hi Nico, I have exported my ONNX model in FP16 format. When I attempt model inference using the RDGHlsl backend, I encounter the following error.

[2025.05.20-06.53.23:601][ 0]LogNNERuntimeRDGHlsl: Warning: Input at index '0' (from template T0) is of type 'Half' which is not supported for that input.
[2025.05.20-06.53.23:601][ 0]LogNNERuntimeRDGHlsl: Warning: OperatorRegistry failed to validate operator:Slice
[2025.05.20-06.53.23:602][ 0]LogNNERuntimeRDGHlsl: Warning: Model validator 'RDGModel validator' detected an error.
[2025.05.20-06.53.23:602][ 0]LogNNERuntimeRDGHlsl: Warning: Model is not valid.
[2025.05.20-06.53.23:602][ 0]LogNNERuntimeRDGHlsl: Warning: Cannot create a model from the model data with id B67298DC456900C2B7797DA66DF4EA2F
[2025.05.20-06.53.24:736][0]LogTemp: Error: Could not create the RDG model

Is FP16 inference not supported in NNE, or is it specifically unsupported with the RDGHlsl backend? The same model, when exported in FP32, works fine with the RDGHlsl backend, but the performance is a little slow, so I’m exploring ways to optimize and improve it.

Hi @jvin1011,

Yes, fp16 support with the HLSL runtime is still limited. Which engine version are you using?

If you work on a DirectX-based system, you can use the runtime NNERuntimeORTDml, which gives you a high chance of getting the model running. Also, depending on the model, DirectML can access tensor cores, giving you an additional boost.
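
For reference, a minimal sketch of how you could check for the DirectML-backed runtime through the same NNE API (assuming the NNERuntimeORTDml plugin is enabled; creating and running the model then follows the same pattern as with the other runtimes):

#include "NNE.h"
#include "NNERuntimeGPU.h"

// Minimal sketch: check whether the DirectML-backed runtime is registered before falling back to HLSL.
bool IsDmlRuntimeAvailable()
{
	TWeakInterfacePtr<INNERuntimeGPU> DmlRuntime = UE::NNE::GetRuntime<INNERuntimeGPU>(FString("NNERuntimeORTDml"));
	if (!DmlRuntime.IsValid())
	{
		UE_LOG(LogTemp, Warning, TEXT("NNERuntimeORTDml not found, is the NNERuntimeORT plugin enabled?"));
		return false;
	}
	return true;
}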

This may not be suited for your final product if you aim for multiple target platforms, but it will at least help you assess the performance of your model.

Best
Nico

I am using version 5.5.1 of Unreal Engine. Since I am running on Windows, I initially tried using DirectX support, but it didn’t work for me, which is why I’m working with the HLSL runtime. Overall, the HLSL runtime works fine, but I’m looking for ways to optimize it. Any suggestions would be appreciated.

@ranierin in the release notes of Unreal Engine 5.6 it says this about NNE:

NNERuntimeORT upgrade to ONNX Runtime 1.20 and upgraded DirectML to version 1.15.2.

which is nice. Are there any other major updates or changes to NNE that we should be aware of?

Thank you

@jvin1011 apologies for the late reply. I think your best chance is to try to reduce the model size, sorry :frowning: Alternatively, try to get DirectML running; it should work if you are on a DirectX-based system.

@gabaly92 We spent a lot of time in this release on NNERuntimeIREE. However, it is still a work in progress and needs some expertise in adapting the model to get it running. But it shows great performance on CPU for small real-time models due to its low overhead compared to other runtimes.

1 Like

Awesome thank you for the update

1 Like

Is there any way to read onnx file metadata? I guess not, and it’s a feature I’d like to see in future updates.

You can try netron, and here is its GitHub. If you want to modify an onnx file, try onnx-modifier. Hope it is of help to you.

2 Likes

@YuriNK There are no plans to expose this within NNE, but @HelloJXY gave a nice answer on how you can work around it :+1:

1 Like

Hi,
I’m trying to use a model with multiple outputs in Unreal 5.5 with NNERuntimeCPU, but I can’t get it to work.

How do I set up the tensor shapes and output data properly for multiple outputs, and how do I get the data in the end? Do I then use CPUOutputBindings[0].Data, CPUOutputBindings[1].Data, and CPUOutputBindings[2].Data?

The model is SuperPoint LightGlue with inputs (2B, 1, H, W) and outputs (2B, 1024, 2), (M, 3), (M,). I used it successfully in the regular ONNXRuntime, but I can’t set it up in NNE. Can someone help me?

You can take a look at the plugin example I released earlier, which includes CPU deployment of a multi-output model. Although the code was written for 5.3 and 5.4, I remember that the 5.4 version runs very well in 5.5; even if changes are needed, they are easy to make, as I don’t recall having to modify anything for 5.5. You can simply change the plugin version number and try running it. The source code clearly demonstrates how to deploy a multi-input, multi-output model.
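
Not taken from that plugin, but as a generic, minimal sketch of how multiple output bindings can be set up against the NNE CPU interface (it assumes ModelInstance, InputTensorShapes and InputBindings were created as in the quick-start guide, all outputs are float tensors, and the shapes resolve to concrete values once SetInputTensorShapes has been called):

// One buffer and one binding per output tensor of the model.
TArray<TArray<float>> OutputData;
TArray<UE::NNE::FTensorBindingCPU> OutputBindings;

// GetOutputTensorShapes() only returns resolved shapes after SetInputTensorShapes() was called.
ModelInstance->SetInputTensorShapes(InputTensorShapes);
TConstArrayView<UE::NNE::FTensorShape> OutputShapes = ModelInstance->GetOutputTensorShapes();

OutputData.SetNum(OutputShapes.Num());
OutputBindings.SetNum(OutputShapes.Num());
for (int32 i = 0; i < OutputShapes.Num(); ++i)
{
	OutputData[i].SetNumZeroed(OutputShapes[i].Volume());
	OutputBindings[i].Data = OutputData[i].GetData();
	OutputBindings[i].SizeInBytes = OutputData[i].Num() * sizeof(float);
}

// RunSync takes all input and all output bindings at once; afterwards the results live in
// OutputData[0], OutputData[1], OutputData[2] (i.e. behind OutputBindings[0..2].Data).
ModelInstance->RunSync(InputBindings, OutputBindings);

So yes, reading CPUOutputBindings[0].Data, [1].Data and [2].Data after inference is the idea, as long as each binding points at a buffer you own that is large enough for the corresponding output tensor.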

1 Like

Thanks, got it working :+1:

1 Like

Actually, I have one more problem. My output size is dynamic too; it depends on the number of matches. When I try to run inference a second time, the output size changes and I get this error:
LogNNERuntimeORT: Error: Non-zero status code returned while running Reshape node. Name:'/matcher/Reshape_3' Status Message: D:\a\_work\1\s\onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{743} Requested shape:{1009}

Is dynamic output size not supported or is it a skill issue?

As far as I know, when exporting onnx model files you can mark input and output axes as dynamic, such as the batch size. I’m not sure if you have done this, as it looks like an error from onnxruntime rather than from NNE itself.

However, even if you have an onnx file that supports dynamic output shapes, NNE very likely will not support it.

Therefore, I suggest using a “mask” to mark the valid data region in the input/output and setting a fixed maximum size. This lets NNE bind to fixed-size input/output buffers, which helps avoid strange problems. It is the simplest and most effective solution to your problem.
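
To illustrate the idea on the NNE side, a purely hypothetical sketch: MaxMatches, the (MaxMatches, 3) match output and the extra (MaxMatches,) validity-mask output are all assumptions about how the model could be re-exported, not part of the original LightGlue export.

// Hypothetical: the model is re-exported with a fixed upper bound on matches plus a mask output.
constexpr int32 MaxMatches = 2048;              // assumed upper bound chosen at export time

TArray<float> MatchData;                        // fixed-size buffer for the (MaxMatches, 3) output
MatchData.SetNumZeroed(MaxMatches * 3);
TArray<float> MaskData;                         // fixed-size buffer for the (MaxMatches,) mask output
MaskData.SetNumZeroed(MaxMatches);

TArray<UE::NNE::FTensorBindingCPU> OutputBindings;
OutputBindings.SetNum(2);
OutputBindings[0].Data = MatchData.GetData();
OutputBindings[0].SizeInBytes = MatchData.Num() * sizeof(float);
OutputBindings[1].Data = MaskData.GetData();
OutputBindings[1].SizeInBytes = MaskData.Num() * sizeof(float);

// After RunSync, only rows i with MaskData[i] > 0.5f contain valid matches; the rest is padding,
// so the bindings keep the same size on every run.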

2 Likes

Hi @Localerhorst ,

Dynamic shape support is limited neither by NNE nor by the ONNX file format, but by the runtime you are using. (Both NNE and ONNX just define dynamic shapes by symbols; it is up to the runtime to interpret/fill them.)

However: there are two kinds of ‘dynamic shapes’:

One is when the output shape depends on the shape of your input (e.g. you can change the batch size of the input and it will change the batch size of the output), which is supported by the onnxruntime you are using.

The other is when the output shape depends on the content of your input, which seems to be the case in your model. This is typically more difficult for runtimes to handle, and thus @HelloJXY’s suggestion is the most robust thing you can do.
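
A small sketch of how you can spot outputs with non-concrete (symbolic) shapes up front, assuming a model instance created as in the quick-start guide; a non-concrete dimension is one the runtime only resolves at inference time:

// Check which output tensors have dimensions that are not known before inference.
for (const UE::NNE::FTensorDesc& Desc : ModelInstance->GetOutputTensorDescs())
{
	const UE::NNE::FSymbolicTensorShape SymbolicShape = Desc.GetShape();
	if (!SymbolicShape.IsConcrete())
	{
		UE_LOG(LogTemp, Warning, TEXT("Output '%s' has a dynamic (symbolic) shape"), *Desc.GetName());
	}
}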

2 Likes

Hey All,

Just wanted to point out some links. We have a couple of hardware vendors that brought their NPU/GPU into the NNE ecosystem:

We are happy to see so much interest from the hardware vendors to support you in running your ML models inside the Engine :heart_hands:

1 Like

Last year, I tried using the on-chip Mali-G610 GPU on an RK3588 running Ubuntu to run and render the UE third-person demo, but UE5 only supports Vulkan rendering there and the G610 does not: I could not find any officially supported Vulkan driver for the Mali-G610, which led to numerous errors and exceptions. I almost gave up on porting UE programs to such resource-constrained embedded devices; it was too difficult and nearly made me crash 😭.

Hello! I’m not new to Unreal but fairly new to Unreal C++ and loading NNEs this way. I’ve been trying to get a model I made added in by following the tutorial (it’s being loaded into my C++ blueprint as a UAsset) without much success. I know it’s an issue with how I’m loading in the tensor input/output but nothing I try seems to fix it.

I left all of my comments and old commented-out code in case that’s helpful at all. If any of my notes are wrong, I’m happy to hear about that too.

Any help would be very much appreciated!!

ANNEE.cpp

// Fill out your copyright notice in the Description page of Project Settings.


#include "ANNEE.h"

#include "Engine/AssetManager.h"

TArray<uint32> InputData = { 64, 3, 200, 200 };
TArray<uint32> OutputData = { 64, 3, 100, 100 };
TArray<UE::NNE::FTensorBindingCPU> InputBindings;
TArray<UE::NNE::FTensorBindingCPU> OutputBindings;

// Sets default values
AANNEE::AANNEE()
{
	// Set this actor to call Tick() every frame.  You can turn this off to improve performance if you don't need it.
	PrimaryActorTick.bCanEverTick = true;


}

//TODO: Set tensor size to match original model
// Called when the game starts or when spawned
void AANNEE::BeginPlay()
{
	UE_LOG(LogTemp, Display, TEXT("!!!!!AANNEE is beginning play!!!!!"));
	Super::BeginPlay();

	//Running on CPU works for in editor and runtime. Can be called synchronously on game thread or asynchronously (starting with this approach)
	//INNERuntimeRDG <-- Run frame aligned. If you can't call model when needed with INNERuntimeCPU, might have to switch to this. Needs FRDGBuilder and  Render Dependency Graph knowledge (https://docs.unrealengine.com/5.2/en-US/render-dependency-graph-in-unreal-engine/)
	//!!!!!IMPORTANT!!!!!
	//!!!!!NNERuntimeORTCpu = In editor model running. Will NOT run IN GAME after building. Have to change this to *something*. Can't find other command rn
	TWeakInterfacePtr<INNERuntimeCPU> Runtime = UE::NNE::GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
	if (Runtime.IsValid())
	{
		TUniquePtr<UE::NNE::IModelCPU> Model = Runtime->CreateModel(PreLoadedModelData);
		UE_LOG(LogTemp, Display, TEXT("PreLoadedModelData loaded %s"), *PreLoadedModelData->GetName());
		if (Model.IsValid())
		{
			//Needed if preloading from asset?
			//ModelHelper->ModelInstance = Model->CreateModelInstance();
			TUniquePtr<UE::NNE::IModelInstanceCPU> ModelInstance = Model->CreateModelInstance();
			if (ModelInstance.IsValid())
			{
				//Use if tensor size is known
				
				//bool bIsRunning;

				TArray<UE::NNE::FTensorShape> InputTensorShapes = { UE::NNE::FTensorShape::Make(InputData) };
				InputData.SetNumZeroed(InputTensorShapes[0].Volume());
				InputBindings.SetNumZeroed(1);
				InputBindings[0].Data = InputData.GetData();
				InputBindings[0].SizeInBytes = InputData.Num() * sizeof(float);

				TArray<UE::NNE::FTensorShape> OutputTensorShapes = { UE::NNE::FTensorShape::Make(OutputData) };
				OutputData.SetNumZeroed(OutputTensorShapes[0].Volume());
				OutputBindings.SetNumZeroed(1);
				OutputBindings[0].Data = OutputData.GetData();
				OutputBindings[0].SizeInBytes = OutputData.Num() * sizeof(float);

				//Figure out how much memory to allocate to input and output
				//Input can be populated before model runs. Output can be populated after inference completes 
				//IMPORTANT: Have to match input/output of tensor shapes to input/output dimensions of original model
				//Input shape = (64, 3, 200, 200)
				TConstArrayView<UE::NNE::FTensorDesc> InputTensorDescs = ModelInstance->GetInputTensorDescs();
				//checkf(InputTensorDescs.Num() == 1, TEXT("The current example supports only models with a single input tensor"));
				//UE::NNE::FSymbolicTensorShape SymbolicInputTensorShape = InputTensorDescs[0].GetShape();
				//////IsConcrete = Tests if any dimensions set to -1 (which means model accepts any size of input/output tensor)
				////checkf(SymbolicInputTensorShape.IsConcrete(), TEXT("The current example supports only models without variable input tensor dimensions"));
				//checkf(SymbolicInputTensorShape.Rank() == 4, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 200 x 200!"));
				//checkf(SymbolicInputTensorShape.GetData()[0] == 64, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 200 x 200!"));
				//checkf(SymbolicInputTensorShape.GetData()[1] == 3, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 200 x 200!"));
				//checkf(SymbolicInputTensorShape.GetData()[2] == 200, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 200 x 200!"));
				//checkf(SymbolicInputTensorShape.GetData()[3] == 200, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 200 x 200!"));
				////TArray<UE::NNE::FTensorShape> InputTensorShapes = { UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicInputTensorShape) };

				//Set the input tensor dimension. Must be called each time size would change
				//Input shape = (64, 3, 200, 200)
				ModelInstance->SetInputTensorShapes(InputTensorShapes);

				//Output shape of (64,32,100,100)
				TConstArrayView<UE::NNE::FTensorDesc> OutputTensorDescs = ModelInstance->GetOutputTensorDescs();
				FString OutputTensorDescs_String = FString(UTF8_TO_TCHAR(reinterpret_cast<const char*>(OutputTensorDescs.GetData())));

				UE_LOG(LogTemp, Display, TEXT("OutputTensorDescs: %s"), *OutputTensorDescs_String);
				//checkf(OutputTensorDescs.Num() == 1, TEXT("The current example supports only models with a single output tensor"));
				//UE::NNE::FSymbolicTensorShape SymbolicOutputTensorShape = OutputTensorDescs[0].GetShape();
				////checkf(SymbolicOutputTensorShape.IsConcrete(), TEXT("The current example supports only models without variable output tensor dimensions"));
				//checkf(SymbolicOutputTensorShape.Rank() == 4, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 100 x 100!"));
				////IMPORTANT: Unreal is crashing after this line
				////Try feeding it an image/input to see if that's the only issue. If it can process the input, proceed. If it can't, debug or try starting over
				////a little bit with basing new stuff after sample project that should be first tab of UE5 NNE tab group
				//checkf(SymbolicOutputTensorShape.GetData()[0] == 64, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 100 x 100!"));
				//checkf(SymbolicOutputTensorShape.GetData()[1] == 3, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 100 x 100!"));
				//checkf(SymbolicOutputTensorShape.GetData()[2] == 100, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 100 x 100!"));
				//checkf(SymbolicOutputTensorShape.GetData()[3] == 100, TEXT("Neural Post Processing requires models with input shape 64 x 3 x 100 x 100!"));
				//TArray<UE::NNE::FTensorShape> OutputTensorShapes = { UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicOutputTensorShape) };

			}
			else
			{
				UE_LOG(LogTemp, Error, TEXT("Failed to create the model instance"));
			}
		}
		else
		{
			UE_LOG(LogTemp, Error, TEXT("Failed to create the model"));
		}
	}
	else
	{
		UE_LOG(LogTemp, Error, TEXT("Cannot find runtime NNERuntimeORTCpu, please enable the corresponding plugin"));
	}


}

// Called every frame
void AANNEE::Tick(float DeltaTime)
{
	//Added by default? 
	Super::Tick(DeltaTime);
	UE_LOG(LogTemp, Display, TEXT("!!!!!AANNEE is ticking!!!!!"));

	//This if statement makes sure RunSync is not called twice on same model instance or at the same time
	if (!ModelHelper->bIsRunning)
	{
		//Process ModelHelper->OutputData from previous run here
		//Pass new data into ModelHelper->InputData here

		ModelHelper->bIsRunning = true;
		TSharedPtr<FMyModelHelper> ModelHelperPtr = ModelHelper;
		//[]() = lambda notation for C++. Have to put ptr in [] as a capture to be able to access all vars/funcs from the ptr 
		AsyncTask(ENamedThreads::AnyNormalThreadNormalTask, [ModelHelperPtr]()
			{
				//Runs model on separate thread
				if (ModelHelperPtr->ModelInstance->RunSync(ModelHelperPtr->InputBindings, ModelHelperPtr->OutputBindings) != 0)
				{
					UE_LOG(LogTemp, Error, TEXT("Failed to run the model"));
				}
				//Once inference completes, queue another AsyncTask to run on game thread
				AsyncTask(ENamedThreads::GameThread, [ModelHelperPtr]()
					{
						ModelHelperPtr->bIsRunning = false;
					});
			});
	}

}

ANNEE.h

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once

#include "CoreMinimal.h"
#include "GameFramework/Actor.h"

#include "NNE.h"
#include "NNERuntimeCPU.h"
#include "NNEModelData.h"

#include "ANNEE.generated.h"

//Model Helper especially needed for larger models
//Game could stop at any point and start freeing memory the inference is still using to try and run which would typically result in a crash
//Also don't want to copy a bunch of data around for performance reasons
class FMyModelHelper
{
public:
	TUniquePtr<UE::NNE::IModelInstanceCPU> ModelInstance;
	TArray<float> InputData = { 64.0f, 3.0f, 200.0f, 200.0f };
	TArray<float> OutputData = { 64.0f, 3.0f, 100.0f, 100.0f };
	TArray<UE::NNE::FTensorBindingCPU> InputBindings;
	TArray<UE::NNE::FTensorBindingCPU> OutputBindings;
	bool bIsRunning;
};

UCLASS()
class SHADOWSHAPES_CNN_API AANNEE : public AActor
{
	GENERATED_BODY()

	public:
		// Sets default values for this actor's properties
		AANNEE();

		// Called every frame
		virtual void Tick(float DeltaTime) override;

		//Automated loading, loads on actor spawn and unloads on despawn but model remains in memory for actor lifetime
		//Synchronous call (will only make this one call until it finishes loading which could block game starting)
		//Downside of preloaded = model will last for lifetime of actor. If model gets too big, might not be ideal
		UPROPERTY(EditAnywhere)
		TObjectPtr<UNNEModelData> PreLoadedModelData;

		//Delayed loading, must be triggered by func (like BeginPlay() ) in actor cpp file
		//Asynchronous call (can be a background call and can check when loading is finished)
		//UPROPERTY(EditAnywhere)
		//TSoftObjectPtr<UNNEModelData> LazyLoadedModelData;

	protected:
		// Called when the game starts or when spawned
		virtual void BeginPlay() override;

	private:
		//Pointer to model helper used to pass model data around
		TSharedPtr<FMyModelHelper> ModelHelper = MakeShared<FMyModelHelper>();

		//Use if tensor size is known
		/*TArray<float> InputData;
		TArray<float> OutputData;
		TArray<UE::NNE::FTensorShape> InputTensorShapes;
		TArray<UE::NNE::FTensorShape> OutputTensorShapes;
		TArray<UE::NNE::FTensorBindingCPU> InputBindings;
		TArray<UE::NNE::FTensorBindingCPU> OutputBindings;*/

};