Course: Neural Network Engine (NNE)

Hey @Nitecon

Model conversion is always very tricky: it's supposed to work in theory, but there are many obstacles in practice.

We don't do any checks on importing: internally the model is just stored as a byte blob and only gets changed/optimized when cooking the game.

When you try to create a model and the model is not supported (e.g. because it has a too-high opset version, or certain operators are not supported by the used runtime), CreateModel will return an empty pointer. In your code you don't test for it but directly call CreateModelInstance, and I can imagine that this causes the crash. Another reason could be that the ORT library, to which we delegate the model creation, has an issue and it crashes there.

So to prevent the crash, I would split CreateModel and CreateModelInstance and test the results of each step.
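
Something along these lines (just a sketch, using the same calls and runtime you already have):

// Sketch only: check each step before using the result.
TUniquePtr<UE::NNE::IModelCPU> LoadedModel = Runtime->CreateModel(ModelData);
if (!LoadedModel.IsValid())
{
	// Model not supported by this runtime (e.g. opset too high or missing operators).
	UE_LOG(LogTemp, Error, TEXT("CreateModel failed"));
	return false;
}
ModelInstance = LoadedModel->CreateModelInstance();
if (!ModelInstance.IsValid())
{
	UE_LOG(LogTemp, Error, TEXT("CreateModelInstance failed"));
	return false;
}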

However, this does not solve your problem: it maybe won't crash anymore, but you will still not be able to load the model from your onnx.

What you could try is to export the model with a lower opset. Another option is to try opening the resulting onnx with another tool like netron.app.

Can you share the onnx file with us? You made me curious :wink:

Sure, I will attach it; the highest opset needed for that model is only 9. And yes, you caught me, I shortened the use of the pointer there and didn't even notice :stuck_out_tongue:

lightgbm.txt.onnx (7.6 MB)

Just as an update, I also updated the initial types for my lightgbm model to this:
initial_types = [('feature_input', FloatTensorType([1, num_features]))]
So it now matches a similar CNN model which uses 1,773 when inspecting it.

Edit: Adding the updated model as a link
lightgbm.txt.onnx (670.9 KB)

Just for some more context, I also updated the function like this:

bool UNeuralData::InitModel(const FString& ModelPath, const FString& RuntimeType)
{
	ModelData = LoadObject<UNNEModelData>(GetTransientPackage(), *ModelPath);
	Runtime   = UE::NNE::GetRuntime<INNERuntimeCPU>(TEXT("NNERuntimeORTCpu"));
	if (Runtime.IsValid())
	{
		try
		{
			auto LoadedModel = Runtime->CreateModel(ModelData);
			if (!LoadedModel.IsValid())
			{
				UE_LOG(LogTemp, Error, TEXT("LoadedModel is invalid"));
				return false;
			}
			ModelInstance = LoadedModel->CreateModelInstance();
			if (!ModelInstance.IsValid())
			{
				UE_LOG(LogTemp, Error, TEXT("ModelInstance is invalid"));
				return false;
			}
		}
		catch (std::exception& e)
		{
			UE_LOG(LogTemp, Error, TEXT("Exception: %hs"), UTF8_TO_TCHAR(e.what()));
			return false;
		}
	} else
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid model runtime..."));
		return false;
	}
	return true;
}

It still crashes at the same point with this exception:

Exception = Exception 0xc0000005 encountered at address 0x21629de04a0: Access violation reading location 0x00000000
Registers
this = {UNeuralData *const} 0x000007e6ba2a2e40 (Name="NeuralData"_0)
ModelPath = {const FString &} L"/Script/NNE.NNEModelData'/Game/Models/DataMaster.DataMaster'"
RuntimeType = {const FString &} L"NNERuntimeORTCpu"
LoadedModel = {TUniquePtr<UE::NNE::IModelCPU,TDefaultDelete>} Ptr=0x000007e6bbfef0e0 {...}

Don’t worry about the shortcut, happens to me all the time :stuck_out_tongue:

Thanks for the updates, the crash seems to happen inside the ORT library. A quick look at the model indicates multiple possible issues: the operations that are used are part of the ai.onnx.ml domain, and I could not find which ORT version supports them or whether this domain is supported at all. The other issue could be that the model seems to be made for ORT 1.14, while in 5.3 we only support an older version.

If you want to dive deep, you could try loading the model with ORT outside of UE, where you can play around with which version you are able to load and run it with. But it is probably a lot of work…
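
For example, a minimal standalone check could look like this (this assumes the official onnxruntime C++ API; link against whichever ORT version you want to test):

// Standalone sketch (outside UE): tries to load the exported model and reports why it fails.
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main()
{
	Ort::Env Env(ORT_LOGGING_LEVEL_WARNING, "opset-check");
	Ort::SessionOptions Options;
	try
	{
		// ORT_TSTR handles the wide-char path expected on Windows.
		Ort::Session Session(Env, ORT_TSTR("lightgbm.txt.onnx"), Options);
		std::cout << "Model loaded, inputs: " << Session.GetInputCount() << std::endl;
	}
	catch (const Ort::Exception& E)
	{
		// Unsupported opsets/operators show up here instead of crashing the engine.
		std::cerr << "Failed to load model: " << E.what() << std::endl;
		return 1;
	}
	return 0;
}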

There are some conversion issues with previous versions of the onnx conversion tools, but I'm happy to take a look when I get some time, hopefully this weekend. What ORT version is UE 5.3 on now, so I can test on the same versions? Although it may just be easier to provide inference on lightgbm / xgboost via a separate plugin, tbh. Would still be nice to have a single onnx for all the things though.

Also thanks for the comments / help so far!


If the older tools had issues, it could be an indicator that the model contains operations or features that are not yet supported, which would explain the issues. If that is the case, I fear you will have to wait for the next NNE update :frowning:

If you have access to UE on GitHub, you could already start experimenting with the main branch and see if you are lucky with our latest version.

Happy New Year, and @ranierin when you get to this… I ended up building in support for LightGBM using the standard libraries that they provide, and it's working well at this point. However, I'm now looking to do some experimentation with RNNs, and LSTMs specifically, but that is proving especially slow on CPU for inference. I do have the correct CUDA download for UE 5.3, but I still get:

Can't find Cudnn 8.2.2.26 in PATH. Please ensure the exact version is installed

And I made sure it's in PATH; I just threw it in a testing D:\bin directory that I properly added to the environment variables:

$> where cudnn*
D:\bin\cuda\bin\cudnn64_8.dll
D:\bin\cuda\bin\cudnn_adv_infer64_8.dll
D:\bin\cuda\bin\cudnn_adv_train64_8.dll
D:\bin\cuda\bin\cudnn_cnn_infer64_8.dll
D:\bin\cuda\bin\cudnn_cnn_train64_8.dll
D:\bin\cuda\bin\cudnn_ops_infer64_8.dll
D:\bin\cuda\bin\cudnn_ops_train64_8.dll
D:\bin\cuda\lib\x64\cudnn.lib
D:\bin\cuda\lib\x64\cudnn64_8.lib
D:\bin\cuda\lib\x64\cudnn_adv_infer.lib
D:\bin\cuda\lib\x64\cudnn_adv_infer64_8.lib
D:\bin\cuda\lib\x64\cudnn_adv_train.lib
D:\bin\cuda\lib\x64\cudnn_adv_train64_8.lib
D:\bin\cuda\lib\x64\cudnn_cnn_infer.lib
D:\bin\cuda\lib\x64\cudnn_cnn_infer64_8.lib
D:\bin\cuda\lib\x64\cudnn_cnn_train.lib
D:\bin\cuda\lib\x64\cudnn_cnn_train64_8.lib
D:\bin\cuda\lib\x64\cudnn_ops_infer.lib
D:\bin\cuda\lib\x64\cudnn_ops_infer64_8.lib
D:\bin\cuda\lib\x64\cudnn_ops_train.lib
D:\bin\cuda\lib\x64\cudnn_ops_train64_8.lib

I suspect there is more needed inside the Build.cs to include the libs and DLLs for runtime, no? Any writeup for anything like that? I know you guys are still making great strides on this.

And a happy new year to you too @Nitecon :slight_smile:

I remember that we had to make sure a very exact version of cuDNN (not even CUDA, but cuDNN) was installed to make it work; very confusing, unfortunately.

Let me find someone more capable of answering this, as I am not familiar with the details. Sorry for the delays in replies, vacation is/was interfering :wink:

Happy new year @Nitecon!

Cudnn needs to be installed in the PATH under CUDNN\v8.2.2.26 to be found, which is the recommended cuDNN install path as seen in Installation Guide - NVIDIA Docs.
You can look at NNERuntimeORTGpuModel.cpp in FModelInstanceORTCuda::InitializedAndConfigureMembers() if you want to see the related code :).

That being said, moving to 5.4 we will be removing support for the NNERuntimeORTCuda runtime in favor of the NNERuntimeORTDml one. Cuda and Cudnn are efficient, however they are huge libraries and would be difficult to deliver to final customers (considering their size and non-trivial installation), so we are moving toward the DirectML one, which is lightweight and ships fully within the plugin.

Thanks for the updates @ranierin and @LenoriL. With Cuda not being supported, I guess it's a moot point to continue on that path unless I'm going to build it all myself like with the lightgbm, although that was a much simpler implementation than cuda. Quick question on this though: I'm not familiar with Dml, how different is it, and does it support GPU inference, or is it all CPU based? A lot of my models are trained on GPU and right now cause hangups in my experimentation with the game thread, even though I run inference in its own async class. For GPU related items I don't see many issues.

From a very brief look at the headers, it does look quite different in how I would interact with CPU vs. DML from an inference perspective.

You are right, though the actual difference between the GPU and CPU interfaces is minimal. We chose different APIs to have a clear indicator of whether there is some memory transfer involved or not.

So if you use the GPU interface (no matter what runtime runs behind it), you can expect your CPU data to be uploaded to the GPU, inference to run there, and then a (potentially slow) CPU sync with the results copied back.

So if your models run faster on GPU and the speedup outweighs the cost of the upload and download, the GPU interface is the right choice.
Which runtime you pick then only depends on whether your model is supported on it or not.

Hey folks, for those starting up on machine learning / playing with it in Unreal Engine, I figured I'd share my "NeuralData" Game Instance Subsystem. It has worked very well for me so far and will hopefully expand a bit on what's in the course.

First and foremost, you need to create a new Game Instance Subsystem. I called mine NeuralData, as seen in the gist. Then populate it with the code provided here: Neural Data Subsystem for unreal engine. · GitHub
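
For reference, here is a minimal sketch of what the subsystem's header could look like. The function names, the delegate and FInferenceData are the ones used in the snippets below; the NNE include names assume the 5.3 plugin, and the real implementation lives in the gist.

// NeuralData.h (sketch only, not a drop-in replacement for the gist)
#pragma once

#include "CoreMinimal.h"
#include "Subsystems/GameInstanceSubsystem.h"
#include "NNE.h"                 // UE::NNE::GetRuntime
#include "NNEModelData.h"        // UNNEModelData
#include "NNERuntimeCPU.h"       // INNERuntimeCPU / IModelCPU
#include "NeuralData.generated.h"

// FInferenceData (see the struct at the end of this post) has to be defined/included
// before this point, because the dynamic delegate carries it as a parameter.
DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnInferenceResult, FInferenceData, Result);

UCLASS()
class UNeuralData : public UGameInstanceSubsystem
{
	GENERATED_BODY()

public:
	// Loads the UNNEModelData asset and creates a model instance on the requested runtime.
	bool InitModel(const FString& ModelPath, const FString& RuntimeType);

	// Sets the input tensor shape; the second parameter matches the "3" used in the example
	// below (e.g. the number of outputs).
	bool SetShapes(const TArray<uint32>& InputShape, int32 NumOutputs);

	// Runs async inference on a flat feature array and broadcasts OnResult when done.
	void RunClassify(const TArray<float>& Features, bool bWrapForLSTM, uint64 Timestamp);

	// Inference result, bindable from both C++ and Blueprints.
	UPROPERTY(BlueprintAssignable, Category = "NeuralData")
	FOnInferenceResult OnResult;

	// Model data, runtime and model instance members (plus the softmax/sigmoid helpers
	// mentioned further down) are omitted here; see the gist for the full class.
};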

Then you want to initialize your subsystem with the model you want to use, with something like this:

if (!NeuralSubsystem)
{
	NeuralSubsystem = GetGameInstance()->GetSubsystem<UNeuralData>();
}
if (NeuralSubsystem)
{
	auto NeuralInit = NeuralSubsystem->InitModel("/Script/NNE.NNEModelData'/Game/Models/FancyModel.FancyModel'", "NNERuntimeORTCpu");
	if (NeuralInit)
	{
		UE_LOG(LogTemp, Warning, TEXT("Neural model loaded"));
		const TArray<uint32> InputLayers = {1, 1234};
		bInferenceReady = NeuralSubsystem->SetShapes(InputLayers, 3);
		// check if not bound then add dynamic delegate
		if (!NeuralSubsystem->OnResult.IsBound())
		{
			NeuralSubsystem->OnResult.AddDynamic(this, &USomeClass::HandleInferenceResult);
		}
	}
	else
	{
		UE_LOG(LogTemp, Error, TEXT("Cannot load Neural Model."));
	}
}

In the above, it gets the subsystem, initializes a model under my Game/Models/FancyModel example (which was previously created based on what is in the course), and, if it initialized, it then also sets the input shapes, in my example 1 x 1234 features.

At this point, if it initialized correctly, you can now run a classification with something like:

NeuralSubsystem->RunClassify(infData, true, LastReadSocketTimestamp);

Where InfData is a TArray of floats, the boolean is used if the TArray needs to be wrapped for LSTMs, and I made use of the timestamp in order to track the time it took to do inference on a model until the broadcast is received.

Then the final part would be effectively listening to the inference broadcast that would be sent by the model.

Note that since my NeuralData is primarily focused on classification, I added both softmax (multi-class) and sigmoid (binary) classification functions, which are needed to convert the logits into something usable by the rest of my Unreal code!
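
The exact functions live in the gist, but as a rough, hypothetical sketch of what that logit-to-probability conversion looks like (written here as free functions; FMath and TArray come from CoreMinimal):

// Numerically stable softmax: subtract the max logit before exponentiating.
static TArray<float> SoftmaxLogits(const TArray<float>& Logits)
{
	TArray<float> Probs;
	if (Logits.Num() == 0)
	{
		return Probs;
	}
	float MaxLogit = Logits[0];
	for (const float Logit : Logits)
	{
		MaxLogit = FMath::Max(MaxLogit, Logit);
	}
	float Sum = 0.f;
	Probs.Reserve(Logits.Num());
	for (const float Logit : Logits)
	{
		const float E = FMath::Exp(Logit - MaxLogit);
		Probs.Add(E);
		Sum += E;
	}
	for (float& P : Probs)
	{
		P /= Sum;
	}
	return Probs;
}

// Sigmoid for a single binary-classification logit.
static float SigmoidLogit(const float Logit)
{
	return 1.f / (1.f + FMath::Exp(-Logit));
}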

And finally, as shown above, my model broadcasts the inference result via a standard delegate (I liked the delegate as it allowed me to bind to the inference result in both C++ and Blueprints to keep things nice and clean). This made it very easy to just import a new model, initialize it, set the shapes based on what I was testing, and then run inference all day long with it working across the entire system. As a final note, my InferenceData struct looks like this:

USTRUCT(BlueprintType)
struct FInferenceData
{
	GENERATED_BODY()

	UPROPERTY(BlueprintReadWrite, Category = "NeuralGPU")
	int Cat = 0;
	UPROPERTY(BlueprintReadWrite, Category = "NeuralGPU")
	float Confidence = 0.f;
	UPROPERTY(BlueprintReadWrite, Category = "NeuralGPU")
	bool bIsValid = false;
	UPROPERTY(BlueprintReadWrite, Category = "NeuralGPU")
	FLocalDateTime Time;
	uint64_t Timestamp = 0;
};
FLocalDateTime in the example above is essentially just an adaptation of FDateTime with the local timezone and a few other things.
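
One small gotcha with the AddDynamic binding shown earlier: the handler has to be a UFUNCTION, otherwise the dynamic delegate cannot find it. In the hypothetical USomeClass from above, that would be:

// Declared in USomeClass's header; AddDynamic binds by name, so UFUNCTION() is required.
UFUNCTION()
void HandleInferenceResult(FInferenceData Result);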

Hope this helps and saves some time for you folks that are just starting up!

Hi @ranierin,
I get these warnings when packaging (development/debug build) starts, and the model is not working in the packaged version. Is this related to all the experimental plugins, and is there any solution for this?

Hi @Heyzonsteve, the warning about operators not being registered indicates that the model is not supported on this runtime due to the lack of operator support. You either have to try another runtime, change the model so that it does not use the missing operation (Transpose in your case, which could be difficult), or wait until we have added the operator.

Sorry for the inconvenience!


Thank you, @ranierin, for the quick response. Since I'm using this plugin extensively for my research and don't have much time, I'll definitely try your suggestions. :slightly_smiling_face:


Hi @ranierin, I am curious to know what changes we should expect for NNE in UE 5.4. If you can give us an overview, that would be great. Thank you!

Hi @gabaly92,

The easiest way to get a complete overview is to look into the main branch of UE on GitHub, where you will see everything we did (just search for files containing NNE).

But you can expect some minor changes in the core API, progress on all existing runtimes, as well as new runtimes to better cover inference in standalone.

There is still a long road ahead of us but I think with 5.4 we will have a big milestone, especially for in-editor inference.


Great, I have another question: is it possible to switch the onnx runtime version used for inference, or am I bound to the one that comes with the plugin? For example, the current onnx runtime version supports models up to opset 18, and I had an opset 19 model that I wasn't able to load.

Unfortunately we only support the version that is inside the plugin; replacing the binaries in the plugin on your own will probably not work due to API changes.

You could try to re-export your model with a lower opset: only a few operators are typically affected by a version upgrade, so it may not even make a difference in your case.
