Course: Neural Network Engine (NNE)

@ranierin I got the Face Detection model you referenced to work alongside the other two models I used earlier :+1:

That means I now have three models loaded in memory and can run inference with all of them at runtime. This is amazing; the possibilities with AI running directly in the Engine are endless. All models are running in Unreal Engine 5.4.1 with the NNE Beta ORT CPU runtime.


Detections

1 Like

@ranierin can you tell us more about the experimental IREE plugin? I haven’t tested it yet, but it looks like the way forward for compiling neural networks in Unreal Engine and optimizing them for the different hardware targets that games can be deployed on. Is it based on this IREE?

@gabaly92 Yes, the experimental IREE plugin compiles models directly into game code and is based on the library you linked. The version we released in 5.4 is based on IREE from last summer and is still missing some features. To use it, you need to export your model as .mlir containing a dialect like linalg on tensors, StableHLO, or the like. Not all operators are supported, and it will take some experimenting to export your model in a format that works.

If you are working on the main branch of Epic’s GitHub repository and compiling the engine yourself, you will get a more recent version of IREE from this spring with better operator coverage.

Besides having to import .mlir files rather than .onnx, the remaining workflow is the same as with other CPU runtimes.
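
If you are not sure under which name the IREE runtime registers itself, a rough sketch like the one below lists every registered CPU runtime; the IREE CPU runtime should show up here once the plugin is enabled (assuming it registers like the other CPU runtimes):

#include "NNE.h"
#include "NNERuntime.h"
#include "NNERuntimeCPU.h"

// Log every registered runtime that implements the CPU interface
for (const TWeakInterfacePtr<INNERuntime>& Runtime : UE::NNE::GetAllRuntimes())
{
	if (Runtime.IsValid() && Cast<INNERuntimeCPU>(Runtime.Get()))
	{
		UE_LOG(LogTemp, Display, TEXT("Registered CPU runtime: %s"), *Runtime->GetRuntimeName());
	}
}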

Have fun trying it out!

1 Like

I tried to use NNE in 5.3, but I had problems from the beginning: following the 5.2 tutorial, I added NNECore to Build.cs, but the compiler reports “Could not find definition for module ‘NNECore’ (referenced via Target → test.Build.cs)”.

After searching for the relevant information, I changed ‘NNECore’ to ‘NNE’ in Build.cs, but the following three header files could not be found:
#include "NNECore.h"
#include "NNECoreRuntimeCPU.h"
#include "NNECoreModelData.h"
Now I can’t use any of the NNE functions because the program can’t find the relevant library, and the errors are everywhere :exploding_head:.

The software versions I am currently using are Visual Studio Community 2022 v17.5.1 and UE5.3.

I’d appreciate any advice you might have.

Awesome, thank you for pointing out the specific differences between IREE and the other plugins. I’ll let you know if I have any questions when I test it out.

@HelloJXY Here is what you need to do to get up and running with ORT (ONNX Runtime) CPU and GPU:

  1. Enable the plugins

  2. Include the following headers

#include "NNE.h"
#include "NNERuntimeCPU.h"
#include "NNERuntimeGPU.h"
  3. In your Build.cs file, this is what you need to include:
PublicDependencyModuleNames.AddRange(new string[]
		{
			"Core", 
			"CoreUObject", 
			"Engine", 
			"InputCore", 
			"EnhancedInput",
			"NNE"
		});

And the example code should work fine
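
If you want to sanity-check that the runtime is actually registered before creating a model, something along these lines should do it (I’m using the ORT CPU runtime name here; adjust if yours differs):

#include "NNE.h"
#include "NNERuntimeCPU.h"

// Check that the ORT CPU runtime is registered before trying to create a model
TWeakInterfacePtr<INNERuntimeCPU> Runtime = UE::NNE::GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
if (Runtime.IsValid())
{
	UE_LOG(LogTemp, Display, TEXT("ORT CPU runtime found"));
}
else
{
	UE_LOG(LogTemp, Warning, TEXT("ORT CPU runtime not found, make sure the plugins are enabled"));
}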

2 Likes

You’re a genius! It’s working very well! :smiling_face:

By the way, the includes also needed the file “NNEModelData.h”.

#include "NNE.h"
#include "NNEModelData.h"
#include "NNERuntimeCPU.h"
1 Like

Heya, I’ve been attempting to get NNE running on 5.4 and have been struggling to translate some things from the older 5.2 quick start guide (I chose this one as it’s for getting a Blueprint going). I did go through and replace all the “NNECore” references with just “NNE”, and for the most part it’s fine; I just have a few errors that stop me from building, and I’ve had no luck fixing them so far.

I’ve got all of the NNE related plugins enabled just in case and have added NNE to the project’s build file.

Here are my files; they are mostly identical to the guide.
NeuralNetworkObject.h

#pragma once

#include "CoreMinimal.h"
#include "UObject/NoExportTypes.h"

#include "NNE.h"
#include "NNEModelData.h"
#include "NNERuntimeCPU.h"
#include "NNERuntimeGPU.h"
#include "NNERuntime.h"
#include "NNETypes.h"
#include "NNETensor.h"
#include "NeuralNetworkObject.generated.h"

USTRUCT(BlueprintType, Category = "NeuralNetworkObject")
struct FNeuralNetworkTensor
{
	GENERATED_BODY()

public:

	UPROPERTY(BlueprintReadWrite, Category = "NeuralNetworkObject")
	TArray<int32> Shape = TArray<int32>();

	UPROPERTY(BlueprintReadWrite, Category = "NeuralNetworkObject")
	TArray<float> Data = TArray<float>();
};

UCLASS(BlueprintType, Category = "NeuralNetworkObject")
class PROJECT_API UNeuralNetworkObject : public UObject
{
	GENERATED_BODY()

public:
	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	static TArray<FString> GetRuntimeNames();

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	static UNeuralNetworkObject* CreateModel(UObject* Parent, FString RuntimeName, UNNEModelData* ModelData);

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	static bool CreateTensor(TArray<int32> Shape, UPARAM(ref) FNeuralNetworkTensor& Tensor);

public:

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	int32 NumInputs();

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	int32 NumOutputs();

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	TArray<int32> GetInputShape(int32 Index);

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	TArray<int32> GetOutputShape(int32 Index);

public:

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	bool SetInputs(const TArray<FNeuralNetworkTensor>& Inputs);

	UFUNCTION(BlueprintCallable, Category = "NeuralNetworkObject")
	bool RunSync(UPARAM(ref) TArray<FNeuralNetworkTensor>& Outputs);

private:

	TSharedPtr<UE::NNE::IModelInstanceCPU> Model;

	TArray<UE::NNE::FTensorBindingCPU> InputBindings;
	TArray<UE::NNE::FTensorShape> InputShapes;
};

NeuralNetworkObject.cpp


#include "NeuralNetworkObject.h"


TArray<FString> UNeuralNetworkObject::GetRuntimeNames()
{
	using namespace UE::NNE;

	TArray<FString> Result;
	TArrayView<TWeakInterfacePtr<INNERuntime>> Runtimes = GetAllRuntimes();
	for (int32 i = 0; i < Runtimes.Num(); i++)
	{
		if (Runtimes[i].IsValid() && Cast<INNERuntimeCPU>(Runtimes[i].Get()))
		{
			Result.Add(Runtimes[i]->GetRuntimeName());
		}
	}
	return Result;
}

UNeuralNetworkObject* UNeuralNetworkObject::CreateModel(UObject* Parent, FString RuntimeName, UNNEModelData* ModelData)
{
	using namespace UE::NNE;

	if (!ModelData)
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid model data"));
		return nullptr;
	}

	TWeakInterfacePtr<INNERuntimeCPU> Runtime = GetRuntime<INNERuntimeCPU>(RuntimeName);
	if (!Runtime.IsValid())
	{
		UE_LOG(LogTemp, Error, TEXT("No CPU runtime '%s' found"), *RuntimeName);
		return nullptr;
	}

	TUniquePtr<IModelCPU> UniqueModel = Runtime->CreateModelCPU(ModelData);
	if (!UniqueModel.IsValid())
	{
		UE_LOG(LogTemp, Error, TEXT("Could not create the CPU model"));
		return nullptr;
	}

	UNeuralNetworkObject* Result = NewObject<UNeuralNetworkObject>(Parent);	
	if (Result)
	{
		Result->Model = TSharedPtr<IModelCPU>(UniqueModel.Release());
		return Result;
	}

	return nullptr;
}

bool UNeuralNetworkObject::CreateTensor(TArray<int32> Shape, UPARAM(ref) FNeuralNetworkTensor& Tensor)
{
	if (Shape.Num() == 0)
	{
		return false;
	}

	int32 Volume = 1;
	for (int32 i = 0; i < Shape.Num(); i++)
	{
		if (Shape[i] < 1)
		{
			return false;
		}
		Volume *= Shape[i];
	}

	Tensor.Shape = Shape;
	Tensor.Data.SetNum(Volume);
	return true;
}

int32 UNeuralNetworkObject::NumInputs()
{
	check(Model.IsValid())
	return Model->GetInputTensorDescs().Num();
}

int32 UNeuralNetworkObject::NumOutputs()
{
	check(Model.IsValid())
	return Model->GetOutputTensorDescs().Num();
}

TArray<int32> UNeuralNetworkObject::GetInputShape(int32 Index)
{
	check(Model.IsValid())

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> Desc = Model->GetInputTensorDescs();
	if (Index < 0 || Index >= Desc.Num())
	{
		return TArray<int32>();
	}

	return TArray<int32>(Desc[Index].GetShape().GetData());
}

TArray<int32> UNeuralNetworkObject::GetOutputShape(int32 Index)
{
	check(Model.IsValid())

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> Desc = Model->GetOutputTensorDescs();
	if (Index < 0 || Index >= Desc.Num())
	{
		return TArray<int32>();
	}

	return TArray<int32>(Desc[Index].GetShape().GetData());
}


bool UNeuralNetworkObject::SetInputs(const TArray<FNeuralNetworkTensor>& Inputs)
{
	check(Model.IsValid())

	using namespace UE::NNE;

	InputBindings.Reset();
	InputShapes.Reset();

	TConstArrayView<FTensorDesc> InputDescs = Model->GetInputTensorDescs();
	if (InputDescs.Num() != Inputs.Num())
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid number of input tensors provided"));
		return false;
	}

	InputBindings.SetNum(Inputs.Num());
	InputShapes.SetNum(Inputs.Num());
	for (int32 i = 0; i < Inputs.Num(); i++)
	{
		InputBindings[i].Data = (void*)Inputs[i].Data.GetData();
		InputBindings[i].SizeInBytes = Inputs[i].Data.Num() * sizeof(float);
		InputShapes[i] = FTensorShape::MakeFromSymbolic(FSymbolicTensorShape::Make(Inputs[i].Shape));
	}

	if (Model->SetInputTensorShapes(InputShapes) != 0)
	{
		UE_LOG(LogTemp, Error, TEXT("Failed to set the input shapes"));
		return false;
	}

	return true;
}

bool UNeuralNetworkObject::RunSync(UPARAM(ref) TArray<FNeuralNetworkTensor>& Outputs)
{
	check(Model.IsValid());

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> OutputDescs = Model->GetOutputTensorDescs();
	if (OutputDescs.Num() != Outputs.Num())
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid number of output tensors provided"));
		return false;
	}

	TArray<FTensorBindingCPU> OutputBindings;
	OutputBindings.SetNum(Outputs.Num());
	for (int32 i = 0; i < Outputs.Num(); i++)
	{
		OutputBindings[i].Data = (void*)Outputs[i].Data.GetData();
		OutputBindings[i].SizeInBytes = Outputs[i].Data.Num() * sizeof(float);
	}

	return Model->RunSync(InputBindings, OutputBindings) == 0;
}

Cheers

@TriggerhappyJTV I still recommend that you read through the 5.3 tutorial, as there were quite a few changes between 5.2 and 5.3. There are also some changes between 5.3 and 5.4 for which we are still missing a tutorial.

The errors you get are exactly because of those changes: e.g., we now return shared pointers instead of unique pointers, hence the first two error messages.

Another good option is to look at the interface of NNE defined in Engine/Source/Runtime/NNE, which has all functions defined along with comments on how to call them.

Fingers crossed that you get things running!

Hi Nico,

We have ONNX inference working at runtime in 5.3.2, but when migrating to 5.4.2 the engine no longer compiles: it seems that NNERuntimeORT no longer supports static configurations and only runs in editor builds (the onnxruntime module is now included in onnxEditor, which is skipped for static builds, and the exceptions that were guarded for editor builds in files like NNERuntimeORTModel.cpp are now unguarded). Is there any reason for this? I can make divergences in the engine and get it working again, but I’d prefer to know the reasoning behind these changes.

1 Like

Thanks for the fix, Nico. It was indeed that we were referencing modules from the experimental plugin that are not needed.

1 Like

Hi,

has anyone been able to get it running on Android? I’m using UE 5.4 with the latest NNERuntimeORT from the ue5-main branch, which works fine on Windows. I’ve added the Android binaries from the libonnxruntime 1.14.1 release and they get included in the build, but it doesn’t seem to load the library, as any call to its functions causes a crash.

Here is my UPL file:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:android="http://schemas.android.com/apk/res/android">

	<prebuildCopies>
		<!-- Ignore this mess, one of these works -->
		<copyFile src="$S(AbsEngineDir)/Binaries/ThirdParty/Onnxruntime/Android/onnxruntime.aar" dst="$S(BuildDir)/gradle/app/libs/onnxruntime.aar" />
		<copyFile src="$S(PluginDir)/Binaries/ThirdParty/Onnxruntime/Android/onnxruntime.aar" dst="$S(BuildDir)/gradle/app/libs/onnxruntime.aar" />
		<copyFile src="$S(PluginDir)/../../../Binaries/ThirdParty/Onnxruntime/Android/onnxruntime.aar" dst="$S(BuildDir)/gradle/app/libs/onnxruntime.aar" />
	</prebuildCopies>

	<buildGradleAdditions>
		<insert>
			dependencies {
				implementation fileTree(dir: 'libs', include: ['*.aar'])
			}
		</insert>
	</buildGradleAdditions>

	<!-- ProGuard additions -->
	<proguardAdditions>
		<insert>
			-keep class ai.onnxruntime.** { *; }
		</insert>
	</proguardAdditions>

	<soLoadLibrary>
		<loadLibrary name="onnxruntime" failmsg="onnxruntime library not loaded and required!" />
	</soLoadLibrary>

</root>

Any idea why the library isn’t loaded, or is it planned to include the mobile libraries in the future?

1 Like

Hi @Zaratusa
Unfortunately, we don’t support Android yet, sorry!
It is definitely something we want eventually, but we don’t know when or which runtime will make it.
Apologies!

Hi Nico

Thank you so much for this great tutorial.

All the best

1 Like

Heya @ranierin,

I’ve been attempting to get things going in 5.4 referencing the 5.3 tutorial and the docs, and have hit a new roadblock that doesn’t make sense to me. I’m being told that CreateModel and CreateModelInstance don’t exist for IModelCPU/INNERuntimeCPU, which is weird, as I’m using them the same way as the tutorial does. (Also, I’m not too familiar with C++ compared to other languages, so I’m struggling with diagnosing the problem.)

My code, if it helps (“ModelInstance” is an <IModelInstanceCPU> in the header file):

#include "NeuralNetworkObject.h"

UNeuralNetworkObject* UNeuralNetworkObject::CreateModel(UObject* Parent, UNNEModelData* ModelData)
{
	using namespace UE::NNE;

	if (!ModelData)
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid model data"));
		return nullptr;
	}

	TWeakInterfacePtr<INNERuntimeCPU> Runtime = GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
	if (!Runtime.IsValid())
	{
		UE_LOG(LogTemp, Error, TEXT("No CPU runtime '%s' found"));
		return nullptr;
	}

	TUniquePtr<IModelCPU> UniqueModel = Runtime->CreateModel(ModelData);
	if (!UniqueModel.IsValid())
	{
		UE_LOG(LogTemp, Error, TEXT("Could not create the CPU model"));
		return nullptr;
	}

	/*UNeuralNetworkObject* Result = NewObject<UNeuralNetworkObject>(Parent);	
	if (Result)
	{
		Result->Model = TSharedPtr<IModelCPU>(UniqueModel.Release());
		
		return Result;
	}*/

	TUniquePtr<IModelInstanceCPU> UniqueModelInstance = UniqueModel->CreateModelInstance();
	UNeuralNetworkObject* Result = NewObject<UNeuralNetworkObject>(Parent);
	if (Result)
	{
		Result->ModelInstance = TSharedPtr<IModelInstanceCPU>(UniqueModelInstance.Release());
		return Result;
	}

	return nullptr;
}

bool UNeuralNetworkObject::CreateTensor(TArray<int32> Shape, UPARAM(ref) FNeuralNetworkTensor& Tensor)
{
	if (Shape.Num() == 0)
	{
		return false;
	}

	int32 Volume = 1;
	for (int32 i = 0; i < Shape.Num(); i++)
	{
		if (Shape[i] < 1)
		{
			return false;
		}
		Volume *= Shape[i];
	}

	Tensor.Shape = Shape;
	Tensor.Data.SetNum(Volume);
	return true;
}

int32 UNeuralNetworkObject::NumInputs()
{
	check(ModelInstance.IsValid())
	return ModelInstance->GetInputTensorDescs().Num();
}

int32 UNeuralNetworkObject::NumOutputs()
{
	check(ModelInstance.IsValid())
	return ModelInstance->GetOutputTensorDescs().Num();
}

TArray<int32> UNeuralNetworkObject::GetInputShape(int32 Index)
{
	check(ModelInstance.IsValid())

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> Desc = ModelInstance->GetInputTensorDescs();
	if (Index < 0 || Index >= Desc.Num())
	{
		return TArray<int32>();
	}

	return TArray<int32>(Desc[Index].GetShape().GetData());
}

TArray<int32> UNeuralNetworkObject::GetOutputShape(int32 Index)
{
	check(ModelInstance.IsValid())

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> Desc = ModelInstance->GetOutputTensorDescs();
	if (Index < 0 || Index >= Desc.Num())
	{
		return TArray<int32>();
	}

	return TArray<int32>(Desc[Index].GetShape().GetData());
}

bool UNeuralNetworkObject::SetInputs(const TArray<FNeuralNetworkTensor>& Inputs)
{
	check(ModelInstance.IsValid())

	using namespace UE::NNE;

	InputBindings.Reset();
	InputShapes.Reset();

	TConstArrayView<FTensorDesc> InputDescs = ModelInstance->GetInputTensorDescs();
	if (InputDescs.Num() != Inputs.Num())
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid number of input tensors provided"));
		return false;
	}

	InputBindings.SetNum(Inputs.Num());
	InputShapes.SetNum(Inputs.Num());
	for (int32 i = 0; i < Inputs.Num(); i++)
	{
		InputBindings[i].Data = (void*)Inputs[i].Data.GetData();
		InputBindings[i].SizeInBytes = Inputs[i].Data.Num() * sizeof(float);
		InputShapes[i] = FTensorShape::MakeFromSymbolic(FSymbolicTensorShape::Make(Inputs[i].Shape));
	}

	if (ModelInstance->SetInputTensorShapes(InputShapes) != 0)
	{
		UE_LOG(LogTemp, Error, TEXT("Failed to set the input shapes"));
		return false;
	}

	return true;
}

bool UNeuralNetworkObject::RunSync(UPARAM(ref) TArray<FNeuralNetworkTensor>& Outputs)
{
	check(ModelInstance.IsValid());

	using namespace UE::NNE;

	TConstArrayView<FTensorDesc> OutputDescs = ModelInstance->GetOutputTensorDescs();
	if (OutputDescs.Num() != Outputs.Num())
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid number of output tensors provided"));
		return false;
	}

	TArray<FTensorBindingCPU> OutputBindings;
	OutputBindings.SetNum(Outputs.Num());
	for (int32 i = 0; i < Outputs.Num(); i++)
	{
		OutputBindings[i].Data = (void*)Outputs[i].Data.GetData();
		OutputBindings[i].SizeInBytes = Outputs[i].Data.Num() * sizeof(float);
	}

	return ModelInstance->RunSync(InputBindings, OutputBindings) == 0;
}

Hey @TriggerhappyJTV ,

Sorry for the late reply, I have been on vacation.

We renamed those functions and added the suffix CPU, so CreateModelCPU and CreateModelInstanceCPU. You can find the public API in Engine/Source/Runtime/NNE/Public/NNERuntimeCPU.h.
Also, please see the post above from @mattai describing how to upgrade from 5.3 to 5.4.
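
Concretely, the creation part of your CreateModel function should end up looking roughly like this (just a sketch, assuming the ORT CPU runtime and the ModelData pointer you already have):

using namespace UE::NNE;

TWeakInterfacePtr<INNERuntimeCPU> Runtime = GetRuntime<INNERuntimeCPU>(FString("NNERuntimeORTCpu"));
if (Runtime.IsValid())
{
	// In 5.4 these calls carry the CPU suffix and return shared pointers instead of unique pointers
	TSharedPtr<IModelCPU> Model = Runtime->CreateModelCPU(ModelData);
	if (Model.IsValid())
	{
		TSharedPtr<IModelInstanceCPU> ModelInstance = Model->CreateModelInstanceCPU();
		// Store ModelInstance directly in your TSharedPtr<IModelInstanceCPU> member
	}
}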

Good luck and happy coding :slight_smile:

1 Like

You are mostly correct :)

Hello,

I am experiencing an issue with the output data from a sentence similarity model from Hugging Face. The similarity scores appear to be random, and the embeddings for the same sentence change based on the number of sequences, which I believe shouldn’t happen.

I ran the same model in Python using the same ONNX model, and the values match up until the “RunSync” function is called. Any suggestions would be greatly appreciated.

Thank you!

void ASSModelSelector::RunModel(TArray<FString> Inputs)
{

	//Filling TokenIDs and AttentionMask arrays here; the IDs are 1:1 with the Python tokenizer.


	ModelHelper->InputData.Empty();


	UE::NNE::FSymbolicTensorShape SymbolicInputTensorShape = UE::NNE::FSymbolicTensorShape::Make({ LongestBatchSize, Inputs.Num() });
	TArray<UE::NNE::FTensorShape> InputTensorShapes = { UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicInputTensorShape) };
	SymbolicInputTensorShape = UE::NNE::FSymbolicTensorShape::Make({ LongestBatchSize, Inputs.Num() });
	InputTensorShapes.Add(UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicInputTensorShape));

	ModelHelper->ModelInstance->SetInputTensorShapes(InputTensorShapes);

	ModelHelper->InputData.SetNumZeroed(InputTensorShapes[0].Volume() + InputTensorShapes[1].Volume());
	ModelHelper->InputBindings.SetNumZeroed(2);
	ModelHelper->InputBindings[0].Data = &ModelHelper->InputData[0];
	ModelHelper->InputBindings[0].SizeInBytes = InputTensorShapes[0].Volume() * sizeof(int64);
	ModelHelper->InputBindings[1].Data = &ModelHelper->InputData[InputTensorShapes[0].Volume()];
	ModelHelper->InputBindings[1].SizeInBytes = InputTensorShapes[1].Volume() * sizeof(int64);

	UE::NNE::FSymbolicTensorShape SymbolicOutputTensorShape = UE::NNE::FSymbolicTensorShape::Make({ LongestBatchSize, Inputs.Num(), 384 });
	TArray<UE::NNE::FTensorShape> OutputTensorShapes = { UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicOutputTensorShape) };
	SymbolicOutputTensorShape = UE::NNE::FSymbolicTensorShape::Make({ LongestBatchSize, 384});
	OutputTensorShapes.Add(UE::NNE::FTensorShape::MakeFromSymbolic(SymbolicOutputTensorShape));

	ModelHelper->OutputData.SetNumZeroed(OutputTensorShapes[0].Volume() + OutputTensorShapes[1].Volume());
	ModelHelper->OutputBindings.SetNumZeroed(2);
	ModelHelper->OutputBindings[0].Data = &ModelHelper->OutputData[0];
	ModelHelper->OutputBindings[0].SizeInBytes = OutputTensorShapes[0].Volume() * sizeof(float);
	ModelHelper->OutputBindings[1].Data = &ModelHelper->OutputData[OutputTensorShapes[0].Volume()];
	ModelHelper->OutputBindings[1].SizeInBytes = OutputTensorShapes[1].Volume() * sizeof(float);

	for (int i = 0; i < TokenIDs.Num(); ++i) {
		for (int j = 0; j < LongestBatchSize; ++j) {
			ModelHelper->InputData[i * LongestBatchSize + j] = TokenIDs[i].BatchIDs[j];
		}
	}

	for (int i = 0; i < AttentionMask.Num(); ++i) {
		ModelHelper->InputData[LongestBatchSize * TokenIDs.Num() + i] = AttentionMask[i];
	}

	int32 InputsAmount = Inputs.Num();

	ModelHelper->bIsRunning = true;
	TSharedPtr<FMyModelHelper> ModelHelperPtr = ModelHelper;
	AsyncTask(ENamedThreads::AnyNormalThreadNormalTask, [ModelHelperPtr, LongestBatchSize, InputsAmount, AttentionMask, EmbeddingsSize, this]()
		{
			if (ModelHelperPtr->ModelInstance->RunSync(ModelHelperPtr->InputBindings, ModelHelperPtr->OutputBindings) != 0)
			{
				UE_LOG(LogTemp, Error, TEXT("Failed to run the model"));
			}
	AsyncTask(ENamedThreads::GameThread, [ModelHelperPtr, LongestBatchSize, InputsAmount, AttentionMask, EmbeddingsSize, this]()
		{
			ModelHelperPtr->bIsRunning = false;

			// Incorrect output embeddings in ModelHelperPtr at this point

	float* OutputDataPtr = static_cast<float*>(ModelHelperPtr->OutputBindings[0].Data);

//mean pooling
	TArray<float> SequenceEmbedingSum;
	SequenceEmbedingSum.SetNumZeroed(InputsAmount * EmbeddingsSize);
	for (int i = 0; i < InputsAmount; ++i) {
		for (int j = 0; j < EmbeddingsSize; ++j) {
			for (int k = 0; k < LongestBatchSize; ++k) {
				SequenceEmbedingSum[i * EmbeddingsSize + j] += AttentionMask[k + i * LongestBatchSize] * OutputDataPtr[(k * EmbeddingsSize) + j + (i * EmbeddingsSize * LongestBatchSize)];
			}
		}
	}


	TArray<int64> maskSum;
	for (int i = 0; i < AttentionMask.Num() / LongestBatchSize; ++i) {
		maskSum.Add(0);
		for (int j = 0; j < LongestBatchSize; ++j) {
			maskSum[i] += AttentionMask[j + i * LongestBatchSize];
		}
		if (maskSum[i] < 1e-9)
			maskSum[i] = 1e-9;
	}

	for (int i = 0; i < maskSum.Num(); ++i) {
		for (int j = 0; j < EmbeddingsSize; ++j) {
			SequenceEmbedingSum[i] /= maskSum[i];
		}
	}

//2p normalization
	float norm = 0;
	float eps = 1e-12;
	for (int64 i = 0; i < InputsAmount; ++i) {
		norm = 0;
		for (int64 j = 0; j < EmbeddingsSize; ++j) {
			norm += pow(abs(SequenceEmbedingSum[i * EmbeddingsSize + j]), 2);
		}
		norm = pow(norm, 0.5);
		if (eps > norm)
			norm = eps;
		for (int64 j = 0; j < EmbeddingsSize; ++j) {
			SequenceEmbedingSum[i * EmbeddingsSize + j] /= norm;
		}
	}

//cosine similarity
	TArray<float> Scores;
	Scores.SetNumZeroed(InputsAmount - 1);
	for (int64 i = 0; i < InputsAmount - 1; ++i) {
		double dot = 0.0, denom_a = 0.0, denom_b = 0.0;
		for (int64 j = 0; j < EmbeddingsSize; ++j) {
			dot += SequenceEmbedingSum[j] * SequenceEmbedingSum[j + (i + 1) * EmbeddingsSize];
			denom_a += SequenceEmbedingSum[j] * SequenceEmbedingSum[j];
			denom_b += SequenceEmbedingSum[j + (i + 1) * EmbeddingsSize] * SequenceEmbedingSum[j + (i + 1) * EmbeddingsSize];
		}
		Scores[i] = dot / (sqrt(denom_a) * sqrt(denom_b));
	}

	for (int64 i = 0; i < Scores.Num(); ++i) {
		UE_LOG(LogTemp, Error, TEXT("Score: %f"), Scores[i]);
		this->Score.Add(Scores[i]);
	}

			OnScoreUpdated.Broadcast();
		});
	});
}

It’s really hard to guess by just looking at this snippet, but here are a few things I would look at if I had the project:

  1. Run synchronously instead of inside an async task and check whether the data is more consistent.
  2. Check your inputs/outputs and maybe simplify the code:
    Outputs.SetNum(1);
    Outputs[0].Shape = { NumFrames, NumDepth };
    Outputs[0].Data.SetNum(NumFrames * NumDepth);

    check(Model);
    Model->SetInputs(Inputs); // needed so the input bindings are properly set

    // run the model
    Model->RunSync(Outputs);

You should be able to inspect the output of your model (Outputs[0].Data in my case) and see whether it makes sense or not.
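
For example, a quick way to dump the first few values to the log (assuming the Outputs array from the snippet above):

    // Print the first few output values for a sanity check
    const int32 NumToLog = FMath::Min(10, Outputs[0].Data.Num());
    for (int32 i = 0; i < NumToLog; i++)
    {
        UE_LOG(LogTemp, Display, TEXT("Output[0][%d] = %f"), i, Outputs[0].Data[i]);
    }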

Sorry I can’t help more.

Hello,

Thank you for your reply. Running the model in sync/async mode didn’t impact the embeddings. Although I still haven’t pinpointed the exact problem, I have figured out why it is happening.

For people who might encounter the same issue: the model I’m using has two output tensors. The second tensor has the shape [batch size, embedding size], whereas in the transformer version, it is [sequence size, embedding size]. I tried to manually adjust the parameter to match the transformer version, but the number of output elements didn’t match what the model generated.

What I did notice is that if the batch size is equal to the sequence size, the scores have only about a 3% deviation. This deviation could be partially caused by slight differences in the implementation of Python functions like cosine similarity. However, the embeddings are still different from what they should be compared to the ONNX model run in Python.

I don’t know what is causing this problem, but for my use, having the same batch and sequence size to get the correct score is good enough.

Thanks again for your help!

1 Like