Course: Neural Network Engine (NNE)

Hi @pauciloquent98

Please make sure to post the relevant part of your error log next time; that makes it easier to understand what went wrong.

I would start by splitting shape information from the data: it looks like you first put the shapes into the data buffer, use them to create the tensor shapes, and then overwrite the data. I would also make sure that your shapes are ints, to avoid any trouble with casting and to make the code easier to read.

If you know the output dimensions upfront, you do not need to query the output tensor descs; you can just allocate enough memory and be fine with it.

Then I would add checks to all NNE function calls to see if they return success.

Finally: be careful with those casts and prints; maybe just comment that code out and see if it runs.

Good luck!

Hi @ranierin , I’m new to Unreal Engine’s Neural Network Engine (NNE), and I’ve been stuck on a problem for a few weeks. I’m trying to use INNERuntime to run a YOLO model in Game Mode.
At first, to get familiar with the system, I used INNERuntimeCPU and got it working very smoothly. However, the performance is poor—only 6–7 FPS—so I decided to switch to INNERuntimeGPU.
But I noticed your note: “Due to CPU/GPU sync, this is not very performant and thus only available in the Editor (e.g., if you want to run ML asset actions).”
Therefore, I moved on to INNERuntimeRDG, yet I’m worried that RDG supports only a few operators, which could cost me a lot of time.
So I tried to bypass INNERuntime altogether and use ONNX Runtime directly. Unfortunately, this attempt failed: first I hit an ONNX version conflict (UE 5.5 ships with ONNX Runtime 1.17.1), and even after aligning to the exact same ONNX Runtime version, the program still crashes.

I’d like your advice:

  1. If I don’t care about performance, is it acceptable to use INNERuntimeGPU in Game Mode?

  2. If I bypass INNERuntime in UE 5.5 and call ONNX Runtime directly, is that feasible?

Hi @dongesetest ,

Yes, you can totally use INNERuntimeGPU in game mode; besides the performance drop, there should be no other implications. And if performance becomes a bottleneck, you can try switching to INNERuntimeRDG: enqueue the model in one frame and get the results in the next. This introduces one frame of latency but brings back full performance if you are running a game on the side.

You did not mention which backend you are using. I would recommend NNERuntimeORTCpu on CPU and NNERuntimeORTDml for GPU or RDG. ORT has the widest model support, so you should not run into any issues. ORTCpu and ORTDml have almost the same model support, so I would not worry too much about operator support.

The problem with ORTDml is that it only runs on DirectX devices. If that is a problem, you should try running your model through our NNERuntimeRDGHlsl. This runtime has narrower operator support, so I would export/convert your model to different opsets (e.g. 18 and 21) when trying.

You can try to bypass NNE and use ORT directly; however, there could be a conflict with the ORT that is already in the engine, so you may need to disable the ORT plugin, which is enabled by default.

Also, when you call ORT directly, you basically redo the work we did in NNE, so I am not sure you would get a real advantage out of it.

I hope this info helped, good luck!

We spent a lot of time in this release on NNERuntimeIREE. However, it is still a work in progress and needs some expertise on how to adapt the model to get it running. But it shows great performance on CPU for small real-time models due to its low overhead compared to other runtimes.

I’ve tried the NNERuntimeIREE runtime, but I got the following error:

LogNNERuntimeIREE: Error: CPU session: Input bindings memory need to be aligned with 64 bytes

I’ve compiled the model to MLIR format. The inputs are float64, so I am not quite sure what the alignment error refers to.

Thank you so much for the response!

The error I get with the code above is a full on crash. Looks like it’s crashing when I actually try to run the model:

LoginId:7503dafd456c2a17460c03b6c283ec7c
EpicAccountId:574d08e98bbf4916afd4b598125b3451

Unhandled Exception: EXCEPTION_ACCESS_VIOLATION reading address 0x0000000000000000

UnrealEditor_ShadowShapes_CNN!AANNEE::Tick'::21’::<lambda_1>::operator()() [C:\Users\Nicole\Desktop\ShadowShapes_CNN\Source\ShadowShapes_CNN\ANNEE.cpp:137]
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
kernel32
ntdll

I’ve tried off and on in the past few weeks to follow what you’re saying. I understand loosely/conceptually what you mean but it’s still a bit over my head even to implement :sweat_smile:

Would you possibly know someone I could reach out to (or another forum) who could help me out with my project a bit more in-depth? Happy to pay for time if needed, as I do really want to learn how to do this! For whatever reason the differences in implementation here versus what I’ve done in PyTorch aren’t clicking for me (the gaps in my C++ knowledge probably aren’t helping either). I’m not even sure where to start trying to bridge these gaps on my own to get the code to do what I’d like it to.

Hey @patso85 ,
IREE requires your buffers to be aligned to be very efficient when feeding data into the ALU. All this means is that the address of your first element must be divisible by 64.
If you work with TArray, you can just use TAlignedHeapAllocator and pass 64 as the template argument. If you use the FMemory interface, you can pass the alignment to the Malloc function.

You need to do this for each input, so each input binding pointer needs to be divisible by 64.

With this you will be able to get past the error you posted :+1:

Hi @pauciloquent98 ,

This looks like a classic pointer issue. C++ makes the distinction between copy and reference a bit trickier to understand; it is something Python handles automatically.

I recommend looking for some online tutorials that explain how C++ pointers work and what it means to pass something by value versus by reference. Without this understanding you will constantly run into access violations and other almost unexplainable errors caused by corrupt memory.

So I would do some C++ tutorials to deepen your understanding, or find someone to mentor you on it based on your specific code.

Sorry for not being able to help here :frowning:

Dear developers, please provide an example of connecting models like YOLO. This problem affects many people, and everyone encounters it. I even created a thread about it.

yolov8 and Neural Networks for begginers - Development / Programming & Scripting - Epic Developer Community Forums

I made a little example. It is a real-time neural style transfer sample, forked from an old Microsoft repo and updated to Unreal Engine 5.5 and the Neural Network Engine (NNE) Render Dependency Graph (RDG) runtime.