Hello. We noticed that in version UE5.4 you dropped support (for some platforms) for ONNX models in the NNERuntimeIREE module in favor of MLIR models. During our migration to MLIR, we encountered a rather complex build pipeline. Unreal supports a specific version of MLIR and expects models in that format, but what is the easiest way to convert a pre-trained model into this format?
There are several options for obtaining an MLIR model of the appropriate version:
- Use torch-mlir and save the model via torch_mlir.torchscript.compile()
- Use ONNX and convert it using onnx-mlir
- Use ONNX and convert it using iree-import-onnx
Thanks to the TPS files at https://github.com/EpicGames/UnrealEngine/tree/5.5/Engine/Plugins/Experimental/NNERuntimeIREE/Source/ThirdParty/IREE, I was able to find the source code for torch-mlir and IREE at the exact commits.
Option 1 is difficult to build, as there is no easy way to get a compatible version of torch-mlir. There have been no releases on PyPI since 2022. Meanwhile, UE5.5 uses the version from https://github.com/shark-infra/torch-mlir/tree/1f597a8fa0e0c2bc32f77bd349688998d4992f27 — April 2024. Torch-mlir also has daily snapshots, but there haven’t been any since January 2024. The build process for torch-mlir is quite complex and poorly documented.
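For reference, the export call itself is short once torch-mlir is installed; a minimal sketch of what I have in mind (the model and shapes are placeholders for our real network):

```python
import torch
import torch_mlir.torchscript  # needs a torch-mlir build matching the engine's pinned commit

# Placeholder for the real pre-trained model.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
).eval()

example_input = torch.randn(1, 16)

# Lower through TorchScript to an MLIR dialect that IREE can ingest,
# e.g. linalg-on-tensors (tosa and stablehlo are other common choices).
module = torch_mlir.torchscript.compile(
    model, example_input, output_type="linalg-on-tensors"
)

with open("model.mlir", "w") as f:
    f.write(str(module))
```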
Option 2 is the most difficult, as it’s unclear which version of onnx-mlir is compatible with UE5. I haven’t even attempted it — it seems nearly impossible.
Option 3 is the one I’m currently trying. I’m building the IREE project from this version: https://github.com/iree-org/iree/tree/b4273a4bfc66ba6dd8f62f6483d74d42a7b936f1, hoping that afterward I’ll be able to use iree-import-onnx to convert ONNX to MLIR. This option also has its difficulties with submodules (I had to manually remove and re-add third_party/benchmark). Additionally, I had to determine which Python version to use — it appears that versions above 3.10 are not supported. I also had to figure out which build tools were required.
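For completeness, the conversion step I am aiming for once the build succeeds should be just this (a sketch; file names are placeholders and I am assuming the tool's standard command-line interface):

```python
import subprocess
import torch

# Placeholder for the real pre-trained model.
model = torch.nn.Sequential(torch.nn.Linear(16, 4)).eval()

# 1) Export to ONNX with stock PyTorch.
torch.onnx.export(model, torch.randn(1, 16), "model.onnx")

# 2) Convert ONNX to MLIR with the importer built from the pinned IREE commit.
subprocess.run(["iree-import-onnx", "model.onnx", "-o", "model.mlir"], check=True)
```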
I’ve made the most progress with building IREE, and I might succeed, but the process feels overly complicated, and I can’t shake the feeling that there must be an easier way — setting up this environment every time we retrain a model is quite a challenge.
So, my question is: do you have any recommendations for preparing data for NNERuntimeIREE?
Best regards,
Edward
Hi Edward,
Unfortunately we never had support for ONNX inside our IREE runtime. The only way to feed NNERuntimeIREE is to use a valid MLIR dialect.
First, regarding option 2: to my knowledge the dialect produced by onnx-mlir is not compatible with IREE; it is internal to onnx-mlir and is used by that tool itself to lower models to machine code.
Options 1 and 3 are both valid.
For option 1, I would recommend using https://github.com/iree-org/iree-turbine
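A minimal sketch of that export path, assuming the iree-turbine AOT API and a placeholder model (replace it with your own network):

```python
import torch
import iree.turbine.aot as aot  # pip install iree-turbine, pinned near the engine's IREE version

# Placeholder model; use your own pre-trained network.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
).eval()

# Trace/export the model with an example input and save the MLIR to disk.
exported = aot.export(model, torch.randn(1, 16))
exported.save_mlir("model.mlir")
```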
As you already correctly discovered, with both options you will need to match the versions of the tools you use to the versions we have in the engine (which you identified through the TPS files).
The reason is that MLIR is an intermediate representation (IR) without a versioning scheme, and it is subject to change.
This is all still work in progress and currently cumbersome: even if you match all versions of all tools, you may still be unable to lower a specific model due to missing compiler/lowering/rewrite rules.
We are considering various alternatives and keep an open eye on new upcoming features that would simplify this process, but cannot give an ETA at the moment.
For now, I recommend iree-turbine to export your model; just make sure its version is as close as possible to (or just below) the IREE version we have in UE.
Apologies for the inconvenience: MLIR-based model compilation is still work in progress, but I am confident it will mature quickly and we will be able to offer a more convenient integration inside UE.
Best
Nico
Hi, thanks for your reply.
In the end, I used iree-import-onnx from the third option. It generally worked, but there are issues with input alignment on some platforms. My colleague reported problems related to this in another thread — [Content removed]
Hmm, this sounds odd: IREE compiles models down to executable code that makes use of hardware-specific features such as vector registers.
The performance of small MLPs was actually pretty good, as these have already been well optimized by the compiler community. I can imagine that the bad performance you observed could have been introduced by going through the ONNX importer.
We did some tests a while ago comparing BasicCpu, IREE, and ORT. While ORT was a factor of five slower on small MLPs, IREE could almost match BasicCpu.
So while I have the feeling something is off and would try IREE again, this time exporting the model directly to MLIR (trying out different dialects like linalg, tosa, and stablehlo), BasicCpu could be a valid alternative, as it would still give you a slight performance boost. It does use ISPC kernels and, to my knowledge, runs on current-gen consoles. But it has a custom input format, so you would have to manually write out the weights and the graph information (both MLDeformer and LearningAgents contain reference code showing how to do so; see also the sketch below).
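To illustrate, writing out the weights of a small MLP could look something like this. This is only a sketch: the binary layout below is made up for the example, and the real format has to match whatever your loading code (or the BasicCpu reference code in MLDeformer/LearningAgents) expects:

```python
import struct
import torch

# Placeholder MLP; replace with your trained model.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
).eval()

# Hypothetical flat layout: per Linear layer, [in_features, out_features]
# as little-endian int32, then the float32 weight matrix and bias vector.
with open("model_weights.bin", "wb") as f:
    for layer in model:
        if isinstance(layer, torch.nn.Linear):
            w = layer.weight.detach().numpy()  # shape: (out_features, in_features)
            b = layer.bias.detach().numpy()
            f.write(struct.pack("<ii", w.shape[1], w.shape[0]))
            f.write(w.astype("float32").tobytes())
            f.write(b.astype("float32").tobytes())
```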
Note, an advantage of NNE over custom implementations is that it generalizes well to other runtimes and/or hardware. E.g., it would take just a single line of code to switch to running on an NPU on target devices that have one, freeing up some CPU resources. But I am not sure if this applies to your use case.
Hope that helps!
Yes, unfortunately the MLIR export chain is still not very user-friendly, as MLIR is an intermediate representation.
Note, we don't yet have any runtime that exposes cloud-specific NPUs, and I am not sure if and when we can put it on our roadmap (but it would certainly be possible).
Keeping my fingers crossed that you will succeed!
Hello. In the end, we were not satisfied with the performance of the MLIR runtime. We have a fairly simple model (3 linear layers) and we are considering implementing inference ourselves. However, I came across NNERuntimeBasicCpu in the source code, which at first glance seems perfect for our needs. Could you tell me if you have compared the performance of MLIR/ONNX against NNERuntimeBasicCpu?
Additionally, I see that the runtime supports INTEL_ISPC. Is this optimization supported on current generation consoles?
As I understand it, to enable it, it should be enough to set
bCompileISPC = true
in Game.Target.cs, correct?
Best regards,
Edward
Thank you for the information. I will look into whether we have devices with NPUs; this could really give us a boost, although the most problematic platforms are consoles and cloud instances for dedicated servers (running in single-threaded mode). The custom format is not a big issue; we should be able to implement it. However, I will still explore the possibility of exporting the model to MLIR from Torch. Last time I looked into this, I had trouble finding a suitable torch-mlir version, and building such projects did not always go smoothly. Thanks again for the information; we'll keep experimenting.