NNERuntimeORTCpu inference is much faster in Editor than Standalone

Hello,

We run NNERuntimeORTCpu inference on several models at runtime (PC), and it runs around 4x faster with the Editor binaries (either PIE or -game) than with Standalone binaries, even when comparing DebugGame Editor against Shipping Standalone (~12 seconds vs. ~45 seconds).

Looking at a trace, FModelInstanceORTBase::RunSync seems to be much more expensive in Standalone builds, despite the number of calls being the same (1075 in our case). Have you seen this on your end, or do you have any idea what the issue could be? I would rather ask before spending time profiling engine plugins.

We are on 5.5.4.

Specs: AMD Ryzen 9 7950X3D, 64 GB RAM, NVMe SSD

Thanks,

Diego

Hi Diego,

NNERuntimeORTCpu is configured so that it uses as many threads as it needs in the Editor but runs single-threaded in Standalone, to give you more control over CPU budgeting.

So in your case, your model can probably be parallelized across four threads, which is why it runs 4x faster in the Editor (but uses four cores instead of one!).

If you are fine with NNERuntimeORTCpu using as many cores as it can get in Standalone, you can change this behaviour in the plugin settings.

Just go to Project Settings > Plugins > NNERuntimeORT and match the Standalone settings to what is there for the Editor.
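For reference, settings changed through that UI page typically end up as an override in DefaultEngine.ini. The sketch below shows what such an override could look like; the section name assumes a settings class called NNERuntimeORTSettings, and the key name and value are hypothetical, so treat the Project Settings UI as the authoritative place to make the change and only use this to locate the resulting entry.

```ini
; Hypothetical sketch of a DefaultEngine.ini override for the NNERuntimeORT
; plugin settings. The section path assumes the settings class is named
; NNERuntimeORTSettings; the key and value below are illustrative guesses
; and may differ in your engine version. Change the setting via
; Project Settings > Plugins > NNERuntimeORT and check this file to see
; the exact entry the editor writes.
[/Script/NNERuntimeORT.NNERuntimeORTSettings]
; Example intent: let Standalone (game) builds use multiple threads,
; matching the Editor behaviour.
GameThreadingMode=Parallel
```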

Please note: there is a slight overhead when using multiple threads due to thread pool operations.

So going multithreaded really only decreases your latency; the overall compute time is at least as high as in single-threaded mode.

I hope that helps!

Best

Nico

Hi Nico,

That’s exactly it: removing the thread limit gives us the same timings as the Editor binaries. We will look into which values work best for our case.

Thank you very much,

Diego