ARM Performance considerably slower

Hi there,

We are currently trying to migrate our dedicated server infrastructure from x86 to ARM but, during our performance testing, we have seen that ARM is 30% slower on the CPU, even though the x86 and ARM machines should be comparable in terms of performance.

Looking at the Insights trace captures, the slowdown seems to be mainly related to GameNetDriver.Tick: on x86 its duration is roughly constant, while on ARM it has huge spikes that last many times longer than a normal tick.

These spikes seem to occur near the outgoing replication of a struct that contains only a byte buffer (approx. 200 bytes), which we use to transfer physics prediction data for each pawn. Due to game logic, this struct has quite a high replication frequency, which obviously impacts overall performance.

Is there any known issue with ARM performance in UE replication? Could this be caused by our buffer replication not being optimized for ARM?

Do you have any suggestions on how to profile this further?

Regards,

Fabio Segantin

Hi,

I haven’t been able to find any known issues around ARM performance with replication.

One possible explanation for the performance spike is core contention or the server process being locked out of the core. This can happen whenever the process enters kernel space, and memory allocation is a common trigger. The engine’s memory allocators take mutexes when they hit kernel space, so if your servers are running multithreaded, it’s also possible that this causes contention with other threads that are allocating memory.

It’s worth checking whether more memory allocations are occurring in this struct’s serialization path, as removing these allocations may help. It’s also worth checking that the binned malloc being used is pre-allocating enough memory buffers.
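As a rough illustration of what "removing allocations from the serialization path" can look like, here is a minimal sketch in plain standard C++ rather than engine code. Every name in it (PredictionPayload, SerializeInto, CountBufferGrowths, the 256-byte cap) is hypothetical and only stands in for your struct and its serializer. The idea is to keep the payload inline at a fixed capacity and to write into a scratch buffer that is reserved once, so a tick in steady state never has to ask the allocator for memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed upper bound for the payload (the thread mentions roughly 200 bytes).
constexpr std::size_t kMaxPredictionBytes = 256;

// Hypothetical stand-in for the replicated struct. The bytes live inline,
// so constructing or copying it never touches the heap.
struct PredictionPayload {
    std::array<std::uint8_t, kMaxPredictionBytes> Bytes{};
    std::uint16_t NumBytes = 0;
};

// Hypothetical serializer: appends a 2-byte length followed by the payload
// to caller-owned storage and returns the number of bytes written.
// It only allocates if Out has to grow beyond its current capacity.
std::size_t SerializeInto(const PredictionPayload& In, std::vector<std::uint8_t>& Out) {
    assert(In.NumBytes <= kMaxPredictionBytes);
    const std::size_t Start = Out.size();
    Out.resize(Start + sizeof(In.NumBytes) + In.NumBytes);
    std::memcpy(Out.data() + Start, &In.NumBytes, sizeof(In.NumBytes));
    std::memcpy(Out.data() + Start + sizeof(In.NumBytes), In.Bytes.data(), In.NumBytes);
    return sizeof(In.NumBytes) + In.NumBytes;
}

// Simulates NumTicks ticks with NumPawns payloads each and counts how often
// the scratch buffer had to grow (each growth is a heap allocation).
std::size_t CountBufferGrowths(std::size_t NumPawns, std::size_t NumTicks, std::size_t ReserveBytes) {
    std::vector<std::uint8_t> Scratch;
    Scratch.reserve(ReserveBytes); // one up-front allocation, outside the hot loop

    PredictionPayload Payload;
    Payload.NumBytes = 200;

    std::size_t Growths = 0;
    for (std::size_t Tick = 0; Tick < NumTicks; ++Tick) {
        Scratch.clear(); // keeps the capacity, drops the contents
        for (std::size_t Pawn = 0; Pawn < NumPawns; ++Pawn) {
            const std::size_t Before = Scratch.capacity();
            SerializeInto(Payload, Scratch);
            if (Scratch.capacity() != Before) {
                ++Growths;
            }
        }
    }
    return Growths;
}
```

Calling CountBufferGrowths(64, 10, 64 * 202) reserves enough for 64 pawns up front and reports zero growths, while passing 0 as the reserve size reports several growths during the first tick. In engine code the same pattern would use the engine's own containers (for example a TArray that is reserved once and reused), and the thing to verify in a profile is that no allocation shows up inside the struct's serialization during the spikes.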

If this isn’t the case, would you be able to share more specifics on your setup? This could help provide more insight into the problem here, and this question can be made private if needed.

Thanks,

Alex