Hi,
I haven’t been able to find any known issues around ARM performance with replication.
One possible explanation for the performance spike is core contention, or the server process being locked out of its core. This can happen whenever execution drops into kernel space, and memory allocation is a common place for that to occur. The engine’s memory allocators take a mutex when they hit kernel space, so if your servers are running multithreaded, other threads that are allocating memory at the same time can also end up contending on that lock.
It’s worth checking whether this struct’s serialization path is making more memory allocations than you expect, as removing those allocations may help. It’s also worth checking that the bin malloc in use is pre-allocating enough memory buffers.
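To illustrate what removing allocations from a serialization path can look like, here is a minimal standalone sketch in plain C++. It does not use any engine APIs. `PlayerState`, `CountingAllocator`, `SerializeAllocating` and `SerializeInto` are hypothetical names made up for this example, and the counting allocator only stands in for whatever profiling or allocation tracking you actually have available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Number of heap requests made through CountingAllocator (illustration only).
static std::size_t g_allocCount = 0;

// Hypothetical allocator that counts every allocate() call, then forwards
// to std::allocator. It is not an engine allocator.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++g_allocCount;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using ByteBuffer = std::vector<std::uint8_t, CountingAllocator<std::uint8_t>>;

// Hypothetical replicated struct, assumed to be trivially copyable.
struct PlayerState {
    float x, y, z;
    std::int32_t health;
};

// Variant A: builds a fresh buffer on every call, so every call allocates.
ByteBuffer SerializeAllocating(const PlayerState& s) {
    ByteBuffer out(sizeof(PlayerState));
    std::memcpy(out.data(), &s, sizeof(PlayerState));
    return out;
}

// Variant B: writes into a caller-owned scratch buffer. Once the buffer's
// capacity is large enough, resize() does not allocate.
void SerializeInto(const PlayerState& s, ByteBuffer& scratch) {
    scratch.resize(sizeof(PlayerState));
    std::memcpy(scratch.data(), &s, sizeof(PlayerState));
}

// Returns how many allocations n calls to variant A perform.
std::size_t CountAllocating(int n) {
    g_allocCount = 0;
    const PlayerState s{1.0f, 2.0f, 3.0f, 100};
    for (int i = 0; i < n; ++i) {
        ByteBuffer b = SerializeAllocating(s);
        assert(b.size() == sizeof(PlayerState));
        (void)b;
    }
    return g_allocCount;
}

// Returns how many allocations n calls to variant B perform,
// including the single up-front reserve().
std::size_t CountReusing(int n) {
    g_allocCount = 0;
    const PlayerState s{1.0f, 2.0f, 3.0f, 100};
    ByteBuffer scratch;
    scratch.reserve(sizeof(PlayerState));
    for (int i = 0; i < n; ++i) {
        SerializeInto(s, scratch);
    }
    return g_allocCount;
}
```

Comparing `CountAllocating(1000)` with `CountReusing(1000)` shows the gap between allocating per call and reusing one pre-sized buffer. In your case the equivalent check would be profiling the real serialization path and looking for per-call allocations like the one in variant A.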
If this isn’t the case, would you be able to share more specifics on your setup? That would give us more insight into the problem, and this question can be made private if needed.
Thanks,
Alex