Hi! We’re in the process of upgrading to 5.6.1. We run our servers on Linux under Kubernetes. On 5.5.4 we had satisfactory performance and were able to target 30 fps. On 5.6.1 we’re seeing a crippling performance drop, down to 7-8 fps.
We inspected 5.6 and 5.5 traces of servers running on Linux and noticed that the culprit might be the OS randomly taking time quanta away from the process or its threads. There are multiple 30-40 ms freezes in random execution blocks, such as the ones in the screenshots below.

Unfortunately I cannot share the full trace, as it contains sensitive information. The screenshots above show physics exclusively, but there are random freezes inside ISM transform updates and inside our own code as well. Physics simply takes most of the server CPU time, which is probably why the freezes show up there most often.
The 5.5 and 5.6 versions are based on pretty much the same codebase, save for some changes to adapt to the 5.6 API changes. The worker node hardware configuration and startup options are the same. We tried elevating the process priority, but that did not help.
We saw that 5.6 introduced Tasks instead of ParallelFor for the MidPhase, which looks like the only major change inside Chaos. This change actually made multithreading work on Linux servers. Thanks to that, when we re-enabled midphase redistribution we saw significant gains in CPU performance (up to 3x), but those 30-40 ms spikes are still present.
Another side effect is that physics uses the frame duration for substepping. Those 30-40 ms spikes create a positive feedback loop: they extend the frame duration, which makes substepping take more steps, which increases server CPU time even further.
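To make the loop concrete, here is a toy numeric sketch; the step length, per-step cost, and the cost model itself are illustrative assumptions, not the actual Chaos substepping code. A single 35 ms stall inflates the frame time, the longer frame requests more substeps, and the extra substeps keep the frame long even after the stall is gone.

```cpp
// Toy model of frame-time-driven substepping; all numbers are assumptions.
const float SubstepDt   = 1.0f / 30.0f; // target substep length, seconds
const float CostPerStep = 0.025f;       // assumed CPU cost of one physics substep
const float OtherWork   = 0.008f;       // assumed non-physics cost per frame

float FrameTime = SubstepDt;            // start from a healthy 30 fps frame
float Stall     = 0.035f;               // a single 35 ms scheduling freeze
for (int32 Frame = 0; Frame < 6; ++Frame)
{
	// The number of substeps is derived from how long the previous frame took.
	const int32 NumSubsteps = FMath::CeilToInt(FrameTime / SubstepDt);
	FrameTime = NumSubsteps * CostPerStep + OtherWork + Stall;
	Stall = 0.0f; // no further freezes, yet the frame time jumps from 33 ms
	              // to 68 ms, then settles at ~83 ms and never recovers
	UE_LOG(LogTemp, Log, TEXT("Frame %d: %d substeps, %.0f ms"),
	       Frame, NumSubsteps, FrameTime * 1000.f);
}
```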
In short:
- Did the team change anything related to thread/process scheduling in 5.6? What can we check in our OS/VM configuration?
- Seeing how midphase redistribution is really beneficial (and actually works on Linux multithreaded configurations now), why did the team decide to disable it by default? It was enabled in 5.5. This makes upgrades much harder, as we have to track whether any of the good stuff was disabled.
Here I will post my findings. Maybe someone will benefit from them.
- Did the team change anything related to thread/process scheduling in 5.6? What can we check in our OS/VM configuration?
Here the problem was that I took the two traces under two different setups. I used
taskset -C
to start the server and pin it to a specific core, while Kubernetes, it seems, uses something like
systemd-run
to start the server and allocate it a share of CPU time. The latter causes a lot of performance drops, as the OS takes the execution quantum away from the server mid-frame and probably migrates the process between cores.
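One way to check this from inside the server process is to log the CPU affinity mask at startup. A minimal standalone sketch using the standard Linux sched_getaffinity call; how you hook it up and route the output is up to you:

```cpp
#include <sched.h>   // sched_getaffinity, CPU_* macros (Linux)
#include <cstdio>

// Logs which cores the server process is allowed to run on.
// Under `taskset -C <core>` you should see exactly one allowed core; under a
// plain cgroup/systemd-run CPU quota the mask typically spans all cores, so
// the scheduler is free to migrate the process between cores mid-frame.
void LogCpuAffinity()
{
	cpu_set_t Mask;
	CPU_ZERO(&Mask);
	if (sched_getaffinity(0 /*this process*/, sizeof(Mask), &Mask) == 0)
	{
		std::printf("Allowed cores (%d total): ", CPU_COUNT(&Mask));
		for (int Cpu = 0; Cpu < CPU_SETSIZE; ++Cpu)
		{
			if (CPU_ISSET(Cpu, &Mask))
			{
				std::printf("%d ", Cpu);
			}
		}
		std::printf("\n");
	}
}
```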
Next,
Seeing how midphase redistribution is really beneficial (and actually works on Linux multithreaded configurations now), why did the team decide to disable it by default? It was enabled in 5.5. This makes upgrades much harder, as we have to track whether any of the good stuff was disabled.
Our findings here turned out to be much more complex than the question suggests.
First, the new threading system has some issues with priorities and task execution order.
Collisions::BroadPhase::ParticlePair and Collisions::AssignMidPhases are both launched as High priority tasks. However, High priority actually means a Normal priority task running on a high priority thread.
The second problem is that FChaosAccelerationStructureTask is a Normal priority task on a low priority thread. I initially thought the issue was that all tasks effectively run at Normal priority, but then I realised the task system has a dynamic prioritisation mechanism: low priority threads can temporarily raise their priority, grab one task from a high priority thread, execute it, and then lower their priority back. In our case there is no benefit to that, and the only way to win is to launch ParticlePair and AssignMidPhases with BackgroundHigh priority.
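To illustrate that last option, here is a hedged sketch using the UE::Tasks API; the debug name and lambda body are placeholders, and the actual Chaos call sites obviously look different:

```cpp
#include "Tasks/Task.h"

// Hedged sketch, not the actual Chaos call site. Launching with BackgroundHigh
// schedules a high priority task on the background (low OS priority) workers,
// i.e. next to FChaosAccelerationStructureTask, instead of a normal priority
// task on the high priority foreground workers.
UE::Tasks::FTask MidPhaseTask = UE::Tasks::Launch(
	TEXT("Chaos.AssignMidPhases.Sketch"),
	[] { /* ... ParticlePair / AssignMidPhases work ... */ },
	LowLevelTasks::ETaskPriority::BackgroundHigh);

MidPhaseTask.Wait();
```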
In 5.5 ParallelFor did not actually parallelise anything, and it was easy to force it onto a single core anyway. Now it’s impossible to do that cleanly.
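For reference, this is the kind of escape hatch we mean. A generic ParallelFor sketch, not the actual Chaos MidPhase loop: the ParallelFor API has a flag that keeps all iterations on the calling thread, while the 5.6 Tasks-based path has no equivalent switch that we could find.

```cpp
#include "Async/ParallelFor.h"

// Generic sketch: a ParallelFor-based loop can be kept on the calling thread
// via ForceSingleThread. The Tasks-based MidPhase in 5.6 always hands the work
// to worker threads, so under a one-core CPU quota it mostly adds scheduling
// overhead.
const int32 NumPairs = 1024; // e.g. number of particle pairs to process
ParallelFor(NumPairs,
	[](int32 PairIndex)
	{
		// ... per-pair midphase work ...
	},
	EParallelForFlags::ForceSingleThread);
```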
True, the server runs with a CPU limit of one core, and we’d probably see much better CPU times if we just allocated four cores to it. But four cores means four times fewer servers per VM. Single-core performance was good enough before, and now the forced multithreading makes it worse by unnecessarily distributing work across threads.
Now, moving on to Chaos itself, where we found a couple of issues.
First,
FContactPairModifier::GetWorldContactLocations(int32 ContactPointIdx, FVec3& OutLocation0, FVec3& OutLocation1) const
The method is const, yet it marks contacts as dirty, and as of 5.6 that invalidates the manifold cache. This significantly dropped performance for our stationary convex/landscape objects, because we call GetWorldContactLocations pretty much every frame.
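The shape of the problem, as an illustrative pattern only and not the actual Chaos source (the flag name and class are made up, and FVec3 is stood in by FVector):

```cpp
using FVec3 = FVector; // stand-in for Chaos::FVec3 in this sketch

// A const accessor that flips a mutable "modified" flag as a side effect.
// Every read-only query then counts as a contact modification, and in 5.6
// that invalidates the cached manifold for the pair, so contacts get
// regenerated from scratch on stationary objects too.
class FContactReaderSketch
{
public:
	void GetWorldContactLocations(int32 ContactPointIdx, FVec3& OutLocation0, FVec3& OutLocation1) const
	{
		bContactsModified = true; // side effect hidden behind const
		// ... fill OutLocation0 / OutLocation1 from the contact point ...
	}

private:
	mutable bool bContactsModified = false; // later used to decide whether the manifold cache can be reused
};
```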
Second, memory deallocation performance inside Collisions::UpdateConvexHeightFieldConstraint, even on 5.5.4.
We noticed that the constraint update spent most of its time destroying FMeshContactGenerator. It contains hash maps and arrays of trivially destructible data. On the screenshot above, the Triangles array has length 0, yet it reserves about 500 KB of memory, and releasing it takes about 0.5 ms. This points at the memory allocator, and after changing the default Linux allocator from Binned2 to Mimalloc we got roughly a 6x improvement in FMeshContactGenerator destruction time.
Like that:
However, about 60% of that time is still spent just deallocating empty arrays.
What we ended up doing is moving FMeshContactGenerator into FCollisionContext and reusing it; a rough sketch follows below.
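A rough sketch of the shape of that change. FCollisionContext and FMeshContactGenerator are the real Chaos types, but the member, function name, and the Reset()-style API are our own illustrative assumptions, not the exact diff:

```cpp
// Hedged sketch of the reuse approach.
struct FCollisionContext // simplified stand-in for the engine struct
{
	// Scratch generator that lives as long as the context, so its large
	// Triangles array and hash maps are allocated once and reused.
	FMeshContactGenerator MeshContactScratch;
};

void UpdateConvexHeightFieldConstraintSketch(FCollisionContext& Context)
{
	FMeshContactGenerator& Generator = Context.MeshContactScratch;
	Generator.Reset(); // assumes a reset that clears contents but keeps the slack
	// ... generate triangle contacts into Generator ...
}	// no destructor runs here, so no 0.5 ms allocator round-trip per update
```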
This change massively sped up FPBDRigidsEvolutionGBF::AdvanceOneTimeStep and turned this