Double invalidation on FPhysScene_Chaos::OnStartFrame leads to extra contention in the PhysicsParallelFor

Hi,

My familiarity with the Chaos code is limited, so if anything seems wrong or needs more context, please don’t hesitate to ask.

Recently, in a scene we were working on, a large number of skeletal meshes were being registered and unregistered over multiple frames. While investigating performance afterwards, we discovered that FPhysScene_Chaos::OnStartFrame could take as much as 5ms to complete. Digging into the associated Superluminal trace, we found that most of that time was spent in Chaos::FChaosMarshallingManager::AddDirtyProxy (called from Chaos::FRigidBodyHandle_External::SetR inside the Chaos::PhysicsParallelFor), and that most of it was pure wait time.

From there, we looked at the invalidation pattern around the SkeletalMeshComponent in FPhysScene_Chaos::UpdateKinematicsOnDeferredSkelMeshes, and it seemed that all of the proxies were already invalidated before the PhysicsParallelFor (see ProxiesToDirty). We then looked at AddDirtyProxy and realized that it does nothing if the dirty index is already set. We inferred that, since everything had been invalidated beforehand, no work would be done there, and that the calls inside the PhysicsParallelFor were only contending on the lock. We also inferred that, since the Proxy is passed in as a parameter, its dirty index can be checked outside the scope lock on MarshallingManagerLock. By moving that check outside the write lock, we were able to shave a couple of milliseconds (2-3ms in our local tests) off each OnStartFrame.
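To make the idea concrete, here is a minimal standalone sketch of the "check before taking the lock" pattern described above. It is not the engine code: FSketchProxy, FSketchManager, RunSketch and the lock counter are hypothetical names made up for illustration, std::mutex stands in for MarshallingManagerLock, and the dirty index is made atomic here on the assumption that worker threads may read it while another thread writes it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

constexpr int kInvalidIdx = -1;

// Hypothetical stand-in for a physics proxy (not the Chaos type).
struct FSketchProxy {
    // Atomic so the unlocked read below is not a data race (assumption).
    std::atomic<int> DirtyIdx{kInvalidIdx};
};

// Hypothetical stand-in for the marshalling manager.
class FSketchManager {
public:
    void AddDirtyProxy(FSketchProxy* Proxy) {
        // Fast path: the proxy is already in the dirty list, so there is
        // nothing to do and no reason to touch the lock.
        if (Proxy->DirtyIdx.load(std::memory_order_acquire) != kInvalidIdx) {
            return;
        }
        std::lock_guard<std::mutex> Guard(Lock);
        ++LockAcquisitions; // instrumentation for this sketch only
        // Re-check under the lock: another caller may have won the race.
        if (Proxy->DirtyIdx.load(std::memory_order_relaxed) == kInvalidIdx) {
            Proxy->DirtyIdx.store(static_cast<int>(DirtyProxies.size()),
                                  std::memory_order_release);
            DirtyProxies.push_back(Proxy);
        }
    }

    std::size_t NumDirty() {
        std::lock_guard<std::mutex> Guard(Lock);
        return DirtyProxies.size();
    }

    int NumLockAcquisitions() {
        std::lock_guard<std::mutex> Guard(Lock);
        return LockAcquisitions;
    }

private:
    std::mutex Lock;
    std::vector<FSketchProxy*> DirtyProxies;
    int LockAcquisitions = 0;
};

struct FSketchResult {
    std::size_t NumDirty;
    int LocksAfterPreDirty;
    int LocksAfterLoop;
};

// Mimics the frame: dirty every proxy up front (like ProxiesToDirty),
// then call AddDirtyProxy again for each one, as the loop body would.
// The second pass is run serially here to keep the sketch deterministic.
FSketchResult RunSketch(int NumProxies) {
    std::vector<FSketchProxy> Proxies(NumProxies);
    FSketchManager Manager;
    for (FSketchProxy& Proxy : Proxies) {
        Manager.AddDirtyProxy(&Proxy);
    }
    const int LocksAfterPreDirty = Manager.NumLockAcquisitions();
    for (FSketchProxy& Proxy : Proxies) {
        Manager.AddDirtyProxy(&Proxy); // takes the fast path, no lock
    }
    return {Manager.NumDirty(), LocksAfterPreDirty,
            Manager.NumLockAcquisitions()};
}
```

With every proxy dirtied up front, the second pass never acquires the lock, which is the effect we were after. Whether an unlocked read is safe in the real code depends on how the dirty index is written and reset there, so treat the atomic and the re-check under the lock as assumptions of this sketch rather than a description of the engine.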

Sadly, I am unable at this time to provide a repro of this issue. I included two screenshots showing the results we are seeing locally.

Hope this will help improve performance of the Chaos frame!

[Attachment Removed]

Hi Maxime,

Thank you for the report - is it possible you could upload the Superluminal trace with the engine-side stack frames?

Best

Geoff Stacey

Developer Relations

EPIC Games

[Attachment Removed]

Hi Maxime, I’ve taken a look at the code now and what you say makes complete sense.

I’ll get this added to a CL and put it up for review.

Best

Geoff

[Attachment Removed]

Hi Geoff,

I am so sorry, it is difficult for me to generate and provide a Superluminal trace in a public setting. Just to be clear, in the screenshot I provided, there is “nothing” below AddDirtyProxy, only unresolved symbols. Reproducing the issue outside of our setup has also been a challenge, since I don’t know exactly which part is causing the lock interaction to misbehave. In terms of the number of skeletal meshes, this seems to require only a small number; in our automation test, we have ~60 skeletal meshes marked for deferred kinematic update.

Internally, we have automation tests where we A/B tested the single change mentioned above, with great results on PS5 (a gain of 2+ms per frame, less when targeting a recent PC). On PC, it shows as 3% of the overall time, while on PS5 it shows as much as 60%.

[Image Removed]

[Attachment Removed]