Crash in FCollisionConstraintAllocator Part 2

Continuing discussion from this thread:

[Content removed]

And calling [mention removed]​

The crash returned, but I think I may have finally figured it out.

To recap, FCollisionConstraintAllocator has an array, ActiveConstraints, that it iterates, visiting any constraints that it has pointers to. Very occasionally, for us, it crashes because it visits a constraint that has been deleted and overwritten with the freed memory fill pattern 0xDD.

I saw a constraint was destroyed that still had an index in its ContainerCookie.ConstraintIndex, indicating it was still emplaced in ActiveConstraints. But this was in FCollisionConstraintAllocator::RemoveParticle(). RemoveActiveConstraint() should have cleaned this up before the constraint’s destruction. How is this possible?

Here is my prospective fix. We need to pass ECollisionVisitorFlags::VisitAllCurrentAndExpired, because the default flags will not visit everything. If there are any disabled or expired constraints, by default they won’t be visited, and won’t be removed from ActiveConstraints.

void FCollisionConstraintAllocator::RemoveParticle(FGeometryParticleHandle* Particle)
{
	// Removal not supported during the (parallel) collision detection phase
	check(!bInCollisionDetectionPhase);
 
	// Loop over all particle pairs involving this particle and tell each MidPhase 
	// that one of its particles is gone. It will get pruned at the next collision detection phase.
	// Also remove its collisions from the Active lists.
	Particle->ParticleCollisions().VisitMidPhases([this, Particle](FParticlePairMidPhase& MidPhase)
		{
			MidPhase.VisitCollisions([this](FPBDCollisionConstraint& Constraint)
				{
					RemoveActiveConstraint(Constraint);
					return ECollisionVisitorResult::Continue;
				}, ECollisionVisitorFlags::VisitAllCurrentAndExpired);	// TRS CHANGE: visit EVERYTHING because next we'll delete everything
 
			MidPhase.DetachParticle(Particle);

[Attachment Removed]

Hello [mention removed]​

Thank you for reaching out and tagging me.

I’m glad you found a solution on your end.

I’ll check it against the “stock” engine version.

If the issue is present, I’ll open an issue report and propose yours as a solution.

All the best,

[mention removed]​

[Attachment Removed]

Hi Rafael, I’m about to give you another one of my unsatisfying answers. :slight_smile:

Regarding my old thread, I once mentioned one of our content creators had removed extraneous bodies from the physics setup of the character that was most often associated with the crash. I think the relevance of that change has been discredited, since the crash returned later.

I looked at your linked issue, but it doesn’t strike me as related.

And, most unsatisfying, I can’t describe our scene structure. I am involved with this issue only as a generalist crash investigator. I’m very much in the dark as to where our physics constraints come from. Chaos, like many systems, works on anonymized data that (for me, at least) is difficult to trace back to a higher level and game assets we’ve created.

But - I can still try to justify my change to you. Here goes.

In that callstack you posted, you are iterating over the ActiveConstraints array of pointers to constraints. You call BeginTick()/IsSleeping() on each pointer that is not null. Now, imagine if the memory indicated by one of those pointers has been freed, and in a Development build with the FMallocPoisonProxy in place, the memory has been filled with the bit pattern 0xDD. You will crash. (The poison proxy isn’t used in editor, so you can’t reproduce the issue there.)

Now how could we get a stale pointer in the array like that? I assert it is possible if, when FCollisionConstraintAllocator::RemoveParticle() is called, constraints exist that are “disabled” or “expired”.

I can’t tell you how to get constraints that are disabled or expired. I don’t know what the terms mean here. But that doesn’t really matter. Let’s focus on what RemoveParticle() is trying to do.

The point of RemoveParticle() is to call RemoveActiveConstraint() on each constraint. It has to hit each and every one, because any constraint it does not visit will be left in ActiveConstraints.

Take a look at FPBDRigidsEvolutionGBF::DestroyParticleCollisionsInAllocator(). After calling RemoveParticle(), it will call Particle->ParticleCollisions().Reset(). That will destroy all constraints. If any were not visited in RemoveParticle(), then our array will now have stale pointers to them.

Going back to RemoveParticle(). It relies on MidPhase.VisitCollisions() to iterate over the constraints. That takes a default parameter… VisitFlags = ECollisionVisitorFlags::VisitDefault… which means it will not visit constraints that are disabled or expired. If any exist, they will result in stale pointers once the constraints are destroyed.

That’s what had to have happened to us. I added this line to ~FPBDCollisionConstraint()

check(ContainerCookie.ConstraintIndex == INDEX_NONE);…and the check failed, right after RemoveParticle() had been called. That shows we had a constraint that was not visited.

We need to pass ECollisionVisitorFlags::VisitAllCurrentAndExpired because any constraint that is not visited before deletion is a crash.

[Attachment Removed]

Hi Steve, and apologies for the extremely slow response to this.

I think you are right, but this is a vary rarely used path in the codebase which is why we haven’t stumbled across it before. (Ie disabling a constraint and then deleting a body later).

Can I find out the use case you were running into this with please? I’m curious about how our clients are using these features?

Secondly have you found any side effects?

Best

Geoff

[Attachment Removed]

Appreciated Steve :slight_smile:

We have submitted a fix for this which is a little bit different, but essentially solves the problem in a same manner. The CL is 49810206, but maybe easiest thing is just to try reverting your fix is you take 5.8 at some point after release.

All the best

Geoff

[Attachment Removed]

Hello [mention removed]​

I’m yet to reproduce the issue.

Could you guide me on the scene structure, what you were doing the moment you had the crash, anything that might help me reproduce it?

I was able to reach the same call stack you shared by destroying a PhysicsConstraint during runtime, but no crashes so far:

>	UnrealEditor-Chaos.dll!Chaos::FPBDCollisionConstraint::IsSleeping() Line 533	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!Chaos::FPBDCollisionConstraint::BeginTick() Line 387	C++
 	UnrealEditor-Chaos.dll!Chaos::Private::FCollisionConstraintAllocator::ResetActiveConstraints() Line 171	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!Chaos::Private::FCollisionConstraintAllocator::BeginFrame() Line 349	C++
 	UnrealEditor-Chaos.dll!Chaos::FPBDCollisionConstraints::BeginFrame() Line 549	C++
 	UnrealEditor-Chaos.dll!Chaos::AdvanceOneTimeStepTask::DoWork() Line 521	C++
 	UnrealEditor-Chaos.dll!Chaos::FPBDRigidsSolver::AdvanceSolverBy(const Chaos::FSubStepInfo & SubStepInfo) Line 1330	C++
 	UnrealEditor-Chaos.dll!Chaos::FPhysicsSolverAdvanceTask::AdvanceSolver() Line 176	C++
 	UnrealEditor-Chaos.dll!Chaos::FPhysicsSolverAdvanceTask::DoTask(ENamedThreads::Type CurrentThread, const TRefCountPtr<FBaseGraphTask> & MyCompletionGraphEvent) Line 152	C++
 	UnrealEditor-Chaos.dll!TGraphTask<Chaos::FPhysicsSolverAdvanceTask>::ExecuteTask() Line 634	C++
 	UnrealEditor-Chaos.dll!UE::Tasks::Private::FTaskBase::TryExecuteTask() Line 504	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!UE::Tasks::Private::FTaskBase::Init::__l2::<lambda_1>::operator()() Line 185	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!LowLevelTasks::FTask::Init::__l13::<lambda_1>::operator()(const bool) Line 499	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!Invoke(LowLevelTasks::FTask::Init::__l13::<lambda_1> &) Line 47	C++
 	[Inline Frame] UnrealEditor-Chaos.dll!LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::TTaskDelegateImpl<`LowLevelTasks::FTask::Init<`UE::Tasks::Private::FTaskBase::Init'::`2'::<lambda_1>>'::`13'::<lambda_1>,0>::Call(void *) Line 162	C++
 	UnrealEditor-Chaos.dll!LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::TTaskDelegateImpl<`LowLevelTasks::FTask::Init<`UE::Tasks::Private::FTaskBase::Init'::`2'::<lambda_1>>'::`13'::<lambda_1>,0>::CallAndMove(LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48> & Destination, void * InlineData, unsigned int DestInlineSize, bool <Params_0>) Line 171	C++
 	[Inline Frame] UnrealEditor-Core.dll!LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::CallAndMove(LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48> &) Line 308	C++
 	UnrealEditor-Core.dll!LowLevelTasks::FTask::ExecuteTask() Line 627	C++
 	UnrealEditor-Core.dll!LowLevelTasks::FScheduler::ExecuteTask(LowLevelTasks::FTask * InTask) Line 245	C++
 	[Inline Frame] UnrealEditor-Core.dll!LowLevelTasks::FScheduler::TryExecuteTaskFrom(LowLevelTasks::Private::FWaitEvent *) Line 457	C++
 	UnrealEditor-Core.dll!LowLevelTasks::FScheduler::WorkerLoop(LowLevelTasks::Private::FWaitEvent * WorkerEvent, LowLevelTasks::Private::TLocalQueueRegistry<1024,1024>::TLocalQueue * WorkerLocalQueue, unsigned int WaitCycles, bool bPermitBackgroundWork) Line 514	C++
 	[Inline Frame] UnrealEditor-Core.dll!LowLevelTasks::FScheduler::WorkerMain(LowLevelTasks::Private::FWaitEvent * WorkerEvent, LowLevelTasks::Private::TLocalQueueRegistry<1024,1024>::TLocalQueue * WorkerLocalQueue, unsigned int WaitCycles, bool bPermitBackgroundWork) Line 571	C++
 	UnrealEditor-Core.dll!LowLevelTasks::FScheduler::CreateWorker::__l2::<lambda>() Line 75	C++
 	[Inline Frame] UnrealEditor-Core.dll!UE::Core::Private::Function::TFunctionRefBase<UE::Core::Private::Function::TFunctionStorage<1>,void __cdecl(void)>::operator()() Line 470	C++
 	UnrealEditor-Core.dll!FThreadImpl::Run() Line 69	C++
 	UnrealEditor-Core.dll!FRunnableThreadWin::Run() Line 159	C++
 	UnrealEditor-Core.dll!FRunnableThreadWin::GuardedRun() Line 71	C++
 	[External Code]	

[You had previously mentioned a character’s physics [Content removed] Could you elaborate?

I’ve also tried to find reported issues that relate to Constraints.

This issue was the closest I could find that might relate to your problem.

If you have the time, take a look and let me know.

I hope to hear from you soon.

All the best,

[mention removed]​

[Attachment Removed]

Hello [mention removed]​

Thank you for the added information.

I’ve applied the check to the destructor on ~FPBDCollisionConstraint() and tried to hit it, but could not reach a setup where that happens.

I’ve also searched for other seemingly related reported issues in EPS and the issue tracker, but could not find any.

It might be the case that the issue is only present on your modified engine version or is related to a particular asset.

On my end, this could be closed as “Cannot Reproduce”, but I’ll escalate the issue to Epic so they can appreciate the proposed change, as I’m not knowledgeable enough about the system to evaluate.

All the best,

[mention removed]​

[Attachment Removed]

Ok, thank you [mention removed]​.

I think even though we can’t reproduce the bug at will (not knowing enough about what puts constraints into the required states), simply by code inspection of unmodified UE code we can see a failure case exists. If a constraint is destroyed without having been removed from ActiveConstraints, that’s a crash, even if intermittent (because sometimes you can get away with use-after-free). And there are states a constraint can be in that will prevent its removal from the array, even though destruction is imminent.

I suppose it is possible my proposed fix could have other unintended consequences I’m unaware of, though. I’ll let you know if we discover any such. :slight_smile:

[Attachment Removed]

Hi Geoff,

> Can I find out the use case you were running into this with please? I’m curious about how our clients are using these features?

I’m sorry to say I never did discover what gameplay (high level Game Thread) action resulted in disable-then-later-delete on the physics async task. We have a lot going on during gameplay, and a lot of gameplay programmers. :slight_smile: I’m on the Engine team, and this mystery crash in Chaos fell to me.

> Secondly have you found any side effects?

I’m very happy to say we have not discovered any side effects. The fix remains in place, and this crash is for us now merely an unpleasant memory. :smiley:

Thanks,

Steve

[Attachment Removed]