Mid phase waits unnecessarily long due to dependencies executed on a background thread where long-running tasks are running

Hello, we are observing the following situation on our Server build.

Inside SpatialAccelerationBroadPhase.h, where broad-phase tasks are spawned (see e.g. PendingArray), the tasks are scheduled on either foreground or background worker threads. At the same time, AABB tree tasks (e.g. AABBTreeProgressTimeSlice) also run on worker threads, and these can take a long time (when the time-slice operation finishes, the tree is created, e.g. this task is started: FChaosAccelerationStructureTask). The issue is that small tasks created by the broad-phase code can land on the same worker as a long-running task (which is really independent and could run across multiple frames). When a small task pushed by the broad phase (from the GT) ends up on the same worker as a long-running task, the GT has to wait for it, because the small task is only executed after the long one. As I understand it, there is currently no way to push tasks only to foreground threads. Any idea how I can make sure this situation never happens? I’ve attached a screenshot from utrace and the .utrace files (look at the frame5291 file or, better, the frame1599 file, which has fewer of my changes, so it should be clearer). It may be hard to see at first, but after the long task there is a tiny task, added by the GT (broad phase), which the GT is waiting for.
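
To make the stall described above concrete, here is a minimal toy model. It is not engine code: `FQueuedTask` and `CompletionTimeNs` are hypothetical names, and the model simply assumes that a worker drains its queue strictly in order. It shows how long a waiter blocks on a small task depending on what sits ahead of it on the same worker.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not engine code: a worker that runs its queued
// tasks strictly one after another, in queue order.
struct FQueuedTask
{
    const char* Name;
    std::int64_t DurationNs;
};

// Returns the time (in ns, measured from the moment the worker starts
// draining the queue) at which the task at WaitedIndex completes.
// Everything queued ahead of the waited task adds to the wait.
std::int64_t CompletionTimeNs(const std::vector<FQueuedTask>& WorkerQueue, std::size_t WaitedIndex)
{
    std::int64_t Elapsed = 0;
    for (std::size_t Index = 0; Index < WorkerQueue.size() && Index <= WaitedIndex; ++Index)
    {
        Elapsed += WorkerQueue[Index].DurationNs;
    }
    return Elapsed;
}
```

With illustrative durations of 3.1 ms for the long task and 10.3 µs for the small one, waiting on the small task behind the long one takes 3,110,300 ns, while the same small task on an idle worker would complete after 10,300 ns.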

[Attachment Removed]

Steps to Reproduce

[Attachment Removed]

Sorry, instead of 1599, have a look at frame 4022 in the corresponding .utrace file.

[Attachment Removed]

Hi Sergii, what values do you have set in the CVars in the AABBTree.cpp file in Chaos?

I.e. these:

[Image Removed]

Best

Geoff Stacey

Developer Relations

EPIC Games

[Attachment Removed]

Thanks Sergii,

That AABB task looks to be created here: [Image Removed] within PBDRigidsEvolution.cpp.

This is called from Base::ComputeIntermediateSpatialAcceleration() within the AdvanceOneTimeStepImpl function in the PBDRigidsEvolutionGBF.cpp file.

AssignMidPhases is called from the ProduceOverlaps function, which comes after that call. However, the ParticlePair call comes before the AABB work in the traces but after it in the code. I don’t think there is contention here, since the constraints should already have been created from the AABBTree, which seems to be just blocked on something ahead of it.

What I can see is that they use different task calls for dispatching: the one above uses TGraphTask and the others use UE::Tasks::Launch.

I’ll reach out to the dev side to find out if there are known quirks in this process and get back to you!

Geoff

[Attachment Removed]

That all makes sense. The way these tasks are created could lead to out-of-order execution (since they are in different lists, and the order depends on when they are polled, what priority they have, etc.). But as you say, the point I need to clarify is why there are 2 threads doing nothing at this point. It could be some form of core affinity, but I’m honestly not sure yet. Let me find out and get back to you!

Geoff

[Attachment Removed]

Can I check whether you have a thread limit set on this? I think the defaults for server builds can be lower than normal. I’d be keen to rule this out before we go too far down a rabbit hole!

[Attachment Removed]

Hi Sergii,

Can you try something quickly for me, please?

If you revert the alterations you have made, and in the two places where UE::Tasks::Launch is called within DispatchTasks in SpatialAccelerationBroadPhase.h, add a parameter to make the task priority high, like the example here:

[Image Removed]

Does this make the problem go away?

If so, I’ll add some detail on what is happening, but I’d like to test the theory first because I’m not 100% sure what the problem is here.

Best

Geoff

[Attachment Removed]

No problem, Sergii. If my suggestion does fix it, though, I think we can get it in as an actual fix.

[Attachment Removed]

Hi Sergii, can I follow up and check whether you managed to try this fix, please? If it works, it’d be great to get this into the official product.

Best

Geoff

[Attachment Removed]

Hello, Geoff

Same values, except that I set MaxProcessingTimePerSliceSeconds to 0.003. But our main issue is not that physics takes a long time (although I’d be glad if it took less :slight_smile: ). It is that some small foreground tasks are scheduled on background workers even though fg workers are free. When a long-running task is on that worker, it stalls the other tasks that are executed after it (and that the GT is waiting on), even though they could easily have been scheduled on fg workers.

[Attachment Removed]

I’ve made some changes according to this advice (https://forums.unrealengine.com/t/we-would-like-to-fully-manage-what-threads-certain-tasks-get-assigned-to-what-is-the-preferred-method-of-doing-this/2606447), but it was not enough, so I also had to modify the StealItem function. Now none of the fg tasks run on bg workers. Obviously, if there are many tasks, I’d like them to run on bg workers as well, just not when there are empty fg workers doing nothing.

I hope you get my idea. Have you opened the 20251204_163034-frame1599.utrace file? Check frame 4022. Can you explain why the Collisions::AssignMidPhases (10.3 µs) task is executed on a bg worker even though it was scheduled before AccelerationStructureTimeSliceCopy (3.1 ms)? And even if it had been scheduled after, FG0 and FG1 are doing nothing, and yet AssignMidPhases is not executed there. Am I missing anything?

[Attachment Removed]

I’ve rewritten the ProduceOverlaps part (and the other tasks in SpatialAccelerationBroadPhase.h) to use GraphTask (FFunctionGraphTask::CreateAndDispatchWhenReady), just in case, but it did not change the picture.

The thing is that AdvanceOneTimeStepImpl is completely independent of all other tasks, and with my modified task scheduling code (where I prohibit bg threads from grabbing tasks with foreground priority) it works just fine.

Basically, in LocalQueue.h I’ve also added a MinPriority, like this:

	const int32 MaxPriority = GetBackGroundTasks ? int32(ETaskPriority::Count) : int32(ETaskPriority::ForegroundCount);
	const int32 MinPriority = (GetBackGroundTasks && G01TaskGraphNoFGTasksOnBGWorkers) ? int32(ETaskPriority::ForegroundCount) : 0;
 
	for (int32 PriorityIndex = MinPriority; PriorityIndex < MaxPriority; ++PriorityIndex) {
		...
	}
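
For readers without the engine source at hand, the effect of that change can be sketched in isolation. This is a standalone model, not the engine's LocalQueue: the enum below is a local stand-in whose layout (two foreground levels followed by three background levels) is an assumption, and `PriorityRange`, `FQueues` and `Dequeue` are hypothetical names. The flag parameter plays the role of G01TaskGraphNoFGTasksOnBGWorkers from the snippet above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

// Local stand-in for the engine's priority enum (layout assumed for this sketch).
enum class ETaskPriority : std::int8_t
{
    High,
    Normal,
    ForegroundCount,
    BackgroundHigh = ForegroundCount,
    BackgroundNormal,
    BackgroundLow,
    Count
};

// Half-open [Min, Max) range of priority indices a worker may dequeue from.
// A background worker normally scans everything; with the flag set it skips
// the foreground levels entirely.
std::pair<int, int> PriorityRange(bool bGetBackgroundTasks, bool bNoFGTasksOnBGWorkers)
{
    const int Max = bGetBackgroundTasks ? int(ETaskPriority::Count) : int(ETaskPriority::ForegroundCount);
    const int Min = (bGetBackgroundTasks && bNoFGTasksOnBGWorkers) ? int(ETaskPriority::ForegroundCount) : 0;
    return {Min, Max};
}

// One FIFO of task ids per priority level.
using FQueues = std::array<std::deque<int>, std::size_t(ETaskPriority::Count)>;

// Pops the first task found while scanning the allowed priority range,
// or returns an empty optional when nothing in that range is queued.
std::optional<int> Dequeue(FQueues& Queues, bool bGetBackgroundTasks, bool bNoFGTasksOnBGWorkers)
{
    const std::pair<int, int> Range = PriorityRange(bGetBackgroundTasks, bNoFGTasksOnBGWorkers);
    for (int PriorityIndex = Range.first; PriorityIndex < Range.second; ++PriorityIndex)
    {
        std::deque<int>& Queue = Queues[std::size_t(PriorityIndex)];
        if (!Queue.empty())
        {
            const int TaskId = Queue.front();
            Queue.pop_front();
            return TaskId;
        }
    }
    return std::nullopt;
}
```

In this model, a background worker with the flag set never picks up a task queued at High or Normal priority, which matches the behaviour described above. The trade-off is that it also will not help with foreground work when the foreground workers are saturated.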

[Attachment Removed]

But again, I think bg workers grabbing fg tasks is just fine when the other threads are busy. When we have 2 other workers free, though, it looks strange to me, and this is what I am trying to understand.

[Attachment Removed]

I do not have any custom changes; we just use the default setup. But (as a check) I’ve also increased the number of worker threads from 4 (the default) to 6, so that there are 2 more bg workers, and I still see the same behaviour.

Edit: in fact, if you check that .utrace file you will see that there are 2 fg + 3 bg workers available.

[Attachment Removed]

I will try. I think I already did it, but I will try again a bit later. By the way, I’ve found one workaround which works and seems logical. You can read my reply here: We would like to fully manage what threads certain tasks get assigned to, what is the preferred method of doing this? - #5 by dcou

[Attachment Removed]