This question was created in reference to: PCG using all background workers and causing a [Content removed]
Also related to [Content removed].
Hi,
We’re seeing deadlocks similar to the ones reported in the previous thread when using runtime PCG generation.
As suggested, we started tweaking pcg.MaxPercentageOfThreadsToUse and pcg.MaxPercentageOfExecutingThreads.
Something I noticed is that even with conservative values for those CVars I was still hitting deadlocks. Looking at the parallel callstack, I saw PCG using all of the background workers available on my particular hardware.
FPCGGraphExecutor::ExecuteScheduling(…) does some math with those CVars and eventually calculates the required workers as follows:
// Number of threads to use for PCG Element execution (outside of main thread tick)
const int32 MaxExecutingThreads = PCGGraphExecutor::CVarGraphMultithreading.GetValueOnAnyThread() ? FMath::Max(0, (int32)(MaxNumThreads * MaxPercentageOfExecutingThreads)) : 0;
// Number of threads to use for PCG Async operations
const int32 MaxPCGAsyncThreads = FMath::Min(MaxNumThreads - MaxExecutingThreads, CVarMaxNumTasks.GetValueOnAnyThread());
But even with those restrictions (and tweaks to the mentioned CVars), I still see PCG using all of the available background workers.
Is it possible that some PCG tasks are sent to the background worker pool without honoring the heuristic from FPCGGraphExecutor?
Do you have repro steps showing how/where this is happening?
To give you some context, one of the original cases where we ended up with a deadlock on the bulk data (in the PCG landscape cache) had to do with the data load blocking while expecting something else to complete before continuing, which led to the aforementioned deadlock.
I’ve dispatched this to a colleague for a deeper look in the meantime.
The current PCG Graph Executor does in fact not respect the total number of tasks that can be started by the PCGAsync API; instead it passes a number based on the count of available threads (but that shouldn’t really matter and shouldn’t lead to stalls/deadlocks).
We’ve tested cases with very low amounts of worker threads (even 0) and currently don’t have a known use case where we can repro what you are seeing.
What I would be interested to know is what you describe as an actual deadlock. What task is not advancing, and why? Does it have an external dependency while keeping ownership of the worker thread? As Julien mentioned, we had issues where our tasks were holding onto worker threads and starving the system, but those have since been fixed.
Another thing we’ve fixed after the 5.7 release is that we’ve replaced all native mutexes with non-native ones, which fixes some issues with runtime gen where we were creating too many native primitives and busting the platform’s budget.
I would probably need more details on the stall to give more insight.
Hey Alvaro, to my knowledge this OpenReadBulkData bug was fixed a while ago with CL 40273823 in UE5/Main.
Also, we’ve changed the behavior of the LandscapeCache so that it can only be prepared on the main thread, as preparing it elsewhere was causing other issues when accessing its textures. I see the case is opened for 5.6, but both of those issues were fixed in 5.6.
We don’t have an isolated way to reproduce this issue.
We’re currently being very aggressive with runtime PCG, which may be why this deadlock isn’t commonly seen.
Something we noticed is that we get a more consistent repro rate if we set PCG.FrameTime to 1.
On our end, as a temporary fix, we added a CriticalSection owned by FPCGGraphExecutor and propagated through the FPCGAsyncState of the task that will execute on the workers. That way, when FPCGAsync::AsyncProcessing(…) executes background tasks, it won’t overlap with requests from other threads executing FPCGGraphExecutor::ExecuteScheduling(…).
We noticed that sometimes FPCGGraphExecutor::ExecuteScheduling(…) was queuing tasks (i.e., calling FPCGAsync::AsyncProcessing(…)) from different threads without honoring the imposed quota. Adding the mutex seems to alleviate the problem.
"What I would be interested to know is what you describe as an actual deadlock. What task is not advancing, and why? Does it have an external dependency while keeping ownership of the worker thread?"

We’re experiencing basically the same thing as described in this thread: [Content removed]
i.e., FPCGLandscapeCacheEntry::TouchAndLoad(…) is stuck on a critical section while another thread (the one that acquired the critical section) is waiting for OpenReadBulkData() to fetch the data; that in turn needs background workers, which are unavailable in our case (since PCG is using all of them, and most are blocked on a mutex).
I could share a parallel stack in case it helps.
"Another thing we’ve fixed after the 5.7 release is that we’ve replaced all native mutexes with non-native ones, which fixes some issues with runtime gen where we were creating too many native primitives and busting the platform’s budget."

We adopted some of those changes since we were having crashes on PS5 because we exhausted the system resources for mutexes.
I thought that wouldn’t be a problem for Windows mutexes, but let me know if that’s not the case.
This thread was closed, but I’m not sure if you still need the parallel callstack in case you want to take a look on your side (see my previous reply).