Deadlock when using runtime PCG

This question was created in reference to: [PCG deadlock due to lack of background [Content removed]

As mentioned in the previous thread, we’re seeing a deadlock in Runtime PCG: it runs out of background workers while one of the threads is waiting for PCG data (likely blocked on I/O).

We upgraded to 5.7.3 (from 5.6) hoping this issue had been fixed, but we’re still hitting the deadlock pretty consistently.

Do you know if other developers have faced a similar problem in 5.7?

I can share a parallel-callstacks screenshot if that would help diagnose the problem.

Let me know.

Thanks!

[Attachment Removed]

Hi Alvaro,

Please do! It’ll help in double-checking if we know or not of this problem, and if we have a solution!

Cheers,

Julien

[Attachment Removed]

Hey Alvaro, this might be an actual bug where the graph uses the Landscape without properly priming its cache on the main thread (see ULandscapeData::PrepareForSpatialQuery). Can you show me the PCG graph that does the GetLandscapeData and uses it?

There was an actual bug with the IO (fixed in 5.6), but it seems you are still ending up starving worker threads, which isn’t something we’ve seen since.

Cheers,

Patrick

[Attachment Removed]

Also, make sure pcg.SpatialData.EnablePrepareForSpatialQuery is set to ‘true’ (it should be the default).

Oh! I think I might have an idea. It seems we are allowing PrepareForSpatialQuery to run off the game thread when outside of the editor. Do you have the ability to compile the code? If so, could you always set Params.bCanExecuteOnlyOnMainThread to true in UPCGLandscapeData::PrepareForSpatialQuery and let us know how it goes?
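For anyone following along, the suggested change is a one-liner; here is a sketch (the exact surrounding code varies by engine version, so treat this as illustrative, not as the actual source):

```cpp
// In UPCGLandscapeData::PrepareForSpatialQuery (PCGLandscapeData.cpp; sketch,
// the surrounding context differs per engine version).
// Force the preparation to run on the game thread in all configurations,
// not only in the editor, so the landscape cache is primed before worker
// threads query it:
Params.bCanExecuteOnlyOnMainThread = true; // previously only forced in the editor
```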

[Attachment Removed]

Sorry for the multiple posts,

I see that you are using a Projection element; this might not be preparing the data properly, hitting the case where we load data outside of the game thread. That is usually fine, but it seems to be causing issues in your setup. I’ll need a bit of time to follow up with a workaround, if we can suggest one, until a proper fix is made.

Cheers,

Patrick

[Attachment Removed]

Something to try which might help resolve this: depending on whether you are compiling your own version of the engine or not, changing the async chunk size in UPCGProjectionData::CreateBasePointData might help reduce the number of running threads and prevent this starvation.

If you can compile the code, you can increase the value passed to FPCGAsync::AsyncProcessingRangeEx (currently none is given, meaning it uses 64 as the chunk size). Try increasing this to something bigger than 64 (128? 256?) and see if that unblocks you.

If you can’t compile the code, you can still change that chunk size, but in that case the change is a CVar which will affect other code paths. If the change in chunk size is small enough, it might be acceptable.

You can do that by setting pcg.AsyncOverrideChunkSize to a non-negative value greater than 64 (the current default).
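Concretely, that CVar can be set from the in-game console, or persisted in project config (assuming your project picks up console variables from DefaultEngine.ini in the usual way):

```ini
; In-game console:
;   pcg.AsyncOverrideChunkSize 256

; Or persist it in Config/DefaultEngine.ini:
[ConsoleVariables]
pcg.AsyncOverrideChunkSize=256
```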

Let me know if this helps,

Patrick

[Attachment Removed]

Hey Alvaro,

This is good news :slight_smile:

We are going to try and get something in for 5.8.

If I may make a suggestion: you might want to keep that change behind a CVar yourself, so that you can tweak the number without having to compile new binaries every time.
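A minimal sketch of what such a project-side CVar could look like (the CVar name and default are hypothetical; TAutoConsoleVariable is the standard Unreal mechanism for this):

```cpp
#include "HAL/IConsoleManager.h"

// Hypothetical project-side CVar wrapping the projection chunk size, so the
// value can be tweaked at runtime without recompiling.
static TAutoConsoleVariable<int32> CVarProjectionChunkSize(
	TEXT("MyGame.PCG.ProjectionChunkSize"),
	256,
	TEXT("Chunk size passed to FPCGAsync::AsyncProcessingRangeEx in "
	     "UPCGProjectionData::CreateBasePointData."));

// At the call site (sketch):
// const int32 ChunkSize = CVarProjectionChunkSize.GetValueOnAnyThread();
```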

Cheers,

Patrick

[Attachment Removed]

Hi [mention removed]​ ,

I’m attaching the parallel stack.

[Image Removed]

On the left side of the image you can see all the PCG related threads.

It seems the deadlock is caused by UE::BulkData::Private::OpenReadBulkData(…); maybe a threaded I/O operation never completes.

Thanks in advance,

[Attachment Removed]

Hi [mention removed]​ ,

It seems that if we set the chunk size to 256, we can no longer reproduce the deadlock (at least in that particular location).

I’ll have to ask our QA team to do an extensive check before moving forward.

Do you think this deadlock might be addressed in the upcoming 5.8 release?

I would like to add a comment to our engine source change so we can revert it once we move to a newer version of the engine.

Thanks a lot for the help,

Alvaro

[Attachment Removed]