UE5.3 DX11 GameThread timed out waiting for RenderThread after 120.00

This question was created in reference to: [GameThread timed out waiting for RenderThread after 120.00 secs (texture streaming [Content removed]

Hi there, we have a relatively small number of users encountering a few more mysterious variants of the "GameThread timed out waiting for RenderThread after 120.00 secs" issue.

Case 1: We are waiting on the RenderThread, but the RenderThread has no work and is idle.

Case 2: We are waiting on RenderThread, but there is no RenderThread to be seen in our minidump.

Case 3: The RHI thread is in the middle of Present, while the game thread is waiting on the render thread and the render thread is waiting on the RHI thread.

Case 4: Render thread is waiting on RHI Thread, but there is no RHI thread to be seen in our minidump.
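
For triage, the four cases can be told apart by walking the wait chain from the game thread and noting where it ends. The sketch below is a hypothetical helper, not engine code: `ThreadState`, `ChainEnd`, `FollowWaitChain`, and the thread-name strings are all assumed for illustration, and the input would have to be filled in by hand from each minidump.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// What one thread appears to be doing in a dump (hypothetical model).
struct ThreadState {
    std::optional<std::string> WaitingOn; // thread it is blocked on, if any
    std::string Activity;                 // e.g. "idle", "Present"
};

// Where the wait chain ends.
struct ChainEnd {
    std::string Thread;   // last thread name reached
    bool bPresentInDump;  // false if that thread is missing from the dump
    std::string Activity; // activity of the last thread, empty if missing
};

ChainEnd FollowWaitChain(const std::map<std::string, ThreadState>& Dump,
                         const std::string& Start) {
    assert(!Start.empty());
    std::string Current = Start;
    // Bound the walk by the thread count so a wait cycle cannot loop forever.
    for (std::size_t Step = 0; Step <= Dump.size(); ++Step) {
        auto It = Dump.find(Current);
        if (It == Dump.end()) {
            return {Current, false, ""};               // cases 2 and 4
        }
        if (!It->second.WaitingOn) {
            return {Current, true, It->second.Activity}; // cases 1 and 3
        }
        Current = *It->second.WaitingOn;
    }
    return {Current, true, "cycle"};
}
```

With this model, case 1 ends at a present, idle render thread; case 3 ends at a present RHI thread whose activity is Present; cases 2 and 4 end at a thread that is not in the dump at all.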

In our project, PCs with lower core counts have the RHI thread disabled. When the RHI thread is disabled, we also disallow parallel rendering passes, because we encountered much more obvious and reproducible deadlocks in that configuration.
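
As a rough illustration of that policy (not our actual code), the decision can be sketched as a small pure function. `RenderThreadingConfig`, `ChooseRenderThreading`, `DetectLogicalCores`, and the `MinCoresForRHIThread` threshold are hypothetical names; the real cut-off and the engine settings it maps to are not shown here.

```cpp
#include <cassert>
#include <thread>

// Hypothetical settings struct; the field names are illustrative, not engine API.
struct RenderThreadingConfig {
    bool bUseRHIThread;
    bool bAllowParallelPasses;
};

// hardware_concurrency() may return 0 when the count is unknown; treat that as 1.
unsigned DetectLogicalCores() {
    const unsigned Cores = std::thread::hardware_concurrency();
    return Cores == 0 ? 1u : Cores;
}

// MinCoresForRHIThread is an assumed threshold chosen by the caller.
RenderThreadingConfig ChooseRenderThreading(unsigned LogicalCores,
                                            unsigned MinCoresForRHIThread) {
    RenderThreadingConfig Config{};
    Config.bUseRHIThread = LogicalCores >= MinCoresForRHIThread;
    // Parallel passes are only allowed when a dedicated RHI thread exists.
    Config.bAllowParallelPasses = Config.bUseRHIThread;
    assert(Config.bUseRHIThread || !Config.bAllowParallelPasses);
    return Config;
}
```

A call such as `ChooseRenderThreading(DetectLogicalCores(), 6)` would then yield the configuration for the current machine, with 6 standing in for whatever threshold is actually used.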

These are unlike our more “normal” cases of GT timeout, because there are no logs to indicate that a problem has occurred (DXGI_ERROR_DEVICE_RESET, DXGI_ERROR_DEVICE_REMOVED, E_OUTOFMEMORY, etc.).

I came across the UDN post linked above, and was wondering what this scheduler bug might have been, and if it could be related.

“The Foundation team are looking into a scheduler bug which may cause some tasks to never be started in some rare circumstances, and that can also trigger this timeout”.

It seems to be more common than expected on older Intel integrated graphics (HD 4600), but the GPUs we see most often in these reports are the 3060 and 4060, due to their popularity.

Will follow up with some call-stacks of various threads, if nothing comes to mind around this.

Many thanks!


Hi,

It’s very difficult to tell what might be the problem without seeing thread callstacks. I vaguely recall that at some point we suspected we might have a bug in TaskGraph where pending tasks were not picked up, so named threads could wait forever even though there was work for them, but I don’t think it was ever confirmed. CL 42639302 (GitHub commit 0517580) fixes a bug which seems very close to this, but it went into 5.6, and I think we stopped seeing the render thread hang earlier than that, so it might have been fixed by some other refactor. Still, it’s worth cherry-picking this change.
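
To make the symptom concrete: one generic way a thread can sleep forever despite queued work is a lost wakeup, where a task is enqueued and the notification fires before the consumer starts waiting. The sketch below is a standalone illustration in standard C++, not the engine's TaskGraph and not the change in CL 42639302; `NamedThreadQueue` and its methods are hypothetical, and whether the suspected engine bug was of this kind is not something I can confirm. It shows the usual defence, which is to re-check the queue under the lock before every sleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical single-consumer task queue for one "named" thread.
class NamedThreadQueue {
public:
    void Enqueue(std::function<void()> Task) {
        assert(Task && "empty task");
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Tasks.push_back(std::move(Task));
        }
        Wake.notify_one();
    }

    void RequestStop() {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            bStop = true;
        }
        Wake.notify_all();
    }

    // Runs tasks until a stop is requested and the queue is drained.
    // Returns the number of tasks executed.
    int ProcessUntilStopped() {
        int Executed = 0;
        std::unique_lock<std::mutex> Lock(Mutex);
        for (;;) {
            // The predicate is evaluated under the lock before sleeping, so
            // work enqueued before this call is never missed.
            Wake.wait(Lock, [this] { return bStop || !Tasks.empty(); });
            if (Tasks.empty()) {
                return Executed; // only reachable once bStop is set
            }
            std::function<void()> Task = std::move(Tasks.front());
            Tasks.pop_front();
            Lock.unlock(); // run the task without holding the queue lock
            Task();
            ++Executed;
            Lock.lock();
        }
    }

private:
    std::mutex Mutex;
    std::condition_variable Wake;
    std::deque<std::function<void()>> Tasks;
    bool bStop = false;
};
```

If the wait were written without the predicate, tasks enqueued before the consumer reached the wait would leave it asleep with a non-empty queue, which from the outside looks like a named thread that is idle while another thread waits on it.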

For case 2, older engine versions used to restart the render thread under some scenarios, but I don’t think any of them happened in the client. Do the logs indicate what the client is doing when this happens? If it’s not during shutdown, maybe it’s just a case of broken minidumps.

Case 3 is usually a deadlock in the driver. We’ve seen this quite a bit, with all GPU vendors and across many driver versions, but more often on old integrated GPUs. What usually happens is that the driver is compiling or optimizing shaders on background threads, and it somehow manages to deadlock these tasks. Present is one of the sync points where they usually try to drain some of this work, and then it waits forever for the deadlocked tasks. In a few other cases it’s just a hardware failure that for some reason doesn’t result in a TDR. Either way, there’s nothing you can do about this.

Case 4 sounds similar to case 2: either shutdown or broken minidumps.

If you can share some logs and thread callstacks we can dig into it more.
