GPU crash DXGI_ERROR_INVALID_CALL (Command lists must be successfully closed before execution)

Hello Epic Support Team,

When running automated game tests with -d3ddebug in UE 5.5.4, we sometimes get a GPU crash with the log below.

ID3D12CommandQueue::ExecuteCommandLists: Command lists must be successfully closed before execution.

Error: [D3DDebug] ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_INVALID_CALL: There is strong evidence that the application has performed an illegal or undefined operation, and such a condition could not be returned to the application cleanly through a return code).
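For context, the first message is the D3D12 debug layer enforcing a simple lifecycle rule: a command list must have been closed before it is handed to ExecuteCommandLists. The sketch below is a hypothetical C++ model of that rule only. It does not use D3D12 headers or any UE RHI code, and the names FakeCommandList and FindUnclosedList are illustrative assumptions, not engine or API types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the state the debug layer tracks per command list.
// Not D3D12 or UE code; it only mirrors the "closed before execution" rule.
enum class ListState { Recording, Closed };

struct FakeCommandList {
    std::string name;
    // A list is recording right after ID3D12Device::CreateCommandList or
    // ID3D12GraphicsCommandList::Reset, until Close is called.
    ListState state = ListState::Recording;

    // Closing is only valid while recording; in this model, closing twice throws.
    void Close() {
        if (state != ListState::Recording) {
            throw std::logic_error(name + ": Close called on a list that is not recording");
        }
        state = ListState::Closed;
    }
};

// Mirrors the ExecuteCommandLists precondition: every list in the batch must be closed.
// Returns the name of the first list that is still recording, or "" if the batch is valid.
std::string FindUnclosedList(const std::vector<const FakeCommandList*>& batch) {
    for (const FakeCommandList* list : batch) {
        if (list->state != ListState::Closed) {
            return list->name;
        }
    }
    return {};
}
```

In this model, submitting a batch in which one list was never closed is exactly the case the debug layer reports as "Command lists must be successfully closed before execution".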

We tried cherry-picking the change below, but the issue still reproduces.

https://github.com/EpicGames/UnrealEngine/commit/16487ca412aea570c424814e4b0e9e45ede456d7

The full log is shared in a private thread here: [Content removed]

GZW_2 (76)

Are there any CLs we could try to cherry-pick, or anything we can do to narrow down the source of the issue?

Thank you

[Attachment Removed]

Steps to Reproduce
The GPU crash happens randomly; we have no exact repro.

[Attachment Removed]

Thanks for reporting this. I didn’t find any additional CLs or fixes. The fix mentioned above came from [Content removed], so that might provide more information about how the missing close case was found in that scenario. I’m passing this request along to my colleague who looked into it, in case he has further suggestions.

[Attachment Removed]

Hi Oleksii,

I’ll check with the RHI team whether they have seen similar cases in 5.5.4. In the meantime, have you ever reproduced this crash in regular gameplay, or only in the automated game test?

Since you are also using -gpuvalidation, I’m wondering whether an exceptionally long frame might be disrupting the submission pipeline. Would it be possible to run the automated test without that flag to see whether the error goes away?

Thanks,

Daniele

[Attachment Removed]

Hi [mention removed]

We were running the automated gameplay test with -gpuvalidation. The number of these crashes is quite low, but it usually reproduces in every run. We will check whether the issue reproduces without -gpuvalidation.

We don’t run manual game testing with -d3ddebug enabled, but our automation is close to regular gameplay in terms of GPU load.

Thank you

[Attachment Removed]

Without -gpuvalidation, the "Command lists must be successfully closed before execution" error doesn’t reproduce, but a few times we got a GPU crash with the errors below.

LogD3D12RHI: Error: [D3DDebug] ID3D12CommandAllocator::Reset: The command allocator cannot be reset because a command list is currently being recorded with the allocator.

LogD3D12RHI: Error: [D3DDebug] ID3D12GraphicsCommandList::*: This API cannot be called on a closed command list.

LogD3D12RHI: Error: [D3DDebug] ID3D12GraphicsCommandList::*: This API cannot be called on a closed command list.

LogD3D12RHI: Error: [D3DDebug] ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware).
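To make the first three messages easier to read: D3D12 does not allow an allocator to be reset while a command list is still recording into it, only one list may record into a given allocator at a time, and recording calls are invalid once a list is closed. The sketch below is a hypothetical C++ model of those rules. FakeAllocator, FakeList, TryReset and Record are illustrative names, not D3D12 or UE RHI APIs, and the model deliberately ignores GPU completion, which real allocator resets also depend on.

```cpp
#include <cassert>

// Hypothetical model of allocator/list pairing; not D3D12 or UE code.
struct FakeAllocator {
    bool listRecording = false;  // true while some list records into this allocator

    // Mirrors "The command allocator cannot be reset because a command list is
    // currently being recorded with the allocator." Real D3D12 also requires the GPU
    // to have finished executing the allocator's commands; that is not modeled here.
    bool TryReset() const { return !listRecording; }
};

struct FakeList {
    FakeAllocator* allocator = nullptr;
    bool recording = false;

    // Like ID3D12GraphicsCommandList::Reset: the list must be closed, and only one
    // list may be recording into an allocator at a time.
    bool Reset(FakeAllocator& alloc) {
        if (recording || alloc.listRecording) return false;
        allocator = &alloc;
        allocator->listRecording = true;
        recording = true;
        return true;
    }

    // Like Close: only valid while recording; releases the allocator for reuse.
    bool Close() {
        if (!recording) return false;
        allocator->listRecording = false;
        recording = false;
        return true;
    }

    // Stands in for any recording call (CopyResource, Draw, ...). Returning false
    // corresponds to "This API cannot be called on a closed command list."
    bool Record() const { return recording; }
};
```

In the model, both the allocator reset and the later recording calls are only legal in one specific order (Reset list, record, Close, then reset allocator); the logged errors correspond to steps happening outside that order.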

The full log is shared in the private ticket here; search the log files for ID3D12CommandAllocator::Reset:

[Content removed]

[Attachment Removed]

Hi Oleksii, sorry for the delayed reply. I didn’t find anything similar in other cases or in our internal tests that could explain why Reset is called on an already closed command list.

I noticed another validation error before the reset issue:

Error: [D3DDebug] ID3D12CommandList::CopyResource: Using CopyResource on Command List (0x00000AA51390C140:‘Unnamed ID3D12GraphicsCommandList Object’): Resource state (0x8: D3D12_RESOURCE_STATE_UNORDERED_ACCESS) of resource (0x0000025EA1C4E3D0:‘InterpolatedRT’)

Do you have engine modifications on top of 5.5, or do you know where that resource is being copied?

Would it be possible to get access to a map where the issue can be reproduced?

Thanks,

Daniele

[Attachment Removed]

Hi [mention removed]!

The InterpolatedRT errors come from FSR4 Frame Generation, which has reported validation issues of this kind on some Nvidia GPUs. We haven’t reported this issue to AMD yet. These FSR4 Frame Generation logs usually don’t correlate with GPU crashes, but I understand that we should also verify this with Frame Generation disabled.

Best regards

[Attachment Removed]

Hi Oleksii,

We still cannot repro the issue in our tests. My suspicion is that it comes from one of the enabled plugins that we don’t cover by default. If you could provide a small repro level with the right plugins enabled, we can help debug where the invalid call is coming from.

Thanks,

Daniele

[Attachment Removed]

If I find a good repro, I will provide more info on this. At the moment, this D3DDebug error is rare, and I don’t understand how to reproduce it.

In the meantime, I will look closer at our plugins.

Thank you

[Attachment Removed]