We have been investigating an ongoing GPU crash that has been affecting a significant portion of players in our game. We’re running a custom 5.5 build, but as far as the renderer is concerned, the only changes are to the shading model paths. What has made this bug so tricky for us is that it seemingly happens at any time, with no consistent repro.
The crash shows up as either DXGI_ERROR_DEVICE_REMOVED or DXGI_ERROR_DEVICE_HUNG and gives us very little to go on. Breadcrumbs only ever show us a frame number, Aftermath sometimes crashes out itself, and when we are able to get a dump file it has no symbols. DRED hasn’t given us much information either, reporting only: DRED: No PageFault data.
We were able to get the following message by running with D3DDebug. As far as my limited understanding goes, it seems that the data is being set up incorrectly for the GPU driver?
[D3DDebug] ID3D12CommandQueue::ExecuteCommandLists: Using Draw on Command List (0x00000176F9CD5520:'FD3D12CommandList (GPU 0)'): Resource state (0xC0: D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE|D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE) of resource (0x0000017616FC92A0:'FParticleStatePosition') (subresource: 0) is invalid for use as a render target. Expected State Bits (all): 0x4: D3D12_RESOURCE_STATE_RENDER_TARGET, Actual State: 0xC0: D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE|D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, Missing State: 0x4: D3D12_RESOURCE_STATE_RENDER_TARGET.
This has pointed us towards a Niagara issue, with Aftermath showing an MMU fault where a shader instruction is trying to access memory on compute_01 @ 0x00001960. Attached below are the relevant portion of the .log file and an additional nv-gpudmp from a separate, unrelated crash.
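To sanity-check our reading of the state masks in the D3DDebug message, here is a minimal sketch that decodes a D3D12_RESOURCE_STATES mask into flag names and computes which expected bits are missing. The bit values are copied from the d3d12.h enum so it compiles without the Windows SDK; DecodeResourceState and MissingStateBits are hypothetical helper names of ours, not engine or D3D12 functions.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Single-bit values of D3D12_RESOURCE_STATES, copied from d3d12.h.
// Combined values such as GENERIC_READ are intentionally left out.
static const std::vector<std::pair<std::uint32_t, std::string>> kStateBits = {
    {0x1,    "VERTEX_AND_CONSTANT_BUFFER"},
    {0x2,    "INDEX_BUFFER"},
    {0x4,    "RENDER_TARGET"},
    {0x8,    "UNORDERED_ACCESS"},
    {0x10,   "DEPTH_WRITE"},
    {0x20,   "DEPTH_READ"},
    {0x40,   "NON_PIXEL_SHADER_RESOURCE"},
    {0x80,   "PIXEL_SHADER_RESOURCE"},
    {0x100,  "STREAM_OUT"},
    {0x200,  "INDIRECT_ARGUMENT"},
    {0x400,  "COPY_DEST"},
    {0x800,  "COPY_SOURCE"},
    {0x1000, "RESOLVE_DEST"},
    {0x2000, "RESOLVE_SOURCE"},
};

// Hypothetical helper: turns a state mask into a '|'-separated list of names.
std::string DecodeResourceState(std::uint32_t mask) {
    if (mask == 0) {
        return "COMMON";  // D3D12_RESOURCE_STATE_COMMON is the zero value
    }
    std::string out;
    for (const auto& entry : kStateBits) {
        if (mask & entry.first) {
            if (!out.empty()) out += "|";
            out += entry.second;
            mask &= ~entry.first;  // clear the bit we just named
        }
    }
    if (mask != 0) {  // bits this table does not know about
        if (!out.empty()) out += "|";
        out += "UNKNOWN";
    }
    return out;
}

// Hypothetical helper: bits required by the operation but absent from the
// state the resource is actually in.
std::uint32_t MissingStateBits(std::uint32_t expected, std::uint32_t actual) {
    return expected & ~actual;
}
```

With the values from the message, DecodeResourceState(0xC0) names the two shader-resource flags and MissingStateBits(0x4, 0xC0) yields 0x4, the render-target bit. If we are reading it correctly, FParticleStatePosition was still in a shader-read state when the draw tried to use it as a render target.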
Investigating a little further has suggested to us that this may be specifically related to GPU particles. We’re just unsure whether it is related to a specific setting in Niagara or to something at fault in one of our assets.
One other useful detail is that we can sometimes tell roughly which pass it crashes in, and much of the time it is:
(ID: 0x83ca9b32) [ Active] ContrastAdaptiveShading.
We have had this for quite a while now and have confirmed it occurs across _all driver versions_. It has affected everything from a GTX 1650 and AMD equivalents up to RTX 5090s, and Intel GPUs are not exempt either.
Unfortunately, we have been unable to get shader symbols working. We have tried the following cvars, but we don’t end up with any symbols:
r.DumpShaderDebugInfo=1
r.Shaders.Symbols=1
r.Shaders.GenerateSymbols=1
r.Shaders.WriteSymbols=1
So we had some questions. Firstly, has anyone seen an issue similar to this, and is there already a patch that would give us some insight into our crash? If so, is there something we could cherry-pick as a fix?
We’re also wondering how we can get shader symbols to export correctly for future GPU crashes, and whether there is a performance impact to enabling them, should we want to leave them on to collect more detail from players who crash.
Any information on how to correctly debug GPU issues going forward would also be greatly appreciated. Thanks for reading; I’m hoping you can help us figure this out and learn more about debugging our GPU issues.