Engine crashes on Radeon 90xx when HW RT is On

Hi

We are on UE 5.5 (latest from p4) plus the group barrier divergent branch fix (CL# 42243010) that we needed for Intel.

When we set `r.Lumen.HardwareRayTracing 1` in our game, it crashes after around 30 seconds (until then the rendered graphics look OK, so RT works, etc.). Without HWRT all is fine; the game is stable for hours of active gameplay.

The crash is somehow content dependent: on test scenes where only a couple of geometry objects are present, all is fine, but on scenes with dense geometry it crashes really fast.

It crashes ONLY on the Radeon 90xx series (for example Radeon RX 9070 XT); it works fine on the RX 6900XT and 7900XT.

Drivers 25.5.1 (latest)

From the Unreal side, the crash is in:

`LogD3D12RHI: Error: GPU crash detected:

  • Device 0 Removed: DXGI_ERROR_DEVICE_HUNG

LogRHI: Error: Active GPU breadcrumbs:

Device 0, Pipeline Graphics: (In: 0x81697626, Out: 0x81697625)
(ID: 0x81697476) [ Active] Frame 102304
(ID: 0x81697689) [ Active] FRDGBuilder::Execute
(ID: 0x81697485) [ Active] Scene
(ID: 0x81697620) [ Active] RenderDeferredLighting
(ID: 0x81697621) [ Active] DiffuseIndirectAndAO
(ID: 0x81697625) [ Active] TranslucencyVolumeLighting
(ID: 0x81697626) [ Active] LumenReflections
(ID: 0x81697627) [Not Started] InitTranslucencyLightingVolumeTextures
(ID: 0x81697628) [Not Started] Lights
(ID: 0x81697629) [Not Started] DirectLighting
(ID: 0x8169762a) [Not Started] InjectTranslucencyLightingVolume
(ID: 0x8169762b) [Not Started] InjectTranslucencyLightingVolume(View=0)
(ID: 0x8169762c) [Not Started] VirtualShadowMapProjectionMaskBits
(ID: 0x8169762d) [Not Started] BatchedLights
(ID: 0x8169762e) [Not Started] UnbatchedLights
(ID: 0x8169762f) [Not Started] DYE5P91Z5U0FC07540W11FDI3.Spot_Anomaly_illumination`

From the Radeon crash dump tool, the crash is in:

`===================
MARKERS IN PROGRESS

Command Buffer ID: 0x274374

DispatchIndirect
Barrier [2 repeating occurrences]

=====================
EXECUTION MARKER TREE

Legend

finished
[>] in progress
[#] shader in flight
not started

Command Buffer ID: 0x274374 (Queue type: Direct)


DispatchIndirect [Driver-PAL]
DispatchIndirect [Driver-PAL]
DispatchIndirect [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Dispatch(ThreadGroupCount=[1,1,1]) [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Dispatch(ThreadGroupCount=[1,1,1]) [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Dispatch(ThreadGroupCount=[1,1,1]) [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Dispatch [Driver-PAL]
[>] DispatchIndirect [Driver-PAL]
[>] ----------Barrier---------- [Driver-PAL]
[>] ----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DispatchIndirect [Driver-PAL]
----------Barrier---------- [Driver-PAL]
----------Barrier---------- [Driver-PAL]
DrawIndexed(IndexCount=3, InstanceCount=1) [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Dispatch(ThreadGroupCount=[16,16,16]) [Driver-PAL]
----------Barrier---------- [Driver-PAL]
Draw(VertexCount=4, InstanceCount=64) [Driver-PAL]
…`

I know this is a new card, but maybe there is some fix that we can apply to the engine to fix this issue?

Steps to Reproduce

Unfortunately, as far as I know, AMD’s DXR implementation is not spec compliant, which makes it tricky to pinpoint what the issue is. This one seems to be driver specific too. Can you run with `r.D3D12.RayTracing.GPUValidation 1` to see if it catches something?

That’s been fixed for quite some time already (see GitHub). You can just remove the breadcrumbs locally from the validation code.

Do you have a fairly consistent repro of this crash? We can potentially start narrowing it down by disabling different parts of HWRT (like different options under `r.RayTracing.Geometry.*`).

So what I would suggest is trying to eliminate potential issues one by one. It might help to zero in on the problematic mesh/material.

Start with disabling anything related to Niagara:

`r.RayTracing.Geometry.NiagaraRibbons=0 r.RayTracing.Geometry.NiagaraSprites=0 r.RayTracing.Geometry.NiagaraMeshes=0`

Then you could also try disabling:

`r.RayTracing.Geometry.SplineMeshes=0 r.RayTracing.Geometry.Text=0 r.RayTracing.Geometry.Cable=0`

Then anything that runs through dynamic update:

`r.RayTracing.Geometry.StaticMeshes.WPO=0 r.RayTracing.Geometry.SkeletalMeshes=0`

If none of those work it would probably suggest that it’s a problem with one of the static meshes (and it will be more difficult to pinpoint which one). You can confirm that with `r.RayTracing.Geometry.StaticMeshes` and/or `r.RayTracing.Geometry.NaniteProxies` depending on if you use Nanite or not. Ideally set them both to 0.
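In case it helps to keep track of the stages, here is a rough Python sketch of the elimination order above. It only builds the cumulative list of cvar assignments per stage so they can be pasted into the console or a config; it does not talk to the engine. The stage labels, the `cumulative_commands` and `first_stable_stage` helpers, and the `crashes` callback (standing in for a manual test run) are made up for illustration; only the cvar names come from this thread.

```python
# Cvar groups in the order suggested in this thread. Stage labels are illustrative.
STAGES = [
    ("Niagara", [
        "r.RayTracing.Geometry.NiagaraRibbons",
        "r.RayTracing.Geometry.NiagaraSprites",
        "r.RayTracing.Geometry.NiagaraMeshes",
    ]),
    ("Spline/Text/Cable", [
        "r.RayTracing.Geometry.SplineMeshes",
        "r.RayTracing.Geometry.Text",
        "r.RayTracing.Geometry.Cable",
    ]),
    ("Dynamic update", [
        "r.RayTracing.Geometry.StaticMeshes.WPO",
        "r.RayTracing.Geometry.SkeletalMeshes",
    ]),
    ("Static geometry", [
        "r.RayTracing.Geometry.StaticMeshes",
        "r.RayTracing.Geometry.NaniteProxies",
    ]),
]


def cumulative_commands(stages):
    """Yield (stage_name, commands); commands disable every cvar up to and including that stage."""
    disabled = []
    for name, cvars in stages:
        disabled.extend(cvars)
        yield name, [f"{cvar}=0" for cvar in disabled]


def first_stable_stage(stages, crashes):
    """Return the first stage whose cumulative set stops the crash, or None.

    `crashes` is a hypothetical callback (in practice a manual test run) that
    takes the list of commands and returns True if the game still crashes.
    """
    for name, commands in cumulative_commands(stages):
        if not crashes(commands):
            return name
    return None


if __name__ == "__main__":
    for name, commands in cumulative_commands(STAGES):
        print(f"# Stage: {name}")
        print("\n".join(commands))
```

The stages are cumulative on purpose: if the crash disappears at a given stage, the culprit is most likely in the group that was added last, and that group can then be re-enabled one cvar at a time.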

Hi

It breaks here 100% of the time:

[Image Removed]

just in the first second after level start (our main menu is just an empty scene with UI), on the very first call to `RHIBuildAccelerationStructures`.

[Image Removed]

OK, I’ve integrated that change; now validation is not crashing (and is not triggered).

The crash is in the same place as above, but if I add `-D3DDebug` to the command line, the crash is way more frequent (1-2 seconds after level start instead of 30), and it’s in a different place:

[Image Removed]

Anything else I could try?

Yes, I have a 100% repro that can be done within 30 seconds, so a fairly quick one, on a specific level.