Random Ray Tracing Crashes in D3D12RHI

Hi! We’re seeing random crashes on some hardware when using ray tracing, with the following stack traces:

```
Assertion failed: RecordData.State == FRecordData::EState::Persistent [File:C:\***\Engine\Source\Runtime\D3D12RHI\Private\D3D12RayTracing.cpp] [Line: 2190]
...
UnrealEditor_D3D12RHI!FD3D12RayTracingShaderBindingTableInternal::SetHitGroupGeometrySystemParameters()
...
```

and sometimes:

```
Assertion failed: RecordData.State != FRecordData::EState::Persistent || BindingType == ERayTracingLocalShaderBindingType::Transient [File:C:\***\Engine\Source\Runtime\D3D12RHI\Private\D3D12RayTracing.cpp] [Line: 2166]
...
UnrealEditor_D3D12RHI!FD3D12RayTracingShaderBindingTableInternal::SetHitGroupGeometrySystemParameters()
...
```

We haven’t been able to isolate the cause or create a consistent repro case.

Do you have any suggestions on how to debug or fix this?

Hi,

A similar issue was reported against 5.7, and its fix may address what you’re seeing or at least provide some clues:

[CL#45286001 [HWRT] Fix crash due to duplicate shader bindings.](https://github.com/EpicGames/UnrealEngine/commit/875e926d6d828287ec16a862bfb6dbed74b9cc5c)

In that case, the crash was caused by duplicate shader bindings.

I was unable to find a similar fix for 5.6.1. The code that fix touches, which includes FinishGatherVisibleShaderBindings, was added in CL#45039770. The crash was found by bisecting the changes, which was practical because the issue appeared fairly soon after the breaking change landed. I’ve reached out to the team for further suggestions.
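For reference, bisecting a range of changelists is a binary search for the first "bad" one. The C++ sketch below is a generic illustration, not Epic's tooling: `FirstBadChangelist` and the `isBad` callback (which in practice would mean building that changelist, running it, and reporting whether the crash reproduces) are hypothetical. It assumes the changelists are ordered oldest to newest and that every changelist after the breaking one is also bad. Because the crash in this thread is intermittent, a "good" verdict at any step is only as trustworthy as the number of test runs behind it.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the index of the first changelist for which
// isBad() is true. Assumes changelists are sorted oldest to newest and that
// the predicate is monotone (all good before the breaking change, all bad
// from it onward). Returns changelists.size() if no changelist is bad.
std::size_t FirstBadChangelist(const std::vector<int>& changelists,
                               const std::function<bool(int)>& isBad) {
    std::size_t lo = 0;
    std::size_t hi = changelists.size(); // the answer always lies in [lo, hi]
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (isBad(changelists[mid])) {
            hi = mid;       // first bad is at mid or earlier
        } else {
            lo = mid + 1;   // first bad is strictly after mid
        }
    }
    return lo;
}
```

With N candidate changelists this takes about log2(N) build-and-test cycles, for example about 10 for 1,000 changelists, which is why bisecting works well when a regression shows up shortly after it lands.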

There’s another changelist in UE 5.7 that may fix this issue:

[CL#45427462 [HWRT]](https://github.com/EpicGames/UnrealEngine/commit/5cf4288d23ab4f5c5152a40d2999af0cf6ba98ec)

* Remove the call to FlushAllocationsToClear on the SBT before setting the current bindings on the SBT, because the clear call could flush pending clear ops for records that are still used this frame. Only flush clears after primitive scene infos have been added and removed on the RT timeline. Pending clears can be added by async stream-in requests between building the dirty bindings and flushing to the RHI from RDG passes, which leaves cleared data in the SBT, triggers validation errors, and can cause missing entries in the SBT.

* Also keep track of dirty records, and remove a record from the dirty set when it is reallocated before its pending clear has been flushed (otherwise valid entries can be cleared).
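To make the second point concrete, here is a deliberately simplified, hypothetical model of a record pool with deferred clears. None of the names (`RecordPool`, `Allocate`, `Free`, `FlushPendingClears`, `RecordState`) come from Unreal Engine, and it does not model the RT timeline or the ordering change from the first point; it only illustrates why a record that is freed and then reallocated before the deferred clear is flushed must be dropped from the pending-clear set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified model of SBT record bookkeeping.
enum class RecordState { Unused, Persistent };

struct RecordPool {
    std::vector<RecordState> states;
    std::vector<uint32_t> freeList;
    // Records that were freed but whose clear has not been flushed yet
    // (stands in for the "dirty records" in the changelist description).
    std::unordered_set<uint32_t> pendingClears;

    explicit RecordPool(uint32_t count) : states(count, RecordState::Unused) {
        // Push in reverse so index 0 is handed out first.
        for (uint32_t i = count; i-- > 0;) freeList.push_back(i);
    }

    uint32_t Allocate() {
        assert(!freeList.empty() && "pool exhausted");
        uint32_t index = freeList.back();
        freeList.pop_back();
        // If this record was freed and is being reused before its clear was
        // flushed, cancel the stale clear; otherwise the next flush would
        // wipe a record that is valid again.
        pendingClears.erase(index);
        states[index] = RecordState::Persistent;
        return index;
    }

    void Free(uint32_t index) {
        pendingClears.insert(index); // defer the actual clear
        freeList.push_back(index);
    }

    void FlushPendingClears() {
        for (uint32_t index : pendingClears) states[index] = RecordState::Unused;
        pendingClears.clear();
    }
};
```

If the `pendingClears.erase(index)` line is removed, freeing a record, reallocating it, and then flushing leaves the reallocated record marked `Unused`, which is the "valid entries can be cleared" failure the changelist describes.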

However, backporting this changelist may be difficult because of the numerous changes to the system between 5.6 and 5.7.

We’re using version 5.6.1 (based on CL 43139311), and I couldn’t find any code similar to what these changelists modify, either in that file or in others.

Is there perhaps an equivalent or related fix available for this version? Thanks.