Crash in FD3D12DescriptorCache::SetDescriptorHeaps

We are seeing crashes when starting the editor after loading a level, with the stack given below. The crash happens on this line:

ID3D12DescriptorHeap* PendingViewHeap =
#if PLATFORM_SUPPORTS_BINDLESS_RENDERING
	bSetBindlessHeaps ? BindlessResourcesHeap->GetHeap() :
#endif
	CurrentViewHeap->GetHeap();

PLATFORM_SUPPORTS_BINDLESS_RENDERING is defined, bSetBindlessHeaps is false, and CurrentViewHeap is null, so CurrentViewHeap->GetHeap() dereferences a null pointer and triggers an access violation.
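
To make the failure mode concrete, here is a minimal standalone sketch of the same selection logic with a null check added. MockHeap and SelectPendingViewHeap are hypothetical stand-ins invented for illustration, not the real D3D12RHI types, and the guard is not a proposed engine fix. It only shows which combination of inputs reaches the null dereference.

```cpp
#include <cassert>

// Hypothetical stand-in for the engine's heap wrapper (assumption, not the real class).
// The real GetHeap() returns an ID3D12DescriptorHeap*; an int id is used here instead.
struct MockHeap
{
	int Id;
	int GetHeap() const { return Id; }
};

// Mirrors the shape of the crashing expression, but checks the selected heap
// before dereferencing it. Returns -1 when no heap is available.
int SelectPendingViewHeap(bool bSetBindlessHeaps,
                          const MockHeap* BindlessResourcesHeap,
                          const MockHeap* CurrentViewHeap)
{
	// Precondition assumed by this sketch: asking for bindless implies a bindless heap exists.
	assert(!bSetBindlessHeaps || BindlessResourcesHeap != nullptr);

	const MockHeap* Selected = bSetBindlessHeaps ? BindlessResourcesHeap : CurrentViewHeap;

	// The reported crash corresponds to bSetBindlessHeaps == false with a null CurrentViewHeap.
	return Selected ? Selected->GetHeap() : -1;
}
```

With bSetBindlessHeaps false and a null CurrentViewHeap, the sketch returns -1 where the original expression would dereference null.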

We observed that this crash is directly related to ray-traced shadows (disabling them with r.raytracing.shadows = 0 makes the crash disappear), but unfortunately we don’t have a stable repro. It happens occasionally, and once it happens, the next few runs usually hit the same error. It happens on both NVIDIA and AMD GPUs.


Not sure if the provided stack was uploaded so here it is again to make sure:

FD3D12DescriptorCache::SetDescriptorHeaps(ED3D12SetDescriptorHeapsFlags) D3D12DescriptorCache.cpp:90
[Inlined] FD3D12DescriptorCache::UnsetExplicitDescriptorCache() D3D12DescriptorCache.cpp:885
FD3D12CommandContext::UnsetExplicitDescriptorCache() D3D12Commands.cpp:86
DispatchRays(FD3D12CommandContext &, const FRayTracingShaderBindings &, const FD3D12RayTracingPipelineState *, unsigned int, FD3D12RayTracingShaderBindingTableInternal *, const D3D12_DISPATCH_RAYS_DESC &, ED3D12QueueType, FD3D12Buffer *, unsigned int) D3D12RayTracing.cpp:5764
FD3D12CommandContext::RHIRayTraceDispatch(FRHIRayTracingPipelineState *, FRHIRayTracingShader *, FRHIShaderBindingTable *, const FRayTracingShaderBindings &, unsigned int, unsigned int) D3D12RayTracing.cpp:5793
FRHICommandRayTraceDispatch::Execute(FRHICommandListBase &) RHICommandListCommandExecutes.inl:550
FRHICommand::ExecuteAndDestruct(FRHICommandListBase &) RHICommandList.h:1620
FRHICommandListBase::Execute() RHICommandList.cpp:542
FRHICommandListExecutor::FTranslateState::Translate(FRHICommandListBase *) RHICommandList.cpp:1092
`FRHICommandListExecutor::FSubmitState::Dispatch'::`10'::<lambda_1>::operator()() RHICommandList.cpp:1043
[Inlined] UE::Core::Private::Function::TFunctionRefBase::operator()() Function.h:414
FRHICommandListExecutor::FTaskPipe::Execute(FRHICommandListExecutor::FTaskPipe::FTask *, const TRefCountPtr<…> &) RHICommandList.cpp:737
[Inlined] UE::Core::Private::Function::TFunctionRefBase::operator()(Type, const TRefCountPtr<…> &) Function.h:414
[Inlined] TFunctionGraphTaskImpl::DoTaskImpl(TUniqueFunction<…> &, Type, const TRefCountPtr<…> &) TaskGraphInterfaces.h:1129
[Inlined] TFunctionGraphTaskImpl::DoTask(Type, const TRefCountPtr<…> &) TaskGraphInterfaces.h:1110
TGraphTask::ExecuteTask() TaskGraphInterfaces.h:712
UE::Tasks::Private::FTaskBase::TryExecuteTask() TaskPrivate.h:518
[Inlined] UE::Tasks::Private::FTaskBase::Init::__l2::<lambda_1>::operator()() TaskPrivate.h:180
[Inlined] LowLevelTasks::FTask::Init::__l13::<lambda_1>::operator()(const bool) Task.h:499
[Inlined] Invoke(LowLevelTasks::FTask::<lambda_1> &, bool &) Invoke.h:47
[Inlined] LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::TTaskDelegateImpl<`LowLevelTasks::FTask::Init<`UE::Tasks::Private::FTaskBase::Init'::`2'::<lambda_1> >'::`13'::<lambda_1>,0>::Call(void *,bool) TaskDelegate.h:162
LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48>::TTaskDelegateImpl<`LowLevelTasks::FTask::Init<`UE::Tasks::Private::FTaskBase::Init'::`2'::<lambda_1> >'::`13'::<lambda_1>,0>::CallAndMove(LowLevelTasks::TTaskDelegate<LowLevelTasks::FTask * __cdecl(bool),48> &,void *,unsigned int,bool) TaskDelegate.h:171
[Inlined] LowLevelTasks::TTaskDelegate::CallAndMove(LowLevelTasks::TTaskDelegate<…> &, bool) TaskDelegate.h:309
LowLevelTasks::FTask::ExecuteTask() Task.h:627
LowLevelTasks::FScheduler::ExecuteTask(LowLevelTasks::FTask *) Scheduler.cpp:397
[Inlined] LowLevelTasks::FScheduler::TryExecuteTaskFrom(LowLevelTasks::Private::FWaitEvent *, LowLevelTasks::Private::TLocalQueueRegistry<…>::TLocalQueue *, LowLevelTasks::Private::FOutOfWork &, bool) Scheduler.cpp:698
LowLevelTasks::FScheduler::WorkerLoop(LowLevelTasks::Private::FWaitEvent *, LowLevelTasks::Private::TLocalQueueRegistry<…>::TLocalQueue *, unsigned int, bool) Scheduler.cpp:757
[Inlined] LowLevelTasks::FScheduler::WorkerMain(LowLevelTasks::Private::FWaitEvent *, LowLevelTasks::Private::TLocalQueueRegistry<…>::TLocalQueue *, unsigned int, bool) Scheduler.cpp:816
`LowLevelTasks::FScheduler::CreateWorker'::`2'::<lambda_1>::operator()() Scheduler.cpp:220
[Inlined] UE::Core::Private::Function::TFunctionRefBase::operator()() Function.h:414
FThreadImpl::Run() Thread.cpp:66
FRunnableThreadWin::Run() WindowsRunnableThread.cpp:156
FRunnableThreadWin::GuardedRun() WindowsRunnableThread.cpp:71


I’m getting this with 5.7.1. We’re using full bindless, and it is a 100% repro when enabling DLSS.

The issue is that the DLSS plugin calls FD3D12DynamicRHI::RHIFinishExternalComputeWork(), and that function then calls:

Context.StateCache.GetDescriptorCache()->SetDescriptorHeaps(ED3D12SetDescriptorHeapsFlags::ForceChanged);

SetDescriptorHeaps() then crashes because its logic no longer thinks bindless is in use: no bindless flag was passed in along with “ForceChanged”. Looking at the history, I think the last update has a logic flaw. I’m not sure what the full intent was, but I changed the following line:

#if PLATFORM_SUPPORTS_BINDLESS_RENDERING
	bool bSetBindlessHeaps = IsUsingBindlessHeap();//CouldUseBindless() && EnumHasAnyFlags(SetFlags, ED3D12SetDescriptorHeapsFlags::Bindless);
 
#if DO_CHECK

and DLSS is now working fine.
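
The difference between the two expressions can be sketched in isolation. This is a simplified model, not engine code: ESetHeapsFlags, its enumerator values, and MockDescriptorCache are assumptions made up for illustration. Only the flag names ForceChanged and Bindless and the two expression shapes come from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the flags; the numeric values are assumptions, not the engine's.
enum class ESetHeapsFlags : std::uint8_t
{
	None         = 0,
	ForceChanged = 1 << 0,
	Bindless     = 1 << 1,
};

constexpr bool HasAnyFlags(ESetHeapsFlags Value, ESetHeapsFlags Test)
{
	return (static_cast<std::uint8_t>(Value) & static_cast<std::uint8_t>(Test)) != 0;
}

// Hypothetical stand-in for the descriptor cache state.
struct MockDescriptorCache
{
	bool bCouldUseBindless = false;  // bindless is available on this platform/config
	bool bUsingBindlessHeap = false; // a bindless heap is what is currently in use

	// Shape of the original expression: the answer depends on the caller's flags.
	bool SelectBindlessFromFlags(ESetHeapsFlags SetFlags) const
	{
		return bCouldUseBindless && HasAnyFlags(SetFlags, ESetHeapsFlags::Bindless);
	}

	// Shape of the workaround: the answer depends only on the cache's own state.
	bool SelectBindlessFromState(ESetHeapsFlags /*SetFlags*/) const
	{
		// Invariant assumed by this sketch: bindless cannot be in use if it is unavailable.
		assert(!bUsingBindlessHeap || bCouldUseBindless);
		return bUsingBindlessHeap;
	}
};
```

In this model, a caller that passes only ForceChanged while bindless is active gets false from the flag-based version and true from the state-based one, which matches the behaviour described for the DLSS path. Whether the state-based check is correct for every caller in the engine is not something the sketch can show.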


Hello,

I believe we addressed this in UE5 Main in January: https://github.com/EpicGames/UnrealEngine/commit/c1afa9528b2223e809a0f2ed7459754676e2e2a9

There was an issue where plugins like DLSS could leave D3D12RHI in a bad state, resulting in this crash.

There is also another important Bindless fix in 5.7.4 that addresses similar stability issues: https://github.com/EpicGames/UnrealEngine/commit/1d614a7e99eeba3065d9bc38c58f4b3c022b3132

Please let us know if that addresses the issue.

-Chris
