UE 5.7.x Linux – VK_ERROR_DEVICE_LOST in Vulkan RHI

Hi!

I am consistently experiencing Vulkan device loss on Linux in UE 5.7.x (tested on 5.7.2 and now 5.7.3).
In versions 5.5 and 5.6, using the same hardware and the same project, this issue did not occur.

My system:

  • Arch Linux

  • KDE Plasma 6.5.5

  • KDE Frameworks 6.22.0

  • Qt 6.10.2

  • Kernel 6.18.9-zen1-2-zen (64-bit)

  • Graphics Platform: X11

  • CPU: AMD Ryzen 9 5900X (24 threads)

  • RAM: 128 GB

  • GPU: NVIDIA GeForce RTX 3090

For testing purposes, I installed on a separate drive:

  • the recommended Ubuntu version

  • Rocky Linux

The behavior is identical — the error persists.
I also tested different NVIDIA driver versions (including rollbacks), but the issue remains.

Error

The logs consistently report:

LogVulkanRHI: Error: Result failed, VkResult=-4
with error VK_ERROR_DEVICE_LOST

The crash occurs in:

Runtime/VulkanRHI/Private/VulkanSynchronization.cpp

GPU breadcrumbs indicate execution during:

ShadowDepths
Nanite Shadows
Nanite::DrawGeometry
NodeAndClusterCull

Followed by:

FUnixPlatformMisc::RequestExit(1, FVulkanDynamicRHI.TerminateOnGPUCrash)

Observations

  • Occurs in the Editor during normal workflow

  • Does not require extreme load

  • UE 5.5 and 5.6 were stable on the same setup

  • In 5.7.x crashes are frequent enough that productive work is difficult

  • Reproduced across multiple Linux distributions

This appears to be a regression in Vulkan RHI.

I would greatly appreciate it if Epic could take a look at this issue.
If anyone has found a temporary workaround for the current version, I would be grateful for any suggestions.

I’m not on Arch, but I have a similar hardware setup to yours. Rolling the NVIDIA driver back from 590 (the current latest for me) to 570 seemed to help.

The engine keeps crashing on me consistently. It might run for 2 hours, sometimes just 2 minutes, or 15 or 40, but it always crashes eventually. Since I lack the technical knowledge to properly diagnose this myself, I had Claude analyze and write the crash report.

To @tginick: Rolling back the NVIDIA driver is not a solution in my case. I have tested drivers 570, 575, 580, and 590 — across three kernel versions (mainline, LTS, and Zen). The crash is 100% reproducible on all combinations. The issue is definitively not driver- or kernel-related.

New crash scenario identified

In addition to the previously reported VK_ERROR_DEVICE_LOST during Nanite rendering, I have now captured a second, more specific crash scenario that happens during Material Editor workflow — specifically when closing a material preview window while another material is open.

Vulkan Validation Layer output (captured with VK_LAYER_KHRONOS_validation)

Running the editor with validation layers enabled immediately exposes the root cause before the crash:

VUID-vkQueueSubmit-pCommandBuffers-00070
vkQueueSubmit(): pSubmits[19].pCommandBuffers[0] — bound VkPipeline
0x83904300002a854b was destroyed.

A command buffer submitted to the GPU still holds a reference to a VkPipeline object that has already been destroyed. This is a use-after-free on the Vulkan object lifecycle side. The GPU cannot execute this command, which produces:

  • NVRM: Xid 69 — Class Error (invalid pipeline handle)
  • NVRM: Xid 32 — channel interrupt (GPU execution stall)
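To make the reported violation concrete, here is a minimal toy model of the check the validation layer appears to be performing at submit time. It is only a sketch: `ToyDevice`, `ToyCommandBuffer`, and `IsSubmitValid` are hypothetical names invented for illustration and are not Vulkan or Unreal Engine APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a VkPipeline handle.
using PipelineHandle = std::uint64_t;

// Toy device that only tracks which pipeline handles are currently alive.
struct ToyDevice {
    std::unordered_set<PipelineHandle> livePipelines;

    void CreatePipeline(PipelineHandle h) { livePipelines.insert(h); }

    void DestroyPipeline(PipelineHandle h) {
        // Destroying an unknown handle is treated as a bug in this toy model.
        assert(livePipelines.count(h) == 1);
        livePipelines.erase(h);
    }
};

// Toy command buffer that remembers every pipeline bound while recording.
struct ToyCommandBuffer {
    std::vector<PipelineHandle> boundPipelines;

    void BindPipeline(PipelineHandle h) { boundPipelines.push_back(h); }
};

// A submission is valid only if every pipeline bound in the command buffer
// still exists when the command buffer is submitted.
bool IsSubmitValid(const ToyDevice& device, const ToyCommandBuffer& cmd) {
    for (PipelineHandle h : cmd.boundPipelines) {
        if (device.livePipelines.count(h) == 0) {
            return false;  // bound pipeline was destroyed before submit
        }
    }
    return true;
}
```

In this model, recording a bind, destroying the pipeline, and then submitting makes `IsSubmitValid` return false, which corresponds to the "bound VkPipeline ... was destroyed" message above.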

Crash stack trace (SIGABRT)

The crash is not a GPU hang per se — it is the CPU-side render thread deadlocking while waiting for a GPU fence that will never signal (because the GPU stalled on the invalid pipeline):

Signal 6 caught (SIGABRT — abort() called)

FPThreadEvent::Wait()
FRenderCommandFence::Wait()
FFrameEndSync::Sync()
FlushRenderingCommands()
FLinuxWindow::ReshapeWindow()   ← triggered by window resize
SWindow::ResizeWindowSize()
FSlateApplication::DrawPrepass()
FSlateApplication::PrivateDrawWindows()
FSlateApplication::DrawWindows()
FSlateApplication::Tick()
FEngineLoop::Tick()

Sequence of events leading to the crash

  1. Material MLB_HightBlend is saved and compiled in the Material Editor.
  2. Material test is opened in a second editor window.
  3. The first preview window (M_Blend_Inst) is closed.
  4. Slate initiates DrawPrepass, which triggers ReshapeWindow on the remaining window.
  5. FlushRenderingCommands() is called to synchronize the render thread.
  6. The render thread blocks indefinitely in pthread_cond_timedwait waiting for a fence that the GPU will never signal — because a command buffer submitted earlier references an already-destroyed VkPipeline.
  7. The engine calls abort() → SIGABRT → crash.
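Steps 5 to 7 can be illustrated with a small, self-contained sketch of a timed wait on a fence that is never signalled. This is not engine code: `CpuFence`, `SyncResult`, and `FlushWithTimeout` are hypothetical names, and the sketch reports a timeout to the caller where the engine, per the trace above, ends in abort().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical CPU-side fence; not an Unreal Engine or Vulkan type.
class CpuFence {
public:
    // Called by whoever completes the work the fence is guarding.
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if the fence was signalled before the timeout expired,
    // false if the wait timed out.
    bool WaitFor(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

enum class SyncResult { Completed, TimedOut };

// Waits for outstanding work; if the fence never signals (for example
// because the device was lost), the caller gets TimedOut instead of
// blocking forever.
SyncResult FlushWithTimeout(CpuFence& fence, std::chrono::milliseconds timeout) {
    return fence.WaitFor(timeout) ? SyncResult::Completed : SyncResult::TimedOut;
}
```

A fence that nobody signals yields `SyncResult::TimedOut`, which is the situation described in step 6; a fence signalled beforehand yields `SyncResult::Completed`.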

Why this is a UE 5.7 regression

UE 5.7 introduced changes to the Material Editor and its Vulkan pipeline lifecycle management. It appears there is a race condition where a VkPipeline object is destroyed (as part of closing a preview window / shader recompilation) while an in-flight command buffer in the render thread still holds a reference to it. In UE 5.5 and 5.6, the same workflow on the same hardware is completely stable.

System info

  • UE 5.7.3 (CL-50162420)
  • Arch Linux, GPU: NVIDIA RTX 3090
  • Tested with NVIDIA drivers: 570, 575, 580, 590
  • Tested kernels: mainline, LTS, Zen
  • Tested distros: Arch Linux, Ubuntu (recommended), Rocky Linux
  • Rendering backend: Vulkan

Temporary workaround request

I have tried the following without success:

  • All available NVIDIA driver versions
  • Multiple kernels
  • Multiple Linux distributions

Is there a way to force UE 5.7 to delay pipeline destruction until all in-flight command buffers have completed (e.g., a CVar or engine config flag)? Or any way to force the editor to use a safer pipeline eviction strategy?

The crash is consistently reproducible during normal Material Editor usage, making productive work in UE 5.7 on Linux essentially impossible.

I can reproduce this problem, also on Arch Linux, across UE 5.7.1, 5.7.2, and 5.7.3. The specific message I am getting under 5.7.3 is this:

[2026.02.20-22.03.42:760][814]LogVulkanRHI: Error: Result failed, VkResult=-4
at ./Runtime/VulkanRHI/Private/VulkanSynchronization.cpp:136
with error VK_ERROR_DEVICE_LOST
[2026.02.20-22.03.42:761][814]LogVulkanRHI: Error: Shader diagnostic messages and asserts:

Device: 0, Queue Graphics:
	No shader diagnostics found for this queue.

[2026.02.20-22.03.42:761][814]LogVulkanRHI: Error:
DEVICE FAULT REPORT:

Description:

Address Info:

Vendor Info:

Vendor Binary Size: 0

[2026.02.20-22.03.42:761][814]LogRHI: Error: Active GPU breadcrumbs:

Device 0, Pipeline Graphics: (In: 0x8018e768, Out: 0x8018e769)
	No breadcrumb nodes found for this queue.

When I first encountered this issue on 5.7.1, I found some sources online suggesting it was a power profile problem with the Nvidia driver. The recommended workaround was to change the PowerMizer profile to “prefer maximum performance” and to add a udev rule to make that change persistent across restarts (apparently it’s not persistent if changed in the Nvidia Settings applet).

That workaround seemed to work for me under 5.7.2, but I just installed 5.7.3 and opened a minimal level, and the crash happened about 3 or 4 minutes later. I don’t recall the previous error exactly, so this one may be identical or slightly different, but the overall behavior is the same.

The errors started occurring for me a couple of Nvidia driver versions ago. I’m currently at 590.48.01.

One data point I can contribute: I can rule out Optimus entirely. I have a Quadro RTX 5000 (mobile), and I have switchable graphics disabled in the firmware/BIOS settings. The integrated GPU doesn’t even show up as a device to the kernel.

I may have a workaround for this; I am testing it today.

Several days ago I updated my local full source repo to 5.7.3 and then did some collaborative diagnostics with Claude Code in that repo. I combined observations from this thread (by @mirthost) with my own test results and Claude’s ability to rapidly cross-reference different parts of the code base. After all that, I believe we have a workaround.

Quick Workaround

Edit your project’s Config/DefaultEngine.ini to add or change the following:

[/Script/Engine.RendererSettings]
; r.Vulkan.WaitForIdleOnSubmit=1
r.Vulkan.EnablePipelineLRUCache=1

Having one line commented out is not a typo. For most people, enabling the LRU (least recently used) pipeline cache is sufficient to prevent the error with minimal performance impact. If that doesn’t work for you, also enable r.Vulkan.WaitForIdleOnSubmit to fully serialize CPU/GPU communication and eliminate the race condition — but that option imposes a much larger performance penalty and should only be used if necessary.

I had a project reproducing the error within about two minutes of starting the editor at idle, and crashing immediately if I tried to debug a PCG graph. With both INI flags enabled, the crashes stopped. I then commented out WaitForIdleOnSubmit and restarted — it remains stable. So the lighter-weight setting alone is sufficient for my case.

Technical Analysis

Credit for the root cause analysis goes to @mirthost (who ran Vulkan validation layers and identified the exact violation) and Claude Code (Anthropic’s AI coding assistant, which traced the bug to specific lines in the engine source). I contributed the observation that the crash correlated with editor cleanup/GC activity and the idea of looking for CVARs as a workaround to avoid rebuilding the engine from source.

Running with VK_LAYER_KHRONOS_validation catches the specific violation:

VUID-vkQueueSubmit-pCommandBuffers-00070
vkQueueSubmit(): pSubmits[19].pCommandBuffers[0] — bound VkPipeline
0x83904300002a854b was destroyed.

This is a use-after-free of VkPipeline handles. The CPU destroys a pipeline object while the GPU still has in-flight command buffers referencing it. This triggers NVRM: Xid 69 (invalid pipeline handle) → VK_ERROR_DEVICE_LOST → render fence never signals → CPU deadlock → SIGABRT.

The bug is in Engine/Source/Runtime/VulkanRHI/Private/VulkanPipeline.cpp, in the function NotifyDeletedGraphicsPSO(). On PC/Linux, the LRU pipeline cache is disabled by default. When a PSO’s (pipeline state object’s) reference count hits zero — triggered by material recompilation, level unloading, garbage collection, editor window close, PCG shader invalidation, or similar events — this function calls:

(*Contained)->DeleteVkPipeline(true);   // line ~2502: immediate vkDestroyPipeline
VkPSO->DeleteVkPipeline(true);          // line ~2515: immediate vkDestroyPipeline

The true argument bypasses the engine’s deferred deletion queue and calls vkDestroyPipeline() immediately, regardless of whether the GPU is still executing commands that use that pipeline.

When the LRU cache is enabled (via the INI setting above), PSO deletion instead goes through LRURemove(), which checks whether the pipeline was used within the last 3 rendered frames. If it was recently used, it calls DeleteVkPipeline(false) — enqueuing the handle in FDeferredDeletionQueue2 for destruction only after the GPU has finished with it. That safety mechanism already exists in the codebase; the default non-LRU path simply doesn’t use it.
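The general idea of deferred deletion can be sketched in a few lines of standalone C++. This is a simplified illustration and not the engine's FDeferredDeletionQueue2: the class name, the frame-counter scheme, and the `DemoDeferredDeletion` helper are all assumptions made for the example, and it assumes handles are enqueued in non-decreasing frame order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU object handle such as VkPipeline.
using Handle = std::uint64_t;

// Minimal deferred-deletion queue: a handle is destroyed only once the GPU
// has completed the frame on which the handle was retired.
class DeferredDeletionQueue {
public:
    explicit DeferredDeletionQueue(std::function<void(Handle)> destroy)
        : destroy_(std::move(destroy)) {}

    // Record the handle together with the frame on which it was retired.
    void Enqueue(Handle h, std::uint64_t retiredOnFrame) {
        // This sketch assumes frames are enqueued in non-decreasing order.
        assert(pending_.empty() || pending_.back().frame <= retiredOnFrame);
        pending_.push_back({h, retiredOnFrame});
    }

    // Destroy every handle retired on a frame the GPU has already finished.
    // Returns the number of handles destroyed by this call.
    std::size_t Collect(std::uint64_t lastCompletedGpuFrame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() &&
               pending_.front().frame <= lastCompletedGpuFrame) {
            destroy_(pending_.front().handle);
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    struct Entry {
        Handle handle;
        std::uint64_t frame;
    };
    std::deque<Entry> pending_;
    std::function<void(Handle)> destroy_;
};

// Usage sketch: returns the handles destroyed, in destruction order.
std::vector<Handle> DemoDeferredDeletion() {
    std::vector<Handle> destroyed;
    DeferredDeletionQueue queue([&](Handle h) { destroyed.push_back(h); });
    queue.Enqueue(0xA, 10);  // retired while frame 10 may still be in flight
    queue.Enqueue(0xB, 12);
    queue.Collect(9);        // GPU has only finished frame 9: nothing destroyed
    queue.Collect(11);       // frame 10 is complete: 0xA destroyed, 0xB waits
    return destroyed;
}
```

The point of the design is that the owner can drop its reference immediately while the actual destroy call waits until no in-flight work can still reference the handle.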

For developers who build from source, the proper two-line fix in VulkanPipeline.cpp is to change true to false at both call sites above, so all PSO destruction routes through the deferred deletion queue. The CPU-side handle is still cleared immediately; only the actual vkDestroyPipeline call is deferred until the GPU is done.

Forum Notes

For this forum post, I wrote the workaround procedure and test results, Claude Code wrote the technical analysis, and I final-edited the merged post.

I have examined the code change proposed by Claude Code and consider it sensible, but I have not personally tested it because the INI file changes are sufficient for me. Use at your own risk.

EpicGames, this is disgusting. Bring back OpenGL. Vulkan in the UI is a disaster. I couldn’t find any workarounds; the project crashes as soon as I touch a node wire. My project has been stuck for over six months.
This happens on different versions of the NVIDIA drivers, and it doesn’t matter whether it’s Wayland or X11 (Hyprland/bspwm tested on Arch).

The release notes for 5.7.4 do not mention this bug, so it probably has not been addressed. As soon as I have time to install 5.7.4, I’ll test the CVARs workaround I posted. The workaround still has a perfect record for me in 5.7.3, with no crashes since I applied the change.

Hello!

Just came across this forum post and tried out both the workaround proposed by @mirthost and the patch by @syscrusher, but both were ineffective. Unfortunately, I still get those VK_ERROR_DEVICE_LOST crashes:

Error        LogVulkanRHI              Result failed, VkResult=-4
Error        LogVulkanRHI               at ./Runtime/VulkanRHI/Private/VulkanSynchronization.cpp:136 
Error        LogVulkanRHI               with error VK_ERROR_DEVICE_LOST
Error        LogVulkanRHI              Shader diagnostic messages and asserts:
Error        LogVulkanRHI              
Error        LogVulkanRHI              	Device: 0, Queue Graphics:
Error        LogVulkanRHI              		No shader diagnostics found for this queue.
Error        LogVulkanRHI              DEVICE FAULT REPORT:
Error        LogVulkanRHI              * Description: 
Error        LogVulkanRHI              * Address Info: 
Error        LogVulkanRHI              * Vendor Info: 
Error        LogVulkanRHI              * Vendor Binary Size: 0
Error        LogRHI                    Active GPU breadcrumbs:
Error        LogRHI                    
Error        LogRHI                    	Device 0, Pipeline Graphics: (In: 0x8001ac57, Out: 0x8001ac58)
Error        LogRHI                    		No breadcrumb nodes found for this queue.

I’ll also investigate on my end because this hinders my ability to work on projects reliably… unless I go back to 5.6, IIRC.

I’m getting a similar error, but on 5.5. It happens consistently when I try to resize a window. I’m on KDE and Wayland with nvidia 595 drivers.

I also have a similar error on UE 5.3.2 using Linux Mint (Nvidia 590 drivers). Going back to 580 helped for me.

Thank you so much! I had the exact same issue on UE 5.4.4. I’m on 580.142 now, and I haven’t had a single crash yet.

The issue is still present in UE 5.8 Preview: VK_ERROR_DEVICE_LOST.

My specs:

Laptop: Avell A52 Mob
CPU: Intel i5-11400H
GPU: NVIDIA RTX 3050 (4 GB VRAM)
RAM: 16 GB DDR4 3200 MHz (Dual Channel)
Storage: 2× 512 GB NVMe SSD

Operating System: Linux Mint 22.3
Kernel: Linux 6.17.0-35-generic
Architecture: x86-64

On the same laptop, Windows 11 25H2 runs UE 5.7 without any issues. The problem only occurs under Linux.

[2026.06.03-20.21.21:685][519]LogVulkanRHI: Error: Result failed, VkResult=-4
at ./Runtime/VulkanRHI/Private/VulkanSynchronization.cpp:136
with error VK_ERROR_DEVICE_LOST
[2026.06.03-20.21.21:685][519]LogVulkanRHI: Error: Shader diagnostic messages and asserts:

Device: 0, Queue Graphics:
	No shader diagnostics found for this queue.

Device: 0, Queue AsyncCompute:
	No shader diagnostics found for this queue.

[2026.06.03-20.21.21:685][519]LogVulkanRHI: Error:
DEVICE FAULT REPORT:

Description:

Address Info:

Vendor Info:

Vendor Binary Size: 0

[2026.06.03-20.21.21:685][519]LogRHI: Error: Active GPU breadcrumbs:

Device 0, Pipeline Graphics: (In: 0x80006b94, Out: 0x80006b93)
	No breadcrumb nodes found for this queue.

Device 0, Pipeline AsyncCompute: (In: 0x80006b92, Out: 0x80006b92)
	No breadcrumb nodes found for this queue.

[2026.06.03-20.21.21:686][519]LogCore: FUnixPlatformMisc::RequestExit(1, FVulkanDynamicRHI.TerminateOnGPUCrash)

My UE 5.8 installation behaves exactly the same.

I don’t even need to create a project. It can crash on the Home Screen or, more often, in the Project Creation window. The log is the same.

OS: Arch Linux x86_64 
Kernel: 6.18.34-1-lts 
Uptime: 2 hours, 46 mins 
Packages: 1495 (pacman), 21 (flatpak) 
Shell: bash 5.3.9 
Resolution: 3440x1440 
DE: Plasma 6.6.5 
WM: KWin 
Theme: Orchis [GTK2/3] 
Icons: Papirus-Dark [GTK2/3] 
Terminal: konsole 
Terminal Font: Noto Sans Mono 14 
CPU: 11th Gen Intel i5-11600 (12) @ 4.800GHz 
GPU: NVIDIA GeForce RTX 4060 Ti 16GB 
Memory: 7872MiB / 48019MiB 

GFX driver is nvidia-open-lts 1:610.43.02-2 but it’s the same on 590 and 580

Same with UE5.7. UE on Linux is completely unusable for me.

PS: Don’t forget to submit a bug report https://www.unrealengine.com/support/report-a-bug I don’t think they care about forum discussions at all.

I already fixed my 5.7.x problems; the same fix was also in a later commit. The wrong semaphore flag was being used.

5.8.0 error, same problem as reported:

radv/amdgpu: The CS has been cancelled because the context is lost. This context is guilty of a soft recovery.
radv: GPUVM fault detected at address 0x800104a000.
GCVM_L2_PROTECTION_FAULT_STATUS: 0x701430
CLIENT_ID: (SQC (data)) 0xa
MORE_FAULTS: 0
WALKER_ERROR: 0
PERMISSION_FAULTS: 3
MAPPING_ERROR: 0
RW: 0
[2026.06.13-11.36.00:234][ 0]LogVulkanRHI: Error: VulkanRHI::vkQueueSubmit(Queue, InSubmitInfos.Num(), InSubmitInfos.GetData(), FenceHandle) failed, VkResult=-4
at ./Runtime/VulkanRHI/Private/VulkanQueue.cpp:486
with error VK_ERROR_DEVICE_LOST
[2026.06.13-11.36.00:234][ 0]LogUObjectHash: Compacting FUObjectHashTables data took 0.92ms
[2026.06.13-11.36.00:236][ 0]LogVulkanRHI: Error: Shader diagnostic messages and asserts:
Device: 0, Queue Graphics:
No shader diagnostics found for this queue.
Device: 0, Queue AsyncCompute:
No shader diagnostics found for this queue.
[2026.06.13-11.36.00:236][ 0]LogVulkanRHI: Error:
DEVICE FAULT REPORT:

  • Description: A GPUVM fault has been detected

  • Address Info: - VK_DEVICE_FAULT_ADDRESS_TYPE_READ_INVALID_EXT : 0x000000800104A000 (range:0x000000800104A000-0x000000800104AFFF)

  • Vendor Info:

  • Vendor Binary Size: 0

[2026.06.13-11.36.00:236][ 0]LogRHI: Error: Active GPU breadcrumbs:

Device 0, Pipeline Graphics: (In: 0x8000001e, Out: 0x80000020)
	(ID: 0x80000020) [ Active] RenderGraphExecute - UpdateAllPrimitiveSceneInfos
	(ID: 0x8000001b) [ Active] UpdateAllPrimitiveSceneInfos
	(ID: 0x8000001c) [ Active] GPUSceneUpdate
	(ID: 0x8000001d) [ Active] GPUScene.Update
	(ID: 0x8000001e) [ Active] UpdateGPUScene NumPrimitiveDataUploads 4

Device 0, Pipeline AsyncCompute: (In: 0x80000020, Out: 0x80000020)
	No breadcrumb nodes found for this queue.

[2026.06.13-11.36.00:236][ 0]LogCore: FUnixPlatformMisc::RequestExit(1, FVulkanDynamicRHI.TerminateOnGPUCrash)

Spec: Arch Linux, 48 GB RAM, RX 6600, Ryzen 9, Mesa 26

I’m going to try rewinding some commits to see where it was introduced.