GPU Crash: MMU Fault Error during a GPU memory Read

In the packaged Shipping build of my game, users have been reporting random GPU crashes. I collected the NVIDIA ueaftermath files and opened them in NVIDIA Nsight Graphics, and they all show the same message:

MMU Fault Error during a GPU memory Read at address 0x0000000000000000 in this shader location:
Closest Hit (MaterialCHS) Shader [ray_tracing_01 @ 0x000040d0] (Shader1)

A shader instruction caused an MMU fault when accessing memory.
This can be caused by shader bugs and binding setup issues, or possibly by a shader compiler bug or shader microcode corruption.

Access originating from Graphics Processing Cluster failed with the following error:
Failed to translate the virtual address.
A MMU fault in the Graphics Processing Cluster may indicate texture fetch or other shader memory issues.

Not sure how on earth to find the material causing this or how to begin debugging it. I haven’t written any custom shader code, only UE materials. Also, the crash doesn’t happen on launch or when entering a specific area of the map; it happens randomly over time.

EDIT: My rendering settings are:

Notably, I have ray tracing enabled, so the crash may be related to that.

I’m on 5.5.3 and had this crash constantly in my World Partition map. What stopped it from happening:

r.RayTracing.Nanite.Mode=1
r.RayTracing.Shadows.AvoidSelfIntersectionTraceDistance=10

To be honest, I’m not sure the value for AvoidSelfIntersectionTraceDistance is the right one.

Crash description: in a packaged game or in-editor, the game would crash when walking through a specific area. I use World Partition; maybe I have a mesh with intersecting geometry, and ray tracing wasn’t behaving well without ray tracing enabled on Nanite meshes?

The only workaround would be to play the game in a tiny window.

Seemingly pertinent crash log (I’m honestly new to GPU profiling):

LogD3D12RHI: Error: GPU crash detected:
	- Device 0 Removed: DXGI_ERROR_DEVICE_HUNG
  {
    "Active Warps": [
      {
        "GPU PC Address": "ray_tracing_02 @ 0x00000f10",
        "Shader mapping": null,
        "Warp count": 1
      }
    ]
  },
  {
    "Faulted Warps": [
      {
        "Fault Description": "A shader instruction caused an MMU fault when accessing memory.\nThis can be caused by shader bugs and binding setup issues, or possibly by a shader compiler bug or shader microcode corruption.",
        "Fault Name": "MMU Fault Error",
        "Shader GPU PC Address": "ray_tracing_02 @ 0x00000ec0",
        "Shader mapping": null
      }
    ]
  },
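For anyone comparing several of these logs, here is a minimal Python sketch that pulls the faulted shader address out of the text and counts how often each one appears. The function name faulted_shader_locations and the regex are my own assumptions, based only on the lines pasted above, so adjust them if your log is formatted differently. It only tells you which ray tracing shader blob faulted, not which material, since "Shader mapping" is null here.

```python
import re
from collections import Counter

# Matches lines such as:  "Shader GPU PC Address": "ray_tracing_02 @ 0x00000ec0",
# (pattern is an assumption derived from the log excerpt in this thread)
FAULT_PC = re.compile(
    r'"Shader GPU PC Address":\s*"([^"@]+?)\s*@\s*(0x[0-9a-fA-F]+)"'
)


def faulted_shader_locations(log_text):
    """Count (shader_name, offset) pairs for every faulted warp found in the text."""
    return Counter(
        (name, int(offset, 16)) for name, offset in FAULT_PC.findall(log_text)
    )


# Lines copied from the crash log above; a real run would read the log file instead.
SAMPLE = '''
        "GPU PC Address": "ray_tracing_02 @ 0x00000f10",
        "Fault Name": "MMU Fault Error",
        "Shader GPU PC Address": "ray_tracing_02 @ 0x00000ec0",
'''

if __name__ == "__main__":
    for (name, offset), count in faulted_shader_locations(SAMPLE).most_common():
        print(f"{name} @ {offset:#010x}: {count} faulted warp(s)")
```

If the same name and offset show up across many users' logs, that at least suggests one consistent shader location rather than random corruption.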

If I read the “LogRHI: Error: Active GPU breadcrumbs:” section of my log file correctly, the frame info (“Pipeline Graphics”) seemed to point to Raytracing, Nanite, and Postprocess compositeDebugPrimitives. In the “AsyncCompute” section, it seemed to point to Post process or Lumen scene update.

This feels more like a band-aid, and I would like to know more about what the root cause was. Hoping for more replies in this thread!