Hello,
We are sorry for the late reply to this post: [Content removed]
In the meantime, we have upgraded to UE 5.5.4 and I have started investigating the GPU crashes again. Many of the GPU crashes seem to be resolved, but some still persist.
They are hard to reproduce locally. We run crash-hunt tests overnight, which simulate a player's progression through the game to gather various crashes. So far, we are getting about two GPU crashes per nightly run across about 50 PCs.
The game is started with the following render arguments:
[Image Removed]
and one of the following sets of extra arguments is appended:
[Image Removed]
We have disabled render features such as HW ray tracing, mesh shaders, async compute and MegaLights. Disabling these features resolved a lot of other GPU crashes in the past.
We have also merged several Aftermath improvements and fixes from 5.6, including custom improvements:
- the ability to symbolize the dumped shaders and pair them with the Nvidia crash dumps
- shader name reporting in the Aftermath “Active Shaders” section
- fixed Aftermath resource name reporting
Currently, the most frequent crash has the following breadcrumbs:
- RenderScene -> Scene -> Nanite::VisBuffer -> Nanite::InitContext -> RasterClear
followed by:
- UpdateGlobalDistanceField -> Update MostlyStatic -> CullToClipmaps, GlobalDistanceField.HasPendingStreamingReadback
- Scene -> PrePass DDM_AllOpaque (Forced by Nanite) -> DepthPassParallel -> ParallelDraw
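For illustration only, here is a minimal Python sketch of how breadcrumb paths like the ones above could be tallied by their innermost pass when triaging many reports. The helper names (`leaf_pass`, `rank_leaf_passes`) and the sample counts are hypothetical; they are not taken from our tooling or from our actual crash statistics.

```python
from collections import Counter


def leaf_pass(breadcrumb: str) -> str:
    """Return the innermost (last) pass name of a '->'-separated breadcrumb path."""
    return breadcrumb.split("->")[-1].strip()


def rank_leaf_passes(breadcrumbs):
    """Count how often each innermost pass appears, most frequent first."""
    return Counter(leaf_pass(b) for b in breadcrumbs).most_common()


# Made-up sample input: one breadcrumb path per crash report (hypothetical counts).
sample = [
    "RenderScene -> Scene -> Nanite::VisBuffer -> Nanite::InitContext -> RasterClear",
    "RenderScene -> Scene -> Nanite::VisBuffer -> Nanite::InitContext -> RasterClear",
    "Scene -> PrePass DDM_AllOpaque (Forced by Nanite) -> DepthPassParallel -> ParallelDraw",
]

ranking = rank_leaf_passes(sample)
for name, count in ranking:
    print(f"{count:3d}  {name}")
```

Grouping by the innermost pass only is a simplification; the full path may matter when the same pass name appears under different parents.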
Despite all of the debugging features above being enabled, only a few GPU reports contain all of the crash data. Many reports are missing the last DRED ops or the page fault data, or Aftermath is not invoked at all (even after extending the timeout to one minute).
Do you have any clues on how to resolve this, please? Is it simply driver dependent whether this information gets gathered or not?
When we are lucky and get a more complete GPU report, it doesn’t make much sense. One report (attached to this post) indicates, according to its breadcrumbs, that the crash happened in the UAV clear pass within the Nanite::InitContext pass. However, the code and the shader are so simple that we don’t see any mistakes in them.
[Image Removed]
We searched for possible Nanite fixes on GitHub. There are a lot of changes in NaniteCullRaster.cpp, but none of them mention this kind of issue.
However, when we look at the active shaders reported by Aftermath, they point to completely different passes, which have already finished (according to the breadcrumbs):
[Image Removed]
We don’t know which of the reported information is accurate and which is only approximate. We never get reports with an active shader for the RasterClear pass, but we have several reports mentioning the DistanceFieldStreaming shaders.
UDN doesn’t seem to mention any related problems, and the GitHub changes to DistanceFieldStreaming.cpp are infrequent. We are not sure whether this change might resolve it: https://github.com/EpicGames/UnrealEngine/commit/9a57280ffdb410fffbbf9cd81eef0146540f40a6.
Do you have any ideas or suggestions as to what might resolve this kind of crash, please?
Thank you.
Best regards,
Tomas Ruzicka.
[Attachment Removed]