Hello,
We’ve been experiencing TDR crashes recently and, despite our best efforts, have been unable to pinpoint what is causing them.
Our only certainty is that they always occur during the NodeAndClusterCull step.
As a band-aid, we increased the TDR delays to avoid crashing and to let the affected team continue their work, but this still results in 20-30 s hangs, which are obviously not acceptable in a final product.
I tried playing with the following Nanite parameters, which we previously had to crank up to fix visual artifacts (the values shown below were in use without issues before the crashes started to occur):
r.Nanite.MaxNodes=8388608
r.Nanite.MaxVisiblePatches=8388608
r.Nanite.MaxVisibleClusters=16777216
r.Nanite.MaxCandidatePatches=8388608
r.Nanite.MaxCandidateClusters=67108864
Setting those back to their defaults shortens the hang (and thus dodges some TDR crashes), but most of the time it still leads to hangs slightly longer than 2 s, which are not ideal and would end up triggering the TDR anyway.
Our next step would be to progressively remove content from our level to sniff out which objects or materials may be causing the hang, but before committing to this goose chase, we would like to know:
- Is this an ongoing issue in the engine? When scouring through UDN, we came across this issue that shared the same GPU breadcrumbs as ours, but we are unsure whether it is relevant to our case since the causes seem to differ.
- Is there any way we could extend/configure the logs in such a way that the offending objects/materials could be identified directly? (Bear in mind that we use the launcher version of the engine, so we tend to avoid solutions that require engine modifications.)
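For context, the content-removal pass we have in mind as a fallback could be sketched roughly as the bisection below. This is only a sketch under assumptions: `bisect_culprit` and the `hangs` callback are hypothetical names, `hangs` stands in for a manual test (load the level with only the given actors enabled and check whether the hang reproduces), and it assumes a single offending actor and a reproducible hang.

```python
from typing import Callable, List, Sequence


def bisect_culprit(actors: Sequence[str],
                   hangs: Callable[[List[str]], bool]) -> str:
    """Narrow a list of actors down to the one that triggers the hang.

    Assumptions (hypothetical, not engine behaviour):
    - exactly one actor in `actors` is responsible;
    - `hangs(subset)` returns True when the hang reproduces with only
      `subset` enabled, and gives the same answer on every run.
    """
    candidates = list(actors)
    if not candidates:
        raise ValueError("no actors to bisect")

    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still reproduces the hang.
        candidates = first if hangs(first) else second

    return candidates[0]


if __name__ == "__main__":
    # Stand-in data: in practice `hangs` would be a manual in-editor check.
    level_actors = [f"Actor_{i}" for i in range(37)]
    print(bisect_culprit(level_actors, lambda subset: "Actor_23" in subset))
```

With N actors this needs about log2(N) test runs instead of N, but it breaks down if the hang only appears when several objects are combined, which is part of why we would prefer a logging-based approach.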
Thanks in advance,
Joaquim