(I’ve already added a local code fix for this.)
Over the past two weeks we have received a number of crash dumps whose exact callstacks and crash locations differ, but which all crash somewhere inside FSceneRenderer::WaitOcclusionTests(). After some investigation, I found that the wait loop never reached ViewStateFenceCount <= FencesAllowedInQueue, so it never exited the do/while loop. Instead, FenceToWaitOn kept being decremented into negative values, and we eventually crashed reading whatever happened to sit in the heap just before that array. That is why the actual crash is different each time.
I added extra tracking and discovered that, between the first loop in that function (where ViewStateWaitCount is computed) and the second loop (where it waits on the fences), the OcclusionSubmittedFence array can be modified by another thread running a FenceOcclusionTests RDG task:
[Image Removed]
[Image Removed]
(OcclusionSubmittedFencesBeforeWait is copied from OcclusionSubmittedFence at the beginning of the function, before ViewStateFenceCount is counted up.)
In most situations this wouldn’t lead to a crash, because the wait loop always starts from the end of the array anyway. In theory, the loop could exit before waiting on the new fence 0 if that fence has the same ViewStateUniqueID as the view being waited on, but it seems unlikely that we’d add a fence for a view after we’ve already started waiting on it. For a crash to occur, the timing would have to be exactly right: the fence for the matching view would have to be pushed from index [N] to [N+1] at precisely the moment the wait loop steps down from [N+1] to [N]. The loop then never visits that fence, ViewStateFenceCount never drops to FencesAllowedInQueue, and FenceToWaitOn keeps going past 0.
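The interleaving described above can be reproduced deterministically in a small single-threaded C++ model. This is a sketch, not engine code: FenceEntry and WaitLoopWithHook are hypothetical names, new fences are assumed to be inserted at index 0 (as the "new fence 0" wording suggests), and the onStep hook stands in for the other thread. The model returns -1 where the engine loop would instead read out of bounds.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for one OcclusionSubmittedFence entry; only the view ID matters here.
struct FenceEntry {
    int viewId;
};

// Simplified model of WaitOcclusionTests. onStep(index) runs after each entry is visited,
// so a caller can inject a "concurrent" modification at an exact point in the loop.
// Returns the number of entries visited, or -1 if the index would go below zero.
int WaitLoopWithHook(std::vector<FenceEntry>& fences, int viewId, int allowedInQueue,
                     const std::function<void(int)>& onStep) {
    assert(allowedInQueue >= 0);

    // First pass: count fences for this view (models ViewStateFenceCount).
    int matchingCount = 0;
    for (const FenceEntry& f : fences) {
        if (f.viewId == viewId) ++matchingCount;
    }

    // Second pass: walk down from the end of the array (models FenceToWaitOn).
    int fenceToWaitOn = static_cast<int>(fences.size()) - 1;
    int visited = 0;
    do {
        if (fenceToWaitOn < 0) return -1;  // model-only guard; the engine reads out of bounds
        // A real implementation would block on the fence here.
        if (fences[fenceToWaitOn].viewId == viewId) --matchingCount;
        ++visited;
        onStep(fenceToWaitOn);
        --fenceToWaitOn;
    } while (matchingCount > allowedInQueue);
    return visited;
}
```

With fences {2, 1, 2} and view 1, the undisturbed loop stops after visiting indices 2 and 1. If a fence for view 2 is inserted at index 0 right after index 2 is visited, the view 1 entry shifts from index 1 to index 2, the loop never sees it, and the model returns -1.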
My solution for now was to add a critical section, locked with FScopeLock in both the FenceOcclusionTests lambda and WaitOcclusionTests, to protect the array, and we have had no new reports of related crashes since. However, I still don’t know why this occurred ONLY in Shipping builds. I’m posting here mainly so someone can double-check my work and make sure there isn’t something subtle going on that I’m missing.
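The lock-based fix can be sketched in portable C++ as follows. This is a model under stated assumptions, not the engine change itself: std::mutex and std::lock_guard stand in for FCriticalSection and FScopeLock, and FenceQueue, PushFence and WaitForView are hypothetical names for the shared array, the FenceOcclusionTests lambda and WaitOcclusionTests. Because the count and the wait loop run under the same lock, every entry the first pass counts is visited by the second pass, so the index cannot run below zero.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one OcclusionSubmittedFence entry.
struct FenceEntry {
    int viewId;
};

// Models the shared fence array together with the lock that protects it.
class FenceQueue {
public:
    // Models the FenceOcclusionTests task: new fences are inserted at index 0.
    void PushFence(int viewId) {
        std::lock_guard<std::mutex> lock(mutex_);
        fences_.insert(fences_.begin(), FenceEntry{viewId});
    }

    // Models WaitOcclusionTests: counting and waiting both happen under one lock,
    // so the array cannot shift between or during the two passes.
    // Returns the number of entries visited, or -1 if the index would underflow.
    int WaitForView(int viewId, int allowedInQueue) {
        assert(allowedInQueue >= 0);
        std::lock_guard<std::mutex> lock(mutex_);

        int matchingCount = 0;
        for (const FenceEntry& f : fences_) {
            if (f.viewId == viewId) ++matchingCount;
        }
        if (matchingCount <= allowedInQueue) return 0;  // model-only early out

        int fenceToWaitOn = static_cast<int>(fences_.size()) - 1;
        int visited = 0;
        do {
            if (fenceToWaitOn < 0) return -1;  // unreachable while the lock is held
            // A real implementation would block on fences_[fenceToWaitOn] here.
            if (fences_[fenceToWaitOn].viewId == viewId) --matchingCount;
            ++visited;
            --fenceToWaitOn;
        } while (matchingCount > allowedInQueue);
        return visited;
    }

private:
    std::mutex mutex_;
    std::vector<FenceEntry> fences_;
};
```

One trade-off to keep in mind: holding the lock for the whole wait means a FenceOcclusionTests task that arrives mid-wait blocks until the wait finishes. Copying the array under the lock and waiting on the copy would shorten the critical section, at the cost of not waiting on fences added after the copy.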