We are running into an error/crash similar to the one in the question “Out of range streaming request on [Content removed]”: we occasionally hit a RangeCheck error in FStreamingManager::AddParentNewRequestsRecursive() because the PageIndex is out of range for Resources.PageStreamingStates. The values are not crazy, just things like “index 2 out of size 1” or “index 7 out of size 4”.
This started being reported shortly after we put in a workaround for a different Nanite streaming issue (not immediately, as in on the next build, but within 2 working days). We were getting Nanite geometry constantly flickering between higher and lower LODs in some areas, including the same spot on the map that is the most reliable repro for this crash. It seemed that a SceneCapture component for rendering an offscreen weather mask (which uses Nanite meshes) was conflicting with the main scene render over which Nanite LODs were visible, so geometry was constantly streaming in and out.
We found the thread [Content removed] and implemented the workaround suggested there, adding this line to FDeferredShadingSceneRenderer::Render():
(note that the parentheses around the first two terms are not in the original post, but we found them to be necessary)
So far I have a 0% repro rate when the SceneCapture is disabled, which supports the other circumstantial evidence for this change being related. Does this seem plausible? And if so, is there a safer way to deal with the original problem of the SceneCapture and the main scene fighting over Nanite streaming?
Steps to Reproduce
We don’t have an entirely reliable repro case, but the most reliable case involves starting up a multiplayer session, teleporting to a particular location in the world, and running around that area for several seconds until the game crashes.
I'm just taking a look at this now, and I think the issue with the Nanite streamer getting multiple updates per frame when there is a scene capture was addressed in CL38645292 in UE5/Main.
From the commit message:
>Scene render builder will now provide inputs for the first scene renderer in a series allowing for one-time updates for all renderers.
>- Modified Nanite streaming and Scene only update once.
>- Modified occlusion to use the ‘multi-family’ fencing code path all the time, and deprecated the flag that controls it.
I don’t know how easy this is to pull back to 5.4, but I’m thinking at least following that parameter passing flow for FSceneRenderUpdateInputs* SceneUpdateInputs should work, even if you decide to skip some of the other changes.
Thanks! I looked through that change and I think it makes sense, but I still have one concern: with the SceneUpdateInputs changes, the Nanite streaming updates will always run on the first successful scene rendering pass in the frame. In our case, the scene captures render first and the main scene view renders after them, so I’m assuming Nanite streaming would be based entirely on the feedback from the scene capture and not the main scene, which is not the behavior we are looking for.
However, when looking through these changes, I’m wondering if the problem causing these crashes is that Nanite::GStreamingManager.BeginAsyncUpdate(GraphBuilder) / Nanite::GStreamingManager.EndAsyncUpdate(GraphBuilder) need to run before any Nanite rendering for the frame in any view/renderer, and with our current change, that is not the case.
Does it make sense to try to find a way to split things so that those functions still get called on the first renderer processed, as in CL38645292, but the call to Nanite::GStreamingManager.SubmitFrameStreamingRequests(GraphBuilder) can happen only on the first “main” renderer to process? I bet I could hack this in pretty ugly with something like:
That’s ugly and not generic/robust enough for general usage, but it probably would be enough to band-aid our particular use case until we get up to 5.6.
Alternatively, we could try to control the order in which the renderers are added/processed so that our main scene render always runs first. 5.4 doesn’t have the SceneRenderBuilder yet, though, just a bunch of ENQUEUE_RENDER_COMMANDs, so that seems like it would not be particularly easy.
Hello, just to mention that we have the exact same issue here. From what I see, CL38645292 seems to have been pushed to 5.6.1, and we are on 5.6.1, but we still crash in the same spot when we have scene captures at runtime. We don’t crash anymore when we remove them.
To be more precise, we crash on HLOD resources; removing all the HLODs also fixes the crash.
Wrt the crash, I have tried merging over two recent fixes for rare crashes in AddParentNewRequestsRecursive or ApplyFixup. I suspect that this scene capture setup might be turning these exceedingly rare conditions into much more common ones.
5.4 version: 45820486
5.6 version: 45775153
Can you give those a go to see if that fixes the crashes?
Wrt the issue with flickering because the residency gets updated independently for the Main view and the scene capture, I did have a look at the refactor in CL38645292. The Main view and Scene capture passes are still separate and both end up setting SceneUpdateInputs, so for this case it doesn’t actually change anything.
I think conceptually what needs to happen is that we want the streaming update to only happen once per frame. The call to SubmitFrameStreamingRequests(GraphBuilder) should also only happen once per frame after any rendering has happened. I can take a look at that a little later today to see if there is an easy way to do that.
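As a rough illustration of the once-per-frame idea described above, here is a minimal standalone C++ sketch. It assumes each render pass can pass in the current frame number. The class FToyStreamingManager and all of its members are hypothetical names made up for this example; they are not the engine's API and not the contents of any shelved CL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a streaming manager; NOT Unreal Engine code.
class FToyStreamingManager
{
public:
    // Called by every render pass (scene capture or main view).
    // Returns true only for the first pass that reaches it in a given frame.
    bool BeginUpdateOncePerFrame(std::uint64_t FrameNumber)
    {
        if (bHasUpdated && LastUpdatedFrame == FrameNumber)
        {
            return false; // Already updated this frame, so skip the work.
        }
        bHasUpdated = true;
        LastUpdatedFrame = FrameNumber;
        ++NumUpdates;
        return true;
    }

    // Each pass appends its page requests, so the data submitted at the
    // end of the frame covers every view instead of only the last one.
    void AddRequests(const std::vector<std::uint32_t>& PageRequests)
    {
        FrameRequests.insert(FrameRequests.end(), PageRequests.begin(), PageRequests.end());
    }

    // Called once, after all passes for the frame have rendered.
    // Returns the number of requests submitted and resets the per-frame list.
    std::size_t SubmitFrameRequests()
    {
        const std::size_t NumSubmitted = FrameRequests.size();
        FrameRequests.clear();
        return NumSubmitted;
    }

    int GetNumUpdates() const { return NumUpdates; }

private:
    bool bHasUpdated = false;
    std::uint64_t LastUpdatedFrame = 0;
    int NumUpdates = 0;
    std::vector<std::uint32_t> FrameRequests;
};
```

In this sketch a scene capture pass and the main view pass would both call BeginUpdateOncePerFrame() with the same frame number; only the first call does the update, and both contribute requests before the single SubmitFrameRequests() call.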
Is the new reshelved version 45830006? I don’t see 45820486 anywhere in P4, but that one looks correct from the description and files included. I’ll grab that and take a look now. Thanks!
EDIT: Is disabling optimization for NaniteStreamingManager.cpp something that’s meant to be left on for now? Or can I comment that out when I bring it into our codebase?
Sorry for the delay, was trying to get this in for testing but was dealing with local broken build issues. I have this in now and I’m just traveling around the world and trying some of the specific situations which were the most consistent repro cases for the crash before. So far, so good.
I haven’t changed anything on the other end of things, though, where it won’t process Nanite streaming on the scene capture passes, only on the main scene pass. That probably should be replaced by something like the SceneUpdateInputs or the test code you had in that other CL so that it happens on the first pass rendered for each frame.
I still don’t have a good repro for the SceneCapture issue, but I have written up a slightly hacky CL for 5.4 that makes sure the streamer only gets called once per frame and the readback buffer contains the data for the whole frame. I suspect that should fix the issue. I have shelved that along with the other fixes in CL45861799. Can you give it a try to see if that resolves the issue?
OK, on further reflection, I think that is not quite right, as any newly added resources have to be added immediately for things like HierarchyOffset to actually point to valid data. So I think it would have to be something like this: streaming updates get processed only once per frame, but every view (or view family) render still has to check for, and potentially process, any new resources that have been added before rendering.
I’m a little busy right now, but I can try sketching that up a little later.
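For readers following along, the split described above might look roughly like the following standalone C++ sketch: new resources are processed on every pass, while the streaming update is gated by a frame counter. FToyStreamer and its members are hypothetical names invented for this illustration; this is not engine code and does not reflect what was actually shelved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; NOT Unreal Engine code.
class FToyStreamer
{
public:
    // A resource can be added at any time, including between two passes
    // of the same frame.
    void AddResource(int ResourceId)
    {
        PendingResources.push_back(ResourceId);
    }

    // Called at the start of every view (or view family) render.
    void BeginPass(std::uint64_t FrameNumber)
    {
        // Always: register newly added resources so that anything the
        // upcoming render references points at valid data.
        for (int ResourceId : PendingResources)
        {
            RegisteredResources.push_back(ResourceId);
        }
        PendingResources.clear();

        // Once per frame: run the streaming update itself.
        if (!bHasUpdated || LastUpdateFrame != FrameNumber)
        {
            bHasUpdated = true;
            LastUpdateFrame = FrameNumber;
            ++NumStreamingUpdates;
        }
    }

    bool IsRegistered(int ResourceId) const
    {
        for (int Id : RegisteredResources)
        {
            if (Id == ResourceId)
            {
                return true;
            }
        }
        return false;
    }

    int GetNumStreamingUpdates() const { return NumStreamingUpdates; }

private:
    std::vector<int> PendingResources;
    std::vector<int> RegisteredResources;
    bool bHasUpdated = false;
    std::uint64_t LastUpdateFrame = 0;
    int NumStreamingUpdates = 0;
};
```

The point of the sketch is only the ordering: a resource added between the scene capture pass and the main view pass of the same frame is still registered before the second pass renders, while the streaming update count stays at one for that frame.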
OK. Does that only affect the 2nd part of your shelf (updating once per frame with the frame counter)? I’ve already submitted the first part that you shelved in 45820486 into our codebase earlier this morning, but I was still testing 45861799 and have not pushed that out yet.
In my limited testing today, the version in 45889455 hasn’t crashed yet in my most reliable repro spot, and also I’m not seeing the original LOD flickering issue from the multiple readback buffers clashing.
Do you feel that this version is safe for me to push out to the rest of the team here to collect more data? Or are you still working on updates?
EDIT: I did have to modify the call to BeginAsyncUpdate() in void FDeferredShadingSceneRenderer::RenderHitProxies(FRDGBuilder& GraphBuilder) to add the frame counter to get the build to compile.
I got a notification that this question is about to be closed out, so final update from my end:
We had 123 crashes in our database for the month before I put in your change, and I see zero since implementing it. So, it seems like this is a good fix. I am not aware at this time of any other side-effects being reported.
We’re starting on an upgrade now directly from 5.4 to 5.6 - is 45775153 still a good CL to integrate for 5.6, or is that out of date? Even if that doesn’t have your latest changes to the 5.4 version, I can likely compare the two shelves and use that to determine how to bring the final version over. Or, I can try just grabbing it from main once you submit it (probably in 5.7 but I can backport it).
45775153 should still be good for the crash fix. If you combine it with the changes to when the readback happens from 45889455, I think you should be good to go.
I will make sure to get the readback fix reviewed and submitted to main as well. 5.7 is going into hard lock right now, so it will probably just barely miss that cut.
45889455 only has a 5.4 version shelved, and some of the code in FStreamingManager::BeginAsyncUpdate() that was modified in that CL doesn’t seem to exist in 5.6 in the same form. For example, there are no longer any // Init and clear StreamingRequestsBuffer, // Find latest most recent ready readback buffer, or // Lock buffer blocks; instead there are new VirtualPageAllocator.Consolidate() and AsyncState.GPUStreamingRequestsPtr = ReadbackManager->LockLatest(AsyncState.NumGPUStreamingRequests) calls.
Is it safe for me to just assume these are equivalent in terms of what gets shuffled around and where? For example, moving VirtualPageAllocator.Consolidate() to right after SubmitFrameStreamingRequests(). Or is there a 5.6 version of this change shelved in another changelist that I haven’t looked at yet?