It seems that the code in RefreshSamplerStatesCallback() can overlap with FillUniformBuffer when it mutates the Wrap_WorldGroupSettings and Clamp_WorldGroupSettings FSharedSamplerState objects. Adding a wait like this:
ENQUEUE_RENDER_COMMAND(FUniformExpressionCacheWait)([](FRHICommandListImmediate& RHICmdList)
{
    FUniformExpressionCacheAsyncUpdateScope::WaitForTask();
});
at the beginning of the if(bRefreshSamplerStates) block seems to avoid the issue. However, we are not 100% sure this is the actual fix, since the issue is difficult to reproduce and timing dependent.
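To make the intended ordering concrete, here is a minimal standalone sketch in plain C++ (not engine code) of the "wait for the in-flight reader, then mutate" pattern described above. Every name in it (SharedSamplerState, AsyncCacheUpdater, RefreshSamplerState) is a hypothetical stand-in, and std::async is only an assumed analog for the engine's async uniform expression cache update; it says nothing about how the engine actually schedules that work.

```cpp
#include <cassert>
#include <future>

// Hypothetical stand-in for a shared sampler state that an async task reads.
struct SharedSamplerState {
    float MipBias = 0.0f;
};

// Hypothetical analog of an async cache update that reads the shared state.
class AsyncCacheUpdater {
public:
    // Launch a task on another thread that reads the shared state.
    void Kick(const SharedSamplerState& State) {
        WaitForTask();             // keep at most one task in flight
        assert(!Pending.valid());  // the previous task has been drained
        Pending = std::async(std::launch::async,
                             [&State] { return State.MipBias; });
    }

    // Block until any in-flight task has finished; return the last value read.
    float WaitForTask() {
        if (Pending.valid()) {
            LastResult = Pending.get();
        }
        return LastResult;
    }

private:
    std::future<float> Pending;
    float LastResult = 0.0f;
};

// Analog of the refresh callback: drain readers first, then mutate.
inline void RefreshSamplerState(AsyncCacheUpdater& Updater,
                                SharedSamplerState& State,
                                float NewBias) {
    Updater.WaitForTask();    // without this wait, the write below could race with the reader
    State.MipBias = NewBias;  // safe here: no task is reading State any more
}
```

In this sketch, a task kicked while MipBias is 1.0 always observes 1.0, even if RefreshSamplerState(Updater, State, 2.0f) is called immediately afterwards, because the write only happens once the reader has completed.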
Is this an issue Epic is aware of?
Thanks,
Lucas
[Attachment Removed]
Steps to Reproduce
Rapidly and repeatedly change r.MipMapLODBias while in-game in a complex scene
[Attachment Removed]
Hi Lucas,
On paper, the fix you are proposing might be OK, but before we go down the road of verifying it, I want to know why, and how fast, you are “rapidly” changing r.MipMapLODBias. You might be running into a race condition here because we never tested the use case you are describing.
Cheers,
Tim
[Attachment Removed]
Hi Lucas,
I’m sorry for not getting back to you sooner. I am still trying to chase down a good answer for you. As to your questions about stress testing, I cannot say that we do this extreme of a test on our inputs, since that would have exposed this issue to us as well. If you could isolate the test you do into a small repro project, I am going to try poking the dev team again to get you a proper answer faster. Let me know if that is possible for you.
[Attachment Removed]
Hi Lucas,
So I tried to reproduce this issue last week, but, like you, I was unable to produce a crash. The problem is unfortunately too timing sensitive, and without a reliable repro, I cannot file a bug report to get someone to take a closer look. Unless we get a more reliable repro, you might need to use the workaround you already have in place. Please let me know if that works for you, and I apologize for not finding a better solution at the moment.
[Attachment Removed]
Hi Lucas, I am still trying to chase down an answer for you, but it’s going to take a bit longer than expected. I apologize for the delay, and I will update you as soon as I have more information.
[Attachment Removed]
Hi Lucas, so I finally got some traction on the ticket. The code you referenced in UnrealEngine.cpp is unfortunately quite ancient, which has made it difficult to find an owner. On top of that, the owner is out until next week, so I went ahead and created a Jira to review the change you proposed, with the intent also to integrate it into the engine. You can find the public tracker for it here: https://issues.unrealengine.com/issue/UE-369233. Please let me know if these steps are sufficient for you or if you need any further help.
[Attachment Removed]
No problem. I see that the public issue tracker link is now live, so I will close out this case. If you haven’t seen an update on the issue after a reasonable amount of time, feel free to reach out again so I can poke the dev in charge of this case.
[Attachment Removed]
Hi Tim,
Yeah, it is a strange use case. 
We have an in-game settings menu which exposes a UI slider control that calls UGameUserSettings::SetOverallScalabilityLevel and then UGameUserSettings::ApplyNonResolutionSettings. The menu renders on top of the normal game view, although the world is paused. We have reports from our QA that continually mashing left and right on that control, so that it switches between scalability levels each frame, causes crashes. I’ve found this to be fairly easy to repro locally if you mash it for more than 10 seconds or so (of course, like everything timing related, this depends on your PC, content, etc.).
This was another similar crash we found in this test which we also reported: [Content removed]
We also found that cherry-picking Epic CL 47365520 helped, although it still eventually deadlocks as the index wraps around due to so many FVirtualTextureTranscodeCache::FTaskEntry entries being added.
I’m working down the list of issues we saw in this test, in order of reproduction rate. Do you guys do a similar stress test internally? Very rarely we also see a crash due to an FD3D12StateCache containing a non-null SRV with a null resource. However, that one is more elusive and doesn’t repro in Development or with the RHI thread off, so it’s difficult to debug.
[Attachment Removed]
Thanks. I can try to make a sample project, but I suspect you need a complex scene with lots of content in it to repro this.
[Attachment Removed]
I tried a Test build of Lyra on 5.7.3. The Lyra scalability menu is similar: you can easily cycle rapidly via the DPad as the game renders in the background. I can’t get it to crash, but this seems to be timing dependent, and the Lyra scene is much lighter than ours in terms of number of objects, materials, etc.
[Attachment Removed]
Thanks, yeah understandable. That said though I think it could be helpful for somebody at Epic who is familiar with this code to look at it with this context in mind.
I’m not sure my fix is the correct one, but I think the theoretical issue (tasks that read FSharedSamplerState being in flight when it is changed) is there. Even if you aren’t able to reproduce a crash, adding code to prevent that from being possible, or even just asserts (even if in practice it usually doesn’t happen for other reasons), might solve other related low-repro-rate issues.
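As a rough illustration of the "even just asserts" idea, here is a standalone plain C++ sketch (again, not engine code) that counts in-flight readers and asserts that none exist when the state is mutated. GuardedSamplerState, ReadScope, IsSafeToMutate and SetMipBias are all hypothetical names invented for this example. Note that a check like this is only a detection aid under the assumption that readers bracket their access with the scope object; it does not synchronize anything by itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state plus a counter of tasks currently reading it.
struct GuardedSamplerState {
    float MipBias = 0.0f;
    std::atomic<int> ActiveReaders{0};
};

// RAII scope a reader task would hold for as long as it reads the state.
class ReadScope {
public:
    explicit ReadScope(GuardedSamplerState& InState) : State(InState) {
        State.ActiveReaders.fetch_add(1);
    }
    ~ReadScope() { State.ActiveReaders.fetch_sub(1); }

    ReadScope(const ReadScope&) = delete;
    ReadScope& operator=(const ReadScope&) = delete;

    float MipBias() const { return State.MipBias; }

private:
    GuardedSamplerState& State;
};

// True when no reader scope is currently alive.
inline bool IsSafeToMutate(const GuardedSamplerState& State) {
    return State.ActiveReaders.load() == 0;
}

// Mutator that asserts no task is reading the state while it changes.
inline void SetMipBias(GuardedSamplerState& State, float NewBias) {
    assert(IsSafeToMutate(State) && "sampler state mutated while a reader is in flight");
    State.MipBias = NewBias;
}
```

With this in place, calling SetMipBias while a ReadScope is alive trips the assert in builds where asserts are enabled, which would turn a rare timing-dependent crash into an immediate, attributable failure at the point of the offending write.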
[Attachment Removed]
Thanks for following up on this. On our end, the change at least stopped the frequent reproduction of the crash, so we have what we need for now, pending a more complete fix from Epic (or confirmation that this is the right fix).
[Attachment Removed]