Hi there, our CI runs game-build-validation tests after p4 commits.
They run using a development build of the game.
Since moving from UE4.27 to UE5.3, our build-validation tests have been failing occasionally with a memory access violation.
[Image Removed]
Context: This involves some third-party code, but I’m mainly seeking guidance and understanding around MTAccessDetector.h.
In the dump, the main thread is unregistering/destroying an audio object while the other thread is ending and unloading an audio object, both around a critical section/lock.
Q1.a: This isn’t the normal way an ensure should crash in a development build is it?
Q1.b: Has something gone wrong in FCString::GetVarArgs / FMicrosoftPlatformString::GetVarArgs?
Q2.a: How much of an alarm-bell should this race-condition-detector Ensure be setting off for us?
Q2.b: Would it be expected that Shipping builds would be fine here the vast majority of the time?
We are on UE5.3. I see these commits ahead of us in /Engine/Source/Runtime/Core/Public/Misc/MTAccessDetector.h:
[Image Removed]
Q3: To solve the development-build crashes, do any of these commits disable the crashing/ensures/asserts?
The MTAccessDetector (MTAD) was added in UE5 to help detect race conditions when we started pushing more game-thread work to other threads. Much of the engine and editor code expected serial processing, and we didn’t want to add mutexes everywhere because that would have hurt performance. The MTAD fires whenever it detects that multiple threads are hitting the same lock in an unsafe way. It is an imperfect solution: in many cases the report has only one of the threads correct. The second thread that tries to lock fires the ensure, but by the time the process stops, the other thread has already exited the unsafe code.
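To make the "second thread fires, first thread may already be gone" behaviour concrete, here is a minimal sketch of the general idea behind such a detector. It is not the engine's implementation: `FSimpleWriteDetector`, `FScopedWriteCheck`, and `CountViolations` are hypothetical names invented for this example, and the overlap is simulated with nested scopes on one thread so the result is deterministic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified stand-in for the idea behind an access detector.
// It only counts writers; it does not protect the data like a mutex would.
class FSimpleWriteDetector
{
public:
    // Returns true if no other writer was inside the guarded region.
    bool AcquireWrite()
    {
        // fetch_add returns the writer count from before this call.
        const int PreviousWriters = WriterCount.fetch_add(1, std::memory_order_acq_rel);
        return PreviousWriters == 0;
    }

    void ReleaseWrite()
    {
        WriterCount.fetch_sub(1, std::memory_order_acq_rel);
    }

private:
    std::atomic<int> WriterCount{0};
};

// Scoped helper: the entrant that finds another writer present records a
// violation (the real detector would fire an ensure at this point).
struct FScopedWriteCheck
{
    FScopedWriteCheck(FSimpleWriteDetector& InDetector, int& OutViolations)
        : Detector(InDetector)
    {
        if (!Detector.AcquireWrite())
        {
            ++OutViolations;
        }
    }

    ~FScopedWriteCheck() { Detector.ReleaseWrite(); }

    FSimpleWriteDetector& Detector;
};

// Simulates two accesses, either overlapping or one after the other.
int CountViolations(bool bOverlap)
{
    FSimpleWriteDetector Detector;
    int Violations = 0;
    if (bOverlap)
    {
        FScopedWriteCheck First(Detector, Violations);
        FScopedWriteCheck Second(Detector, Violations); // enters while First is still active
    }
    else
    {
        { FScopedWriteCheck First(Detector, Violations); }
        { FScopedWriteCheck Second(Detector, Violations); }
    }
    return Violations;
}
```

Only the second entrant notices the overlap. With real threads, the first one can leave the guarded region before the process is halted, which is why a report may show only one thread inside the unsafe code.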
In relation to your case, we discovered that delegates were not thread-safe in a couple of ways, on top of having referencing issues when using UObject bindings. You should review a couple of later CLs:
Thread safer delegates: 21819214
Add better pinning for UObject: 33370596
My guess is that both threads are manipulating the same multicast delegate within a very short time window. The Ak thread has likely exited the MTAD, and the crash is caused by PrevState containing invalid values. The first CL adds some locking around manipulation of the array of delegate methods, which should help here.
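As a rough illustration of what "locking when the array of delegate methods is being manipulated" can look like, here is a minimal sketch. It is an assumption about the general technique, not the code from the CL: `FLockedMulticast` and its members are hypothetical names, and it uses the standard library rather than engine types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical type for illustration only; not an engine class.
class FLockedMulticast
{
public:
    using FHandler = std::function<void(int)>;

    // Adds a handler and returns an id that can be used to remove it later.
    int Add(FHandler Handler)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        const int Id = NextId++;
        Handlers.push_back({Id, std::move(Handler)});
        return Id;
    }

    // Removes the handler with the given id; returns false if it was not found.
    bool Remove(int Id)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        for (std::size_t Index = 0; Index < Handlers.size(); ++Index)
        {
            if (Handlers[Index].Id == Id)
            {
                Handlers.erase(Handlers.begin() + static_cast<std::ptrdiff_t>(Index));
                return true;
            }
        }
        return false;
    }

    // Copies the handler list under the lock, then invokes the copy outside it,
    // so a handler may call Add or Remove without deadlocking.
    void Broadcast(int Value)
    {
        std::vector<FEntry> Snapshot;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Snapshot = Handlers;
        }
        for (const FEntry& Entry : Snapshot)
        {
            Entry.Handler(Value);
        }
    }

    std::size_t Num() const
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Handlers.size();
    }

private:
    struct FEntry
    {
        int Id;
        FHandler Handler;
    };

    mutable std::mutex Mutex;
    std::vector<FEntry> Handlers;
    int NextId = 0;
};
```

The trade-off in this sketch is that a handler removed by another thread during a broadcast can still be invoked once from the snapshot, so the lifetime of whatever the handler captures still has to be managed separately.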
Q1.a: This isn’t the normal way an ensure should crash in a development build is it?
Normally the stop comes from a debug break instruction when a debugger is attached, but that doesn’t prevent the crash you are getting if something goes wrong at the varargs level.
Q1.b: Has something gone wrong in FCString::GetVarArgs / FMicrosoftPlatformString::GetVarArgs?
It’s likely related to one of the methods that feeds strings into the varargs.
Q2.a: How much of an alarm-bell should this race-condition-detector Ensure be setting off for us?
Any occurrence of the MTAD should be thoroughly investigated: it is detecting concurrent access to code or data that is currently deemed unsafe. The access might be safe, but that requires validation.
Q2.b: Would it be expected that Shipping builds would be fine here the vast majority of the time?
It depends on the occurrence. So far, we have found that delegates were being used in totally unsafe ways, plus a few other cases that resulted in weird crashes that were really hard to reproduce and debug.
Q3: To solve the development-build crashes, do any of these commits disable the crashing/ensures/asserts?
We resolve the problems as they surface. We also adjust the code to avoid triggering false positives, but we do so cautiously so that the changes don’t hide bad coding patterns in project code or future code.