5.5.4 - Race condition in FSlateApplication::RegisterNewUser

Hi, we’re seeing a fair number of crashes that seem to be related to a race condition initializing slate users. There’s an engine comment in FSlateApplication::RegisterNewUser that says “New users must be registered at a valid non-negative index that is not already occupied by another user” which makes sense. However, we’re getting a crash here due to the index being -1 after that check has already succeeded (when it attempts to call `Users[UserIndex] = NewUser`), implying that another thread is also modifying that memory.

We have not been able to consistently reproduce it internally, but a fair number of external users have managed to trigger the race condition.

Note that this callstack includes our modification to add support for gyro on steam controller [Content removed] but I do believe this issue is not inherent to that modification. It just provides a more frequent repro for the race condition so that’s the one we see most often in our crash reports. It looks theoretically possible for existing message handler function calls in the various vendor controller plugins to also trigger it if they’re called at just the wrong moment. (Also we attempted to add a scope lock at the top of FSlateApplication::RegisterNewUser with no success, therefore that line number in the callstack is one higher than it would be in vanilla 5.5.4.)

Game thread: [inlined] std::_Atomic_integral<T>::fetch_add (atomic:1507) [inlined] std::_Atomic_integral_facade<T>::fetch_sub (atomic:1723) [inlined] SharedPointerInternals::TReferenceControllerBase<T>::ReleaseSharedReference (SharedPointerInternals.h:222) SharedPointerInternals::FSharedReferencer<T>::operator= (SharedPointerInternals.h:653) [inlined] TSharedPtr<T>::operator= (SharedPointer.h:931) FSlateApplication::RegisterNewUser (SlateApplication.cpp:4398) [inlined] FSlateApplication::RegisterNewUser (SlateApplication.cpp:4364) FSlateApplication::GetOrCreateUser (SlateApplication.cpp:4344) [inlined] FSlateApplication::GetOrCreateUser (SlateApplication.h:1160) FSlateApplication::ProcessMotionDetectedEvent (SlateApplication.cpp:6622) FSlateApplication::OnMotionDetected (SlateApplication.cpp:6604) FSteamController::SendControllerEvents (SteamController.cpp:242) FWindowsApplication::PollGameDeviceState (WindowsApplication.cpp:2860) FEngineLoop::Tick (LaunchEngineLoop.cpp:5861) [inlined] EngineTick (Launch.cpp:69) GuardedMain (Launch.cpp:190) GuardedMainWrapper (LaunchWindows.cpp:123) LaunchWindowsStartup (LaunchWindows.cpp:287) WinMain (LaunchWindows.cpp:327)

Would you have any recommendations for guarding against this situation?

Steps to Reproduce

  1. Call one of the message handler functions (such as OnMotionDetected) that requires a call to FSlateApplication::GetOrCreateUser from device input polling (so it has a chance to trigger at the wrong time when the race condition occurs)
  2. Random chance to crash if the timings line up

Hi,

It doesn’t seem like a race condition could explain things here since UserIndex is a local variable and we aren’t doing anything with it outside of the thread in which RegisterNewUser is called. Perhaps a memory stomp would cause this? You could try running with -stompmalloc to see if it points to anything obvious.

If we *did* have something creating a new user outside of the game thread (you could check(IsInGameThread()) at the top of RegisterNewUser to verify), I can imagine we’d end up with two SlateUsers with the same ID, which would result in one being lost. Presumably that would manifest as a null pointer exception when NewUser is dereferenced (since it’s been replaced by the newer user), but if you’re seeing UserIndex change then that seems extra strange.

Best,

Cody

Hi,

Sorry for the delay responding. It looks like we don’t pass along the explicit user index, but instead call the DeviceId version of the FInputEvent constructor which does the following:

// Set the User Index to the PlatformUser ID by default for backwards compatibility UserIndex = GetPlatformUserId().GetInternalId();GetPlatformUserID tries to find the UserID based on the input device, and that’s where things are failing. I only have an Xbox controller to test with at the moment, but we register it’s mapping in XInputInterface::GetPlatformUserAndDevice:

// If the controller is connected now but was not before, refresh the information if (InDeviceState == EInputDeviceConnectionState::Connected) { DeviceMapper.Internal_MapInputDeviceToUser(OutDeviceId, OutPlatformUserId, InDeviceState); }SonyControllers has some similar logic, but the Steam controller implementation lacks that same call to MapInputDeviceToUser. Instead, we have this:

FPlatformUserId UserId = FPlatformMisc::GetPlatformUserForUserIndex(i); FInputDeviceId DeviceId = INPUTDEVICEID_NONE; IPlatformInputDeviceMapper::Get().RemapControllerIdToPlatformUserAndDevice(i, UserId, DeviceId);Our default here is to use the controller ID as the user ID, but the SteamController implementation is indexing that controller ID based on the number of connected Steam controllers and doesn’t seem to account for additional devices from other sources. It seems like that call to GetPlatformUserForUserIndex is problematic and there may be some cross-contamination between the Steam controller and the PS5 controller in this case.

Overall this looks like it could be a pretty involved fix, as I don’t see any place in SteamController where we get notified of a new controller connection for a UserID and can set up that mapping (though I’m just eyeballing it since I don’t have a controller to test). We seem to just process events as they come without ever doing any sort of initial setup. You could try manually calling Internal_MapInputDeviceToUser as the other platforms do to ensure a mapping is registered on a newly connected controller; perhaps the indexing isn’t as important in this case since we can assume the Steam deck controller will always be user 0 anyway.

Ah, good point. I spoke with our gameplay team and it this *should* be specified via the Device Input Mapping Policy in your config file, though that may be getting into the weeds a bit. You can add some logging to verify you’re seeing the expected user ID from each input device when it handles input, though it sounds like the root cause of this issue is the missing mapping. Have you had any luck with Internal_MapInputDeviceToUser?

Hi,

Glad to hear it might have worked! Let us know if the crash comes up again and we’ll dig back into this one.

Best,

Cody

Sorry for the delayed response, hopping between several tasks at the moment. I did attempt to reproduce it several times with -stompmalloc, but no luck there. I also added check(IsInGameThread()) to RegisterNewUser, but have not hit it yet. Unfortunately I’m not sure what condition specifically causes it to try and increase my odds of reproducing it.

We did find another memory stomp related to a 3rd party plugin extending the analog cursor for gamepad, but it doesn’t look like it had any impact on this issue even after we shipped a patch to external players.

I’ll try to add some more logging right before to see if we can identify what specifically players are experiencing.

Hi,

I may have a lead here, I found a similar crash reported in Fortnite that was related to processing a motion controller event after the controller had been disconnected. Checks are compiled out in shipping builds, so the index returned by PlatformUserId.GetInternalId could have been -1 all along.

We checked in a temporary workaround in CL# 38509598, but reverted it shortly after in favor of a fix to the original source of the bad ID assignment in our Sony controller implementation. If you have access to those platform files, you could look at CL# 38553156 for the fix.

Comparing that fix to your changes in SteamController, I suspect you should provide both a platform ID and a device ID to your calls to OnMotionDetected. You should already have those from the earlier call to RemapControllerIdToPlatformUserAndDevice, so it’s worth giving that a shot to see if it fixes things. You could also add the workaround to filter out the bad events, though it would be best to fix it at the source of the bad ID.

Best,

Cody

Ah, that sounds like it could very possibly be the same issue! I’ll go ahead and cherrypick the Sony controller CL since we use that too and then apply similar changes to the steam controller input events and see if that makes any difference.

Whoops, looks like I had actually already caught that the SteamController.cpp OnMotionDetected modification should be using the overload with both UserId and DeviceId and forgot to update the original EPS post with that change, so our builds currently are doing it that way (though we did not have the SonyControllers.h change, so I’m still cherrypicking that one).

That said, that’s definitely on the right track because I manage to reproduce the issue! The repro is simply to plug in an external controller (I used a PS5 controller, but I assume any will work) into a steam deck. Adding some logging reveals that it correctly calls MessageHandler->OnMotionDetected() with UserId = 1 and DeviceId = 1 (as expected since a new controller has been added in addition to the steam deck’s built in one), but by the time it propagates down to FSlateApplication::GetOrCreateUser, UserId is now -1 (and ultimately hits the check outside of shipping/crashes as originally described in shipping).

Essentially, the flow goes:

`FSlateApplication::OnMotionDetected InputDeviceId.GetId() = 1, PlatformUserId.GetInternalId() = 1

FSlateApplication::ProcessMotionDetectedEvent MotionEvent.GetInputDeviceId().GetId() = 1, MotionEvent.GetPlatformUserId().GetInternalId()) = -1

// Uh oh! GetOrCreateUser reads from InputEvent.GetUserIndex() but the InputEvent created in FSlateApplication::OnMotionDetected only specified the InputDeviceId!
FSlateApplication::GetOrCreateUser UserIndex = -1`So with that knowledge, what is the intended behavior here? That the FMotionEvent is also constructed with a UserIndex? That GetOrCreateUser needs to be changed to not directly call GetUserIndex()? Other?

That’s good information. That said, I believe this is also used with a steam controller on PC so even if steam deck will always have a steam controller for user 0, can we actually assume a steam controller will always be user 0?

Well, from what I’ve got so far, I’ve managed to get it to execute a function for the SteamInputDeviceConnected callback that performs the device mapping. Works fine for user 0.

However, due to the timings of the controller events and the callback execution (since it operates off the game thread), I’m having a bit of a hard time figuring out a way to ensure we run that mapping prior to the crashing bit in SendControllerEvents (so I’m not sure if it actually solves the issue yet). Would you have any ideas as to how we can avoid executing the controller event logic while in the bad state to allow the async callback to complete?

An update: I managed to get something in that seems to prevent the crash, but feels a bit heavy-handed (and possibly open to undiscovered edge cases still). I added a `ControllerHandle_t CachedControllerHandles[STEAM_CONTROLLER_MAX_COUNT];` array which is updated at the end of SendControllerEvents() to match the ControllerHandles array. In the main for loop I just checked if `CachedControllerHandles[i] != ControllerHandle` and if so, set a bool to wait for a connection event (which will be set back to false once we run the game thread logic that calls Internal_MapInputDeviceToUser) before returning out.

Little hard to share the modified code since it’s so spread throughout the file, but here’s my best attempt:

`bSteamControllerInitialized = SteamInput()->Init(bManuallyCallRunFrame);

SteamInput()->EnableDeviceCallbacks(); // ← Added

InputSettings = GetDefault();`

`virtual void SendControllerEvents() override
{
if (!bSteamControllerInitialized || bWaitForConnectionEvent) // ← Added bWaitForConnectionEvent
{``for (int32 i = 0; i < NumControllers; i++)
{
ControllerHandle_t ControllerHandle = ControllerHandles[i];
FControllerState& ControllerState = ControllerStates[i];

// Added
if (CachedControllerHandles[i] != ControllerHandle)
{
// Something changed, wait for new connection event
bWaitForConnectionEvent = true;
// Need to run frame to make sure that we get the connect/disconnect callbacks
SteamInput()->RunFrame();
// Break, don’t return so it will update CachedControllerHandles
break;
}
// End`At the very end of SendControllerEvents:

// Update cache FMemory::Memcpy(CachedControllerHandles, ControllerHandles, sizeof(ControllerHandles));Below the IsGamepadAttached() definition:

`STEAM_CALLBACK(FSteamController, OnSteamInputDeviceConnected, SteamInputDeviceConnected_t);

private:
void OnSteamInputDeviceConnectedGameThread(const int32 ControllerIndex)
{
UE_LOG(LogTemp, Log, TEXT(“Controller %d connected”), ControllerIndex);
FPlatformUserId UserId = PLATFORMUSERID_NONE;
FInputDeviceId DeviceId = INPUTDEVICEID_NONE;

IPlatformInputDeviceMapper& DeviceMapper = IPlatformInputDeviceMapper::Get();
DeviceMapper.RemapControllerIdToPlatformUserAndDevice(ControllerIndex, OUT UserId, OUT DeviceId);

constexpr EInputDeviceConnectionState InDeviceState = EInputDeviceConnectionState::Connected;
// Refresh the information since it was just connected
DeviceMapper.Internal_MapInputDeviceToUser(DeviceId, UserId, InDeviceState);

bWaitForConnectionEvent = false;

UE_LOG(LogTemp, Log, TEXT(“Controller %d mapped to UserId=%d DeviceId=%d”), ControllerIndex, UserId.GetInternalId(), DeviceId.GetId());
}``/** Controller states */
FControllerState ControllerStates[MAX_STEAM_CONTROLLERS];

// Added
/** Cached controller handles to detect bad state */
ControllerHandle_t CachedControllerHandles[STEAM_CONTROLLER_MAX_COUNT];

bool bWaitForConnectionEvent = false;
// End

/** Delay before sending a repeat message after a button was first pressed */
double InitialButtonRepeatDelay;`Just below the FSteamController class, but before the #endif:

`void FSteamController::OnSteamInputDeviceConnected(SteamInputDeviceConnected_t* pCallback)
{
check(SteamInput());
const int ControllerIndex = SteamInput()->GetGamepadIndexForController(pCallback->m_ulConnectedDeviceHandle);

AsyncTask(ENamedThreads::GameThread, this, ControllerIndex {
OnSteamInputDeviceConnectedGameThread(ControllerIndex);
});
}`Do you have any thoughts on the safety of this implementation or ideas to improve it?

Hi,

No concerns on our end, that fix seems reasonable. I’ve captured all of this in a bug report (UE-296729) so we can try and reproduce the crash on our end and see about integrating that fix or something similar. I don’t have a timeline on when that might be reviewed, but I’ll reach out if there’s any news. In the meantime, let us know if you run into any problems with this approach.

Best,

Cody

Thanks, we’ll stick with this modification then for the time being unless something else comes up (or it proves to be unstable).

It may be a bit early to tell, but we released an update with this change and so far have no crash reports that look like the original issue.

An unintended but interesting side effect of this change, Steam users who prevent Steam DLLs from loading (common to circumvent Steam DRM for piracy reasons) now crash when they plug in a controller. I think we’re okay investing no additional effort into that one.