Deadlock between FSlateApplication::Tick and FMoviePlayerWidgetRenderer::DrawWindow

`FSlateApplication::Tick` acquires the Slate tick critical section first and then, in `PrivateDrawWindows`, tries to acquire the Slate renderer resource critical section.

The Slate loading thread (`FSlateLoadingSynchronizationMechanism::SlateThreadRunMainLoop`), however, acquires the render resource critical section first, before calling `DrawWindow`, which in turn calls `Tick` on the Slate application.

The two threads acquire these locks in opposite order, which creates a race condition. If the main thread takes the Slate tick lock just after the Slate loading thread has taken the render resource lock, the result is a deadlock: the main thread can no longer acquire the render resource lock, and the loading thread can no longer acquire the tick lock.
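The interleaving described above can be forced deterministically outside the engine. The following is a minimal sketch in standard C++, assuming two plain `std::mutex` objects as stand-ins for the engine's critical sections. `SlateTickLock`, `RenderResourceLock`, `FInversionResult`, and `RunInvertedOrder` are hypothetical names, not engine symbols. It uses `try_lock` so that the cycle is reported instead of hanging the process.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Stand-ins for the two engine locks (hypothetical names, not engine symbols).
std::mutex SlateTickLock;       // models the lock taken at the top of Tick
std::mutex RenderResourceLock;  // models the renderer resource lock

struct FInversionResult
{
    bool bMainGotResource = false;  // could the "main thread" take its second lock?
    bool bLoaderGotTick = false;    // could the "loading thread" take its second lock?
};

FInversionResult RunInvertedOrder()
{
    FInversionResult Result;
    std::promise<void> LoaderHoldsResource, MainHoldsTick, LoaderTried, MainTried;

    // Models the Slate loading thread: resource lock first, then the tick lock.
    std::thread Loader([&]
    {
        std::lock_guard<std::mutex> Resource(RenderResourceLock);
        LoaderHoldsResource.set_value();
        MainHoldsTick.get_future().wait();

        // try_lock instead of lock, so the demo reports the cycle rather than hanging.
        Result.bLoaderGotTick = SlateTickLock.try_lock();
        if (Result.bLoaderGotTick) { SlateTickLock.unlock(); }

        LoaderTried.set_value();
        MainTried.get_future().wait();  // keep the resource lock until main has tried
    });

    // Models the main thread: tick lock first, then the resource lock.
    LoaderHoldsResource.get_future().wait();
    {
        std::lock_guard<std::mutex> Tick(SlateTickLock);
        MainHoldsTick.set_value();

        Result.bMainGotResource = RenderResourceLock.try_lock();
        if (Result.bMainGotResource) { RenderResourceLock.unlock(); }

        MainTried.set_value();
        LoaderTried.get_future().wait();  // keep the tick lock until the loader has tried
    }

    Loader.join();
    return Result;
}
```

Each side fails to take its second lock while holding its first. With blocking `lock()` calls in place of `try_lock()`, neither thread would ever proceed.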

Steps to Reproduce
We hit this deadlock in our code where `FSlateApplication::Get().Tick()` was explicitly called, but I suspect this can happen organically in different ways as well.

This is in code that we derived from the Lyra loading screen manager, specifically in ULoadingScreenManager::ShowLoadingScreen, so I suspect this may occur with Lyra as well.

Hi Arnout,

I’ve been trying to hit that deadlock organically and haven’t found a way yet. Could you tell me where your explicit Tick call is coming from (or which thread it originates from)? There are multiple calls to Tick, but they only update the time. The main engine loop also calls it to update the widgets (which eventually requires the render resource lock), but in theory this happens on the main thread only.

The key here seems to be where and when the explicit Tick is called: it has to take the tick lock before the movie rendering thread can get it, but after that thread has already taken the render resource lock. If you could give me more information about how it’s being called, it would greatly help me repro what is going on.

Thanks

Hello Arnout,

What do you do in UpdateWidgetDisplay? Obviously you eventually call Tick in there, but is there anything else in that file in terms of synchronization objects? I suspect the timing of that Tick call is why I cannot repro the issue on my side.

Hello Arnout,

Which part of the original ShowLoadingScreen has been moved to UpdateWidgetDisplay?

Do I understand correctly that the `if (bCurrentlyShowingLoadingScreen) return;` code we have in Lyra has been modified to something like `if (bCurrentlyShowingLoadingScreen) { UpdateWidgetDisplay(); return; }`?

And UpdateWidgetDisplay eventually calls the SlateApplication Tick?

Thanks Arnout, that gives me a few points of interest in this code. I’ll see if I can repro the deadlock with those differences.

How frequently do you get this issue?

I was asking about the repro rate because I haven’t been able to repro it once. It could be because of how I’m attempting the repro, or the length of the video I’m using, but it could also be because of other code that calls the modified version of the code you’re using. The part of the changes that worries me the most is

    if (bCurrentlyShowingLoadingScreen)
    {
        UpdateWidgetDisplay();
        return;
    }

While I was following the threads to see what happened, I hit that return, which avoids the Tick(). The question, however, is whether the UpdateWidgetDisplay that locks is this one or the one called when the loading screen is first shown.

Hi Arnout,

Could you try adding

FSlateRenderer* SlateRenderer = SlateApp.GetRenderer();

FScopeLock ScopeLock(SlateRenderer->GetResourceCriticalSection());

above the Tick call in your if?

That’s the order in which the streaming movie thread acquires the locks as well.

I don’t know whether you’d need the same thing around the other Tick call, since the flow of code can be different in your loading screen manager (as evidenced by OnPausedForStreamingChanged calling UpdateLoadingScreen).
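To illustrate why taking the resource lock before Tick removes the cycle, here is a minimal sketch in standard C++ rather than engine code. `std::recursive_mutex` stands in for the engine's critical sections, on the assumption that they behave like a Windows `CRITICAL_SECTION`, which the owning thread can re-enter. That is presumably why the inner acquisition in `PrivateDrawWindows` does not self-deadlock. `TickStandIn`, `TickWithResourceLockFirst`, and `RunBothThreads` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Stand-ins for the engine locks (hypothetical names, not engine symbols).
std::recursive_mutex RenderResourceLock;
std::recursive_mutex SlateTickLock;
int DrawCount = 0;  // only modified while both locks are held

// Models Tick: takes the tick lock, then the resource lock while drawing.
void TickStandIn()
{
    std::lock_guard<std::recursive_mutex> Tick(SlateTickLock);
    // Re-entrant acquisition: succeeds immediately if this thread already owns it.
    std::lock_guard<std::recursive_mutex> Resource(RenderResourceLock);
    ++DrawCount;
}

// Models both the loading thread and the patched main-thread call site:
// the resource lock is always the first lock taken.
void TickWithResourceLockFirst()
{
    std::lock_guard<std::recursive_mutex> Resource(RenderResourceLock);
    TickStandIn();
}

// Runs the same call from two threads and returns the total number of draws.
int RunBothThreads(int Iterations)
{
    DrawCount = 0;
    auto Work = [Iterations]
    {
        for (int Index = 0; Index < Iterations; ++Index)
        {
            TickWithResourceLockFirst();
        }
    };

    std::thread Loader(Work);  // stand-in for the Slate loading thread
    Work();                    // stand-in for the main thread
    Loader.join();
    return DrawCount;
}
```

Because both threads take the resource lock first, whichever thread holds it can always go on to take the tick lock, so neither can block the other indefinitely.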

Hey Arnout,

Don’t get me wrong, I agree with you that having every client take the rendering lock before calling Tick is problematic. However, Slate is meant to be single-threaded by design. The streaming movie loading thread is the single exception, and measures were implemented to avoid the issue. In most cases the main thread spends its life in WaitForMovieToFinish while the other thread does the updates. That path also takes the rendering resource lock before the Tick lock.

So what we need to juggle here is the fact that delegates and Blueprint events could call any function that acquires those locks. Tick is public, and so is the function that returns the render resource lock. The only safe way to avoid the issue would be to take the render lock at the beginning of the Tick method, so that a thread cannot end up in the situation you’ve found. However, taking the render resource lock at the beginning of Tick like this would probably lower the framerate in every case in the application. It then becomes a trade-off between a performance cost all the time and a race condition between the special movie loading thread and the main thread.

Considering you have a workaround and that I haven’t been able to repro it in the unmodified code, I would suggest adding the same synchronization code you already have around any other calls to the Slate application Tick you might add.

The problem has been brought to our attention but it will need a lot more development time to feel completely confident about a solution for this.

I thought I attached some callstacks to the original post, but I can’t seem to see them.

Our code here is basically a modified version of the Lyra loading screen so the pattern should be similar. It is worth noting that this is happening to a client that is connecting to a server.

The tick comes from the main thread:

    [Inline Frame] Windows::EnterCriticalSection(Windows::CRITICAL_SECTION *)
    [Inline Frame] FWindowsCriticalSection::Lock()
    [Inline Frame] FScopeLock::{ctor}(FWindowsCriticalSection *)
    FSlateApplication::PrivateDrawWindows(TSharedPtr<SWindow,1>)
    [Inline Frame] FCpuProfilerTrace::FEventScope::{ctor}(unsigned int &, const char *, bool, const char *, unsigned int)
    [Inline Frame] FSlateApplication::DrawWindows()
    FSlateApplication::TickAndDrawWidgets(float)
    FSlateApplication::Tick(ESlateTickType)
    USDLoadingScreenManager::UpdateWidgetDisplay()
    USDLoadingScreenManager::ShowLoadingScreen()
    USDLoadingScreenManager::UpdateLoadingScreen()
    USDLoadingScreenManager::HandlePostLoadMap(UWorld *)
    [Inline Frame] Invoke(void(USDLoadingScreenManager::*)(UWorld *), USDLoadingScreenManager * &, UWorld * &&)
    [Inline Frame] UE::Core::Private::Tuple::TTupleBase<TIntegerSequence<unsigned int>>::ApplyAfter(void(USDLoadingScreenManager::*)(UWorld *) &, USDLoadingScreenManager * &, UWorld * &&)
    TBaseUObjectMethodDelegateInstance<0,USDLoadingScreenManager,void __cdecl(UWorld *),FDefaultDelegateUserPolicy>::ExecuteIfSafe(UWorld *)
    [Inline Frame] TMulticastDelegateBase<FDefaultDelegateUserPolicy>::Broadcast(UWorld *)
    TMulticastDelegate<void __cdecl(UWorld *),FDefaultDelegateUserPolicy>::Broadcast(UWorld *)
    [Inline Frame] UEngine::LoadMap::__l2::FPostLoadMapCaller::Broadcast(UWorld *)
    UEngine::LoadMap(FWorldContext &, FURL, UPendingNetGame *, FString &)
    UEngine::Browse(FWorldContext &, FURL, FString &)
    UEngine::TickWorldTravel(FWorldContext &, float)
    UGameEngine::Tick(float, bool)
    FEngineLoop::Tick()
    [Inline Frame] EngineTick()
    GuardedMain(const wchar_t *)
    GuardedMainWrapper(const wchar_t *)
    LaunchWindowsStartup(HINSTANCE__ *, HINSTANCE__ *, char *, int, const wchar_t *)
    WinMain(HINSTANCE__ *, HINSTANCE__ *, char *, int)
    [Inline Frame] invoke_main()
    __scrt_common_main_seh()

And this is the other thread:

    [Inline Frame] Windows::EnterCriticalSection(Windows::CRITICAL_SECTION *)
    [Inline Frame] FWindowsCriticalSection::Lock()
    [Inline Frame] FScopeLock::{ctor}(FWindowsCriticalSection *)
    FSlateApplication::Tick(ESlateTickType)
    FMoviePlayerWidgetRenderer::DrawWindow(float)
    FSlateLoadingSynchronizationMechanism::SlateThreadRunMainLoop()
    FSlateLoadingThreadTask::Run()

Our code is modified from Lyra. I’m not the author, but from looking at it, the changes were mainly made to support a movie player during loading screens as well. However, that path doesn’t force the tick above, so I know that logic won’t actually be called.

For this movie player code, some of the logic in Lyra’s ShowLoadingScreen got moved into UpdateWidgetDisplay. One key difference I can see is that in the `bCurrentlyShowingLoadingScreen` check we call UpdateWidgetDisplay, to support transitioning from a blocking load to non-blocking and vice versa. I don’t recall whether the update was called from there when I hit the issue. It also looks like an additional call to `RemoveWidgetFromViewport` was added before the call to `AddViewportWidgetContent`, for the same transition reason.

None of those add a synchronisation object though afaik.

Hey, figured it is easier if I just provide the code for the two methods. This shows the workaround we currently have in place:

`void USDLoadingScreenManager::ShowLoadingScreen()
{
    if (bCurrentlyShowingLoadingScreen)
    {
        // Already showing: only re-evaluate which display mode the widget should use
        UpdateWidgetDisplay();
        return;
    }

    TimeLoadingScreenShown = FPlatformTime::Seconds();

    bCurrentlyShowingLoadingScreen = true;

    CSV_EVENT(SDLoadingScreen, TEXT("Show"));

    UE_LOG(LogSDLoadingScreen, Log, TEXT("%s"), *DebugReasonForShowingOrHidingLoadingScreen);

    // Eat input while the loading screen is displayed
    StartBlockingInput();

    LoadingScreenVisibilityChanged.Broadcast(/*bIsVisible=*/ true);

    UpdateWidgetDisplay();

    ChangePerformanceSettings(/*bEnableLoadingScreen=*/ true);
}

void USDLoadingScreenManager::UpdateWidgetDisplay()
{
    UGameInstance* const LocalGameInstance = GetGameInstance();
    const USDLoadingScreenSettings* const Settings = GetDefault<USDLoadingScreenSettings>();

    if (!LoadingScreenWidget.IsValid())
    {
        TSharedPtr<SWidget> UserWidget = OnLoadingScreenCreateWidget.IsBound() ? OnLoadingScreenCreateWidget.Execute() : TSharedPtr<SWidget>();
        if (UserWidget.IsValid())
        {
            LoadingScreenWidget = UserWidget;
        }
        else
        {
            UE_LOG(LogSDLoadingScreen, Error, TEXT("No custom loading screen widget was provided, falling back to placeholder."));
            LoadingScreenWidget = SNew(SThrobber);
        }
    }

    IGameMoviePlayer* const MoviePlayer = IsMoviePlayerEnabled() ? GetMoviePlayer() : nullptr;

    const bool bIsGameThreadBlocked = bCurrentlyInLoadMap || bCurrentlyPausedForStreaming;

    const EWidgetMode NewWidgetMode = (bIsGameThreadBlocked && MoviePlayer != nullptr)
        ? EWidgetMode::MoviePlayer
        : EWidgetMode::Viewport;

    if (NewWidgetMode == WidgetMode)
    {
        return;
    }

    // Switching between blocking (movie player) and non-blocking (viewport) display
    RemoveWidgetFromViewport();

    WidgetMode = NewWidgetMode;

    if (WidgetMode == EWidgetMode::MoviePlayer)
    {
        const auto DPIScaler = SNew(SDPIScaler)
            .DPIScale_UObject(this, &ThisClass::CalcDPIScale)
            [
                LoadingScreenWidget.ToSharedRef()
            ];

        const auto Overlay = SNew(SOverlay)
            + SOverlay::Slot()
            .HAlign(HAlign_Fill)
            .VAlign(VAlign_Fill)
            [
                DPIScaler
            ];

        FLoadingScreenAttributes LoadingScreenAttributes;
        LoadingScreenAttributes.WidgetLoadingScreen = Overlay;

        MoviePlayer->SetupLoadingScreen(MoveTemp(LoadingScreenAttributes));
        MoviePlayer->PlayMovie();
    }
    else
    {
        // Add to the viewport at a high ZOrder to make sure it is on top of most things
        UGameViewportClient* const GameViewportClient = LocalGameInstance
            ? LocalGameInstance->GetGameViewportClient()
            : nullptr;
        if (GameViewportClient)
        {
            GameViewportClient->AddViewportWidgetContent(LoadingScreenWidget.ToSharedRef(), Settings->LoadingScreenZOrder);
        }

        if (!GIsEditor || Settings->ForceTickLoadingScreenEvenInEditor)
        {
            // Workaround for deadlock caused by the slate loading thread, when showing a loading
            // movie, obtaining the resource critical section before the slate tick
            // see [Content removed]
            FSlateRenderer* MainSlateRenderer = FSlateApplication::Get().GetRenderer();
            FScopeLock ScopeLock(MainSlateRenderer->GetResourceCriticalSection());

            // Tick Slate to make sure the loading screen is displayed immediately
            FSlateApplication::Get().Tick();
        }
    }
}`

I really started to notice it when I was running multiplayer functional tests (3 clients connecting to a separate dedicated server) at higher intervals. At that point, probably multiple times a day. But I didn’t pay too close attention to it as I was dealing with multiple issues that contributed to client connection problems - this just ended up being one of them. I can remove our workaround and run our tests a bunch more times to see if I can get a better sense of repro rate?

Regarding repro rate: I removed our temporary fix and hit the issue the second time I ran our multiplayer tests (a dedicated server with 3 clients connecting). One of the three clients hit it on that second run.

The UpdateWidgetDisplay that the main thread is in is the one you posted, in the early return.

Note that this code-path does not avoid the tick:

    [Inline Frame] Windows::EnterCriticalSection(Windows::CRITICAL_SECTION *)
    [Inline Frame] FWindowsCriticalSection::Lock()
    [Inline Frame] FScopeLock::{ctor}(FWindowsCriticalSection *)
    FSlateApplication::PrivateDrawWindows(TSharedPtr<SWindow,1>)
    [Inline Frame] FCpuProfilerTrace::FEventScope::{ctor}(unsigned int &, const char *, bool, const char *, unsigned int)
    [Inline Frame] FSlateApplication::DrawWindows()
    FSlateApplication::TickAndDrawWidgets(float)
    FSlateApplication::Tick(ESlateTickType)
    USDLoadingScreenManager::UpdateWidgetDisplay()
    USDLoadingScreenManager::ShowLoadingScreen()
    USDLoadingScreenManager::UpdateLoadingScreen()
    USDLoadingScreenManager::OnPausedForStreamingChanged(const bool)
    TBaseUObjectMethodDelegateInstance<0,USDLoadingScreenManager,void __cdecl(bool),FDefaultDelegateUserPolicy>::ExecuteIfSafe(bool)
    [Inline Frame] TMulticastDelegateBase<FDefaultDelegateUserPolicy>::Broadcast(bool)
    TMulticastDelegate<void __cdecl(bool),FDefaultDelegateUserPolicy>::Broadcast(bool)
    [Inline Frame] Invoke(void(USDLoadingScreenEngineSubsystem::*)(), USDLoadingScreenEngineSubsystem * &)
    [Inline Frame] UE::Core::Private::Tuple::TTupleBase<TIntegerSequence<unsigned int>>::ApplyAfter(void(USDLoadingScreenEngineSubsystem::*)() &, USDLoadingScreenEngineSubsystem * &)
    TBaseUObjectMethodDelegateInstance<0,USDLoadingScreenEngineSubsystem,void __cdecl(void),FDefaultDelegateUserPolicy>::Execute()
    [Inline Frame] TDelegate<void __cdecl(void),FDefaultDelegateUserPolicy>::Execute()
    UWorld::BlockTillLevelStreamingCompleted()
    [Inline Frame] Invoke(void(UWorldPartition::*)(), UWorldPartition * &)
    [Inline Frame] UE::Core::Private::Tuple::TTupleBase<TIntegerSequence<unsigned int>>::ApplyAfter(void(UWorldPartition::*)() &, UWorldPartition * &)
    TBaseUObjectMethodDelegateInstance<0,UWorldPartition,void __cdecl(void),FDefaultDelegateUserPolicy>::ExecuteIfSafe()
    [Inline Frame] TMulticastDelegateBase<FDefaultDelegateUserPolicy>::Broadcast()
    TMulticastDelegate<void __cdecl(void),FDefaultDelegateUserPolicy>::Broadcast()
    AWorldSettings::NotifyBeginPlay()
    AGameStateBase::HandleBeginPlay()
    UWorld::BeginPlay()
    UEngine::LoadMap(FWorldContext &, FURL, UPendingNetGame *, FString &)
    UEngine::Browse(FWorldContext &, FURL, FString &)
    UGameInstance::StartGameInstance()
    UGameEngine::Start()
    FEngineLoop::Init()
    [Inline Frame] EngineInit()
    GuardedMain(const wchar_t *)
    GuardedMainWrapper(const wchar_t *)
    LaunchWindowsStartup(HINSTANCE__ *, HINSTANCE__ *, char *, int, const wchar_t *)
    WinMain(HINSTANCE__ *, HINSTANCE__ *, char *, int)
    [Inline Frame] invoke_main()
    __scrt_common_main_seh()

Hey Alain,

If you look at the code above, we basically already do that in our workaround (which resolves the issue)? Unless I’m misunderstanding the location you’re indicating? That workaround (using the same acquisition order) is _a_ fix for this issue, but you wouldn’t expect everyone to do this before they call Tick on the slate application? This is an internal implementation detail.

(what I don’t know is why the movie player is getting this lock earlier, and/or doesn’t release it before calling tick)

Understandable, will keep our current fix.

We’ll have a think as well as to what may lead to us being able to reproduce this, and will let you know if we get something that does so in stock UE/Lyra.
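Since the deadlock depends on timing, one way to surface the inverted order without waiting for the race is a debug-only lock-rank check. The sketch below is standard C++, not engine code. `FRankedLock` is a hypothetical helper, and the ranks (resource lock = 1, tick lock = 2) are an assumption that simply encodes the order the loading thread uses. It also assumes locks are released in reverse order of acquisition, as scoped locks do.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical debug helper: refuses an acquisition that would take a
// lower-ranked lock while a higher-ranked one is already held by this thread.
class FRankedLock
{
public:
    explicit FRankedLock(int InRank) : Rank(InRank) {}

    // Returns false (without locking) if this acquisition violates the rank order.
    bool Lock()
    {
        std::vector<int>& Held = HeldRanks();
        const bool bAlreadyHeld =
            std::find(Held.begin(), Held.end(), Rank) != Held.end();
        const bool bHigherHeld =
            std::any_of(Held.begin(), Held.end(), [this](int R) { return R > Rank; });

        // Re-entering a lock this thread already owns is allowed.
        if (!bAlreadyHeld && bHigherHeld)
        {
            return false;
        }

        Mutex.lock();
        Held.push_back(Rank);
        return true;
    }

    // Assumes LIFO release order, matching scoped-lock usage.
    void Unlock()
    {
        HeldRanks().pop_back();
        Mutex.unlock();
    }

private:
    // Ranks currently held by the calling thread, in acquisition order.
    static std::vector<int>& HeldRanks()
    {
        thread_local std::vector<int> Held;
        return Held;
    }

    std::recursive_mutex Mutex;
    int Rank;
};
```

With ranks assigned this way, the loading thread's sequence (resource, tick, resource again) passes, while a Tick call made without first holding the resource lock is flagged on its very first run, on a single thread.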