ShaderCompileWorker crashing on shutdown when distributed via SN-DBS

We have been seeing sporadic crashes in ShaderCompileWorker, mainly when compilation jobs are distributed via SN-DBS. I got the first callstack below from a minidump I found on my PC, which I was luckily able to get symbols for. I’ve caught crashes with seemingly identical callstacks several times via JIT debugging, though I haven’t been able to get symbols for those. The crashing processes run as “LOCAL_SERVICE” and sometimes show up in Windows Reliability Monitor, where we can confirm that the executable is being run from a subfolder of `C:\~dbs` named after another developer’s machine.

Looking at the callstack, what appears to be happening is that a static shared pointer is destroyed at process exit. During the destructor call, a check() fails in FTrackedActivityManager::Destroy(). The code looks like this:

void Destroy(FActivity* Activity)
{
    check(Activity->Stack.Num() == 1); // @EPS: This check fails!
    {
        FScopeLock _(&ActivitiesCs);
        Activities.Remove(Activity);
        SendEvent(FTrackedActivity::EEvent::Removed, *Activity);
    }
    delete Activity;
}

And the shared pointer is first created here in TrackedActivity.cpp:

FTrackedActivity& FTrackedActivity::GetIOActivity()
{
    static TSharedPtr<FTrackedActivity> A(MakeShared<FTrackedActivity>(TEXT("I/O"), TEXT("Idle"), ELight::None, EType::Activity, 1));
    return *A;
}

Funnily enough, the crash itself happens inside the check(), because the Serialize() slot in the log output device’s vtable points to _purecall(). Maybe the output device has already been destroyed by this point, and that’s why the vtable entry for FOutputDevice::Serialize is set to _purecall()?
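To illustrate why a destroyed output device could end up dispatching to the base class, here is a minimal standalone sketch. The type names are hypothetical and are not engine code. It relies only on the standard C++ rule that, while a base class destructor runs, virtual calls resolve to the base class version. In the sketch the base version is an ordinary function so the effect can be observed safely; if the base function were pure virtual instead, as the callstack suggests for FOutputDevice::Serialize, that slot would typically hold _purecall() on MSVC.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which Serialize() implementation was reached.
std::vector<std::string>& Events()
{
    static std::vector<std::string> E;
    return E;
}

// Hypothetical stand-in for an output device base class.
struct FOutputDeviceSketch
{
    virtual ~FOutputDeviceSketch()
    {
        // The derived part is already gone here, so this call
        // resolves to FOutputDeviceSketch::Serialize.
        Serialize("from base dtor");
    }

    virtual void Serialize(const char* Msg)
    {
        Events().push_back(std::string("Base: ") + Msg);
    }
};

// Hypothetical stand-in for a concrete log device.
struct FLogDeviceSketch : FOutputDeviceSketch
{
    void Serialize(const char* Msg) override
    {
        Events().push_back(std::string("Derived: ") + Msg);
    }
};

// Calls Serialize() once while the device is alive and once during its destruction.
std::vector<std::string> RunDestructionSketch()
{
    Events().clear();
    {
        FLogDeviceSketch Device;
        FOutputDeviceSketch& Ref = Device;
        Ref.Serialize("while alive");
    } // Device is destroyed here; the base destructor logs again.
    return Events();
}
```

The same call through the same reference reaches the derived implementation while the object is alive and the base implementation once destruction has begun, which would be consistent with a log device that was torn down before the atexit destructor of the static ran.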

Anyway, it’s unclear to me why the check() fails in the first place. GetIOActivity() is only called very locally inside the struct FActiveAsyncLoadContext, where it is obvious that Push() and Pop() calls are balanced.
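One unconfirmed possibility is that the process exits while a Push() is still outstanding on another thread, for example if ExitProcess is called while an async load scope is open. The sketch below is standalone and uses hypothetical names. It assumes the activity's stack starts with a single status entry (which is what the check for Num() == 1 suggests), and it only simulates "exit while a scope is open" by inspecting the stack before the scope closes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tracked activity with a status stack.
struct FActivitySketch
{
    // Assumption: the stack starts with one entry (the initial status).
    std::vector<std::string> Stack{ "Idle" };

    void Push(const char* Status) { Stack.emplace_back(Status); }
    void Pop() { Stack.pop_back(); }

    // Mirrors the condition asserted in Destroy(): exactly one entry left.
    bool CanDestroy() const { return Stack.size() == 1; }
};

// RAII scope that keeps Push()/Pop() balanced in normal control flow.
struct FScopedStatusSketch
{
    FActivitySketch& Activity;

    FScopedStatusSketch(FActivitySketch& InActivity, const char* Status)
        : Activity(InActivity)
    {
        Activity.Push(Status);
    }

    ~FScopedStatusSketch() { Activity.Pop(); }
};

// The scope closes before the activity is inspected: the stack is balanced.
bool BalancedUse()
{
    FActivitySketch Activity;
    {
        FScopedStatusSketch Scope(Activity, "Loading");
    }
    return Activity.CanDestroy();
}

// The activity is inspected while a scope is still open, as could happen
// if exit-time destructors ran while another thread was mid-load.
bool InspectWhileScopeOpen()
{
    FActivitySketch Activity;
    FScopedStatusSketch Scope(Activity, "Loading");
    return Activity.CanDestroy();
}
```

If something like this is what happens, the Push()/Pop() calls would be balanced in the source and the check would still fail at shutdown, because the matching Pop() never gets a chance to run.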

Edit: Posting callstacks here as well as they don’t seem to be visible in the post?

Full callstack, with symbols:

  ucrtbase.dll!abort()
  VCRUNTIME140.dll!_purecall() Line 29
  ShaderCompileWorker-Core.dll!FOutputDevice::LogfImpl(const wchar_t * Fmt, ...) Line 81
  [Inline Frame] ShaderCompileWorker-Core.dll!FOutputDevice::Logf(const wchar_t[49] &) Line 246
  ShaderCompileWorker-Core.dll!AssertFailedImplV(const char * Expr, const char * File, int Line, void * ProgramCounter, const wchar_t * Format, char * Args) Line 152
  ShaderCompileWorker-Core.dll!FDebug::CheckVerifyFailedImpl2(const char * Expr, const char * File, int Line, const wchar_t * Format, ...) Line 652
  ShaderCompileWorker-Core.dll!FTrackedActivityManager::Destroy(FTrackedActivityManager::FActivity * Activity) Line 62
  [Inline Frame] ShaderCompileWorker-Core.dll!FTrackedActivity::{dtor}() Line 177
  [Inline Frame] ShaderCompileWorker-Core.dll!DestructItem(FTrackedActivity *) Line 76
  ShaderCompileWorker-Core.dll!SharedPointerInternals::TIntrusiveReferenceController<FTrackedActivity,1>::DestroyObject() Line 424
  [Inline Frame] ShaderCompileWorker-Core.dll!SharedPointerInternals::TReferenceControllerBase<1>::ReleaseSharedReference() Line 227
  [Inline Frame] ShaderCompileWorker-Core.dll!SharedPointerInternals::FSharedReferencer<1>::{dtor}() Line 606
  ShaderCompileWorker-Core.dll!`FTrackedActivity::GetIOActivity'::`2'::`dynamic atexit destructor for 'A''()
  ucrtbase.dll!<lambda>(void)()
  ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c>>()
  ucrtbase.dll!_execute_onexit_table()
  ShaderCompileWorker-Core.dll!dllmain_crt_process_detach(const bool is_terminating) Line 182
  ShaderCompileWorker-Core.dll!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293
  ntdll.dll!LdrpCallInitRoutineInternal()
  ntdll.dll!LdrpCallInitRoutine()
  ntdll.dll!LdrShutdownProcess()
  ntdll.dll!RtlExitUserProcess()
  dbssbx64.dll!00007ffb8f224ee8()
  kernel32.dll!ExitProcessImplementation()
  dbssbx64.dll!00007ffb8f224de8()
  dbssbx64.dll!00007ffb8f228397()
  kernel32.dll!BaseThreadInitThunk()
  ntdll.dll!RtlUserThreadStart()

Callstack caught via JIT debugging, no symbols:

  ucrtbase.dll!abort()
  vcruntime140.dll!_purecall() Line 29
  shadercompileworker-core.dll!00007ffb034ed6f0()
  shadercompileworker-core.dll!00007ffb0341c3af()
  shadercompileworker-core.dll!00007ffb0341eefc()
  shadercompileworker-core.dll!00007ffb03518728()
  shadercompileworker-core.dll!00007ffb0351894a()
  shadercompileworker-core.dll!00007ffb03966320()
  ucrtbase.dll!<lambda>(void)()
  ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c>>()
  ucrtbase.dll!_execute_onexit_table()
  shadercompileworker-core.dll!00007ffb039495b1()
  shadercompileworker-core.dll!00007ffb039496d2()
  ntdll.dll!LdrpCallInitRoutineInternal()
  ntdll.dll!LdrpCallInitRoutine()
  ntdll.dll!LdrShutdownProcess()
  ntdll.dll!RtlExitUserProcess()
  dbssbx64.dll!00007ffbec214ee8()
  kernel32.dll!ExitProcessImplementation()
  dbssbx64.dll!00007ffbec214de8()
  dbssbx64.dll!00007ffbec218397()
  kernel32.dll!BaseThreadInitThunk()
  ntdll.dll!RtlUserThreadStart()


I suspect you’re right about the output device not being available during onexit destruction. That said, I really have no idea what this activity tracker is for - SCW certainly doesn’t rely on it for anything.

A quick glance through the code seems to indicate it’s something related to the new console window, which SCW doesn’t use…so I’d be inclined to just disable the activity tracker and see if that solves your crashes. This can be done with a small change to ShaderCompileWorker.Build.cs:

PublicDefinitions.Add("UE_ENABLE_TRACKED_IO=0");

I’m going to check that into UE main as I don’t see any reason this should be enabled, seems like unnecessary overhead; hopefully that will resolve your crashes.
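For readers unfamiliar with this kind of switch, here is a rough standalone sketch of how a compile-time define can remove tracking code entirely. Only the macro name UE_ENABLE_TRACKED_IO comes from the change above; the scope type, the counter, and the default value of 0 are hypothetical and are not how the engine is known to implement it.

```cpp
#include <cassert>

// Assumption for this sketch only: default to disabled when the build
// system has not defined the macro.
#ifndef UE_ENABLE_TRACKED_IO
#define UE_ENABLE_TRACKED_IO 0
#endif

// Hypothetical counter standing in for calls into an activity tracker.
int GTrackedPushes = 0;

#if UE_ENABLE_TRACKED_IO
// Tracking enabled: the scope reports to the tracker.
struct FTrackedIoScopeSketch
{
    FTrackedIoScopeSketch() { ++GTrackedPushes; }
    ~FTrackedIoScopeSketch() {}
};
#else
// Tracking disabled: the scope is an empty type, so the tracker
// (and any static it would create) is never touched.
struct FTrackedIoScopeSketch
{
};
#endif

// Simulates one I/O operation wrapped in a tracking scope and returns
// how many times the tracker was touched.
int CountPushesDuringLoad()
{
    GTrackedPushes = 0;
    {
        FTrackedIoScopeSketch Scope;
        (void)Scope;
    }
    return GTrackedPushes;
}
```

With the define set to 0, nothing ever calls into the tracker, so a static like the one in GetIOActivity() would never be constructed and would have no atexit destructor to run.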


Awesome, thanks. Will try this and report back once we’ve seen whether it fixes the problem or not.
