Crash in UNiagaraComponentPool::Cleanup(UWorld *)

Hello,

We are experiencing a weird crash when quickly exiting the game from our main menu after exiting a level. If we wait a few seconds in the main menu before quitting, there is no crash. Timing matters, and we must load a full level before exiting to reproduce it.

void UNiagaraComponentPool::Cleanup(UWorld* World)
{
    for (auto& Pool : WorldParticleSystemPools)
    {
        // it crashes here because Pool.Key is null
        FNiagaraCrashReporterScope CRScope(Pool.Key);//In practice this may be null by now :(
        Pool.Value.Cleanup();
    }

    WorldParticleSystemPools.Empty();
}

Even if we null-check Pool.Key inside the range-based for loop, WorldParticleSystemPools.Empty(); still fails, so it’s unclear to me how to work around this one. I went through the p4 history to find out where the //In practice this may be null by now :( comment came from, but I lack the context to understand how that can happen.

Is it possible that UNiagaraComponentPool::Cleanup (or some GC pass) is still running, and the UGameEngine::PreExit/UWorld::CleanupWorld/UNiagaraComponentPool::Cleanup path then races with it?

Running with -stompmalloc didn’t show anything wrong.

I will run again with FX.NiagaraComponentPool.Validation=1, hoping it gives me a hint, but I’m looking for ideas to figure this one out.

Thank you

Francis

Steps to Reproduce
1 - Go in game, load a level.

2 - Go back to the main menu.

3 - Quickly exit the app => See the crash report.

Hey Francis, do you have a stacktrace of the crash? What is the caller of UNiagaraComponentPool::Cleanup() when the crash happens?

There shouldn’t be any race condition, as these methods all run on the game thread (as far as I know). So I would say it’s either a problem of memory corruption (unlikely if stompmalloc was fine) or a garbage collection issue.

For debug purposes, you could try and add something like

TStrongObjectPtr<UNiagaraSystem> SystemReference;

to the FNCPool struct and initialize with the correct system whenever a new element is added to the WorldParticleSystemPools map. If that fixed the crash, then it’s definitely a garbage collection issue. TStrongObjectPtr keeps a uobject from being garbage collected without going through the uproperty-chain.
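
The reasoning behind this test (hold one extra strong reference; if the crash disappears, the key was being collected) can be shown outside the engine. The sketch below is only an analogy in standard C++: std::shared_ptr and std::weak_ptr are reference counted, unlike Unreal's garbage collector, and FFakeSystem, FFakePool and KeySurvivesOwnerRelease are hypothetical names, not engine types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for UNiagaraSystem.
struct FFakeSystem { int Id = 0; };

// Hypothetical stand-in for FNCPool plus its map key.
struct FFakePool
{
    std::weak_ptr<FFakeSystem> Key;               // does not keep the system alive
    std::shared_ptr<FFakeSystem> SystemReference; // debug-only strong reference
};

// Returns true if the pool's key still refers to a live object
// after the original owner has released it.
inline bool KeySurvivesOwnerRelease(bool bHoldStrongReference)
{
    auto Owner = std::make_shared<FFakeSystem>();

    FFakePool Pool;
    Pool.Key = Owner;
    if (bHoldStrongReference)
    {
        Pool.SystemReference = Owner; // extra reference, analogous to the debug field
    }

    Owner.reset(); // the original owner lets go of the system
    return !Pool.Key.expired();
}
```

Without the strong reference the key expires as soon as the owner releases it; with it, the key stays valid. If adding the equivalent field in the engine does not change the crash, object lifetime is probably not the cause.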

You could also try and set the cvar “gc.CollectGarbageEveryFrame 1” to see if that changes anything.

Regards,

Michael

My mistake, here we go.

[ 00 ] RaiseException ( KERNELBASE.dll )

[ 01 ] ReportAssert(wchar_t const *,void *) ( WindowsPlatformCrashContext.cpp:1848 )

[ 02 ] FWindowsErrorOutputDevice::Serialize(wchar_t const *,ELogVerbosity::Type,FName const &) ( WindowsErrorOutputDevice.cpp:84 )

[ 03 ] FOutputDevice::LogfImpl(wchar_t const *,…) ( OutputDevice.cpp:81 )

[ 04 ] UE::Logging::Private::BasicFatalLog(FLogCategoryBase const &,UE::Logging::Private::FStaticBasicLogRecord const *,…) ( StructuredLog.cpp:1106 )

[ 05 ] FMallocBinned2::CanaryFail(FMallocBinned2::FFreeBlock const *) ( MallocBinned2.cpp:918 )

[ 06 ] FMallocBinned2::CanaryTest(FMallocBinned2::FFreeBlock const *) ( MallocBinned2.h:543 )

[ 07 ] FMallocBinned2::FreeExternal(void *) ( MallocBinned2.cpp:793 )

[ 08 ] FMallocBinned2::ReallocExternal(void *,unsigned __int64,unsigned int) ( MallocBinned2.cpp:677 )

[ 09 ] FMallocBinned2::ReallocInline(void *,unsigned __int64,unsigned int) ( MallocBinned2.h:432 )

[ 10 ] FMallocBinned2::Realloc(void *,unsigned __int64,unsigned int) ( MallocBinned2.h:366 )

[ 11 ] TSizedHeapAllocator<32,FMemory>::ForAnyElementType::ResizeAllocation(int,int,unsigned __int64,unsigned int) ( ContainerAllocationPolicies.h:746 )

[ 12 ] TArray<FCrashStackFrame,TSizedDefaultAllocator<32> >::AllocatorResizeAllocation(int,int) ( Array.h:3095 )

[ 13 ] TArray<FCrashStackFrame,TSizedDefaultAllocator<32> >::ResizeTo(int) ( Array.h:3174 )

[ 14 ] TArray<TSparseArrayElementOrFreeListLink<TAlignedBytes<32,8> >,TSizedDefaultAllocator<32> >::Empty(int) ( Array.h:1976 )

[ 15 ] TSparseArray<TSetElement<TTuple<FName,FString> >,TSparseArrayAllocator<TSizedDefaultAllocator<32>,FDefaultBitArrayAllocator> >::Empty(int) ( SparseArray.h:409 )

[ 16 ] TSet<TTuple<TObjectPtr<UNiagaraSystem>,FNCPool>,TDefaultMapHashableKeyFuncs<TObjectPtr<UNiagaraSystem>,FNCPool,0>,FDefaultSetAllocator>::Empty(int) ( Set.h:462 )

[ 17 ] TMapBase<TObjectPtr<UNiagaraSystem>,FNCPool,FDefaultSetAllocator,TDefaultMapHashableKeyFuncs<TObjectPtr<UNiagaraSystem>,FNCPool,0> >::Empty(int) ( Map.h:244 )

[ 18 ] UNiagaraComponentPool::Cleanup(UWorld *) ( NiagaraComponentPool.cpp:256 )

[ 19 ] FNiagaraWorldManager::OnWorldBeginTearDown() ( NiagaraWorldManager.cpp:732 )

[ 20 ] FNiagaraWorldManager::OnWorldBeginTearDown(UWorld *) ( NiagaraWorldManager.cpp:972 )

[ 21 ] Invoke((TArray<FString,TSizedDefaultAllocator<32> > const &) const &,TArray<FString,TSizedDefaultAllocator<32> > const &) ( Invoke.h:47 )

[ 22 ] UE::Core::Private::Tuple::TTupleBase<TIntegerSequence<unsigned int> >::ApplyAfter((TArray<FString,TSizedDefaultAllocator<32> > const &) const &,TArray<FString,TSizedDefaultAllocator<32> > const &) ( Tuple.h:317 )

[ 23 ] TBaseStaticDelegateInstance<void (TArray<FString,TSizedDefaultAllocator<32> > const &),FDefaultDelegateUserPolicy>::ExecuteIfSafe(TArray<FString,TSizedDefaultAllocator<32> > const &) ( DelegateInstancesImpl.h:779 )

[ 24 ] TMulticastDelegateBase<FDefaultTSDelegateUserPolicy>::Broadcast(FConfigFile const *) ( MulticastDelegateBase.h:257 )

[ 25 ] TMulticastDelegate<void (FConfigFile const *),FDefaultTSDelegateUserPolicy>::Broadcast(FConfigFile const *) ( DelegateSignatureImpl.inl:1079 )

[ 26 ] UGameEngine::PreExit() ( GameEngine.cpp:1246 )

[ 27 ] FEngineLoop::Exit() ( LaunchEngineLoop.cpp:5067 )

[ 28 ] EngineExit() ( Launch.cpp:80 )

[ 29 ] GuardedMain::__l2::EngineLoopCleanupGuard::{dtor}() ( Launch.cpp:130 )

[ 30 ] GuardedMain(wchar_t const *) ( Launch.cpp:202 )

[ 31 ] GuardedMainWrapper(wchar_t const *) ( LaunchWindows.cpp:123 )

[ 32 ] LaunchWindowsStartup(HINSTANCE__ *,HINSTANCE__ *,char *,int,wchar_t const *) ( LaunchWindows.cpp:277 )

[ 33 ] WinMain(HINSTANCE__ *,HINSTANCE__ *,char *,int) ( LaunchWindows.cpp:318 )

[ 34 ] invoke_main() ( exe_common.inl:102 )

[ 35 ] __scrt_common_main_seh() ( exe_common.inl:288 )

[ 36 ] BaseThreadInitThunk ( kernel32.dll )

[ 37 ] RtlUserThreadStart ( ntdll.dll )

Hey Francis, did you try the workaround I posted above, adding a TStrongObjectPtr SystemReference field to the FNCPool struct?

I had to put this aside for a while to work on a release, but now I can push that change into the main dev branch and see whether it’s stable over a period of time.

Lately it has been harder to reproduce, but it still happens, and it now occurs not only when exiting the app but also when exiting from a match back to the main menu. The callstack is the same except for the UGameEngine::PreExit() part.

After adding

TStrongObjectPtr<UNiagaraSystem> SystemReference;

we still get the crash, with a slightly different callstack:

[ 00 ] TConstSetBitIterator<FDefaultBitArrayAllocator>::FindFirstSetBit() ( BitArray.h:1921 )

[ 01 ] TConstSetBitIterator<FDefaultBitArrayAllocator>::{ctor}(TBitArray<FDefaultBitArrayAllocator> const &) ( BitArray.h:1845 )

[ 02 ] TSparseArray<TSetElement<TTuple<TObjectPtr<UNiagaraSystem>,FNCPool> >,TSparseArrayAllocator<TSizedDefaultAllocator<32>,FDefaultBitArrayAllocator> >::begin() ( SparseArray.h:1083 )

[ 03 ] TSet<TTuple<TObjectPtr<UNiagaraSystem>,FNCPool>,TDefaultMapHashableKeyFuncs<TObjectPtr<UNiagaraSystem>,FNCPool,0>,FDefaultSetAllocator>::begin() ( Set.h:1814 )

[ 04 ] TMapBase<TObjectPtr<UNiagaraSystem>,FNCPool,FDefaultSetAllocator,TDefaultMapHashableKeyFuncs<TObjectPtr<UNiagaraSystem>,FNCPool,0> >::begin() ( Map.h:1065 )

[ 05 ] UNiagaraComponentPool::Cleanup(UWorld *) ( NiagaraComponentPool.cpp:251 )

[ 06 ] FNiagaraWorldManager::OnWorldBeginTearDown() ( NiagaraWorldManager.cpp:732 )

[ 07 ] FNiagaraWorldManager::OnWorldBeginTearDown(UWorld *) ( NiagaraWorldManager.cpp:972 )

[ 08 ] Invoke((TArray<FString,TSizedDefaultAllocator<32> > const &) const &,TArray<FString,TSizedDefaultAllocator<32> > const &) ( Invoke.h:47 )

[ 09 ] UE::Core::Private::Tuple::TTupleBase<TIntegerSequence<unsigned int> >::ApplyAfter((TArray<FString,TSizedDefaultAllocator<32> > const &) const &,TArray<FString,TSizedDefaultAllocator<32> > const &) ( Tuple.h:317 )

[ 10 ] TBaseStaticDelegateInstance<void (TArray<FString,TSizedDefaultAllocator<32> > const &),FDefaultDelegateUserPolicy>::ExecuteIfSafe(TArray<FString,TSizedDefaultAllocator<32> > const &) ( DelegateInstancesImpl.h:779 )

[ 11 ] TMulticastDelegateBase<FDefaultTSDelegateUserPolicy>::Broadcast(FConfigFile const *) ( MulticastDelegateBase.h:257 )

[ 12 ] TMulticastDelegate<void (FConfigFile const *),FDefaultTSDelegateUserPolicy>::Broadcast(FConfigFile const *) ( DelegateSignatureImpl.inl:1079 )

[ 13 ] UGameEngine::PreExit() ( GameEngine.cpp:1246 )

[ 14 ] FEngineLoop::Exit() ( LaunchEngineLoop.cpp:5067 )

[ 15 ] EngineExit() ( Launch.cpp:80 )

[…]

Failing in begin() suggests the data was already freed?

I’ll add a check to try to detect if the Cleanup is being called twice.

That is weird, almost as if the memory of the map itself is corrupted.

What does the level you use for your repro steps look like, does it spawn pooled effects when you load into it?

The memory of the map is corrupted. I looked through all the changes on Perforce that could be related to race conditions or memory problems, hoping one of them would fix what I believe is memory stomping, but without success.

The idea that Cleanup is called twice also proved to be wrong: I added an atomic boolean to check for that possibility and it never triggered.
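
For reference, a double-call check of this kind could look like the following. This is a minimal standalone sketch in standard C++, not the code used in the project: FCleanupGuardDemo and its members are hypothetical names, and in engine code the duplicate branch would more likely fire an ensure or a log.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the pool object; not engine code.
class FCleanupGuardDemo
{
public:
    // Returns true if this is the first Cleanup call,
    // false if Cleanup has already started or finished.
    bool Cleanup()
    {
        // exchange() sets the flag and returns the previous value in one
        // atomic step, so two racing callers cannot both observe "false".
        if (bCleanupStarted.exchange(true, std::memory_order_acq_rel))
        {
            DuplicateCalls.fetch_add(1, std::memory_order_relaxed);
            return false;
        }

        // ... the real cleanup work would go here ...
        return true;
    }

    int GetDuplicateCalls() const
    {
        return DuplicateCalls.load(std::memory_order_relaxed);
    }

private:
    std::atomic<bool> bCleanupStarted{false};
    std::atomic<int> DuplicateCalls{0};
};
```

A counter that stays at zero across many repro attempts supports the conclusion that Cleanup is not being entered twice.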

At this point, I’ll keep a shadow structure made of vectors in parallel with the TMap, and whenever the map changes I’ll walk that structure to validate it. I will also add canary head and tail guards to try to detect memory stomping.

I also suspect it might be tied to another problem we’ve had: [Content removed]

Since I couldn’t narrow it down, I added checks for when RHICmdList.Create*** returns a null buffer in FNiagaraRibbonGpuBuffer::Allocate, but we know the issue is still there even when it doesn’t crash. I suspect there’s still memory stomping; maybe it’s the same root cause, maybe not.

And finally, about our level: we do have some static environment VFX and some VFX tied to characters that are spawned on level load. I’ll remove them.
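
The shadow-structure and canary idea can be sketched as follows. This is a standalone illustration under stated assumptions, not engine code: std::unordered_map stands in for the TMap, the value is reduced to an int, and FGuardedPoolMap, kHeadMagic and kTailMagic are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Arbitrary guard patterns (hypothetical values).
constexpr std::uint64_t kHeadMagic = 0xC0FFEE11C0FFEE11ull;
constexpr std::uint64_t kTailMagic = 0xDEADBEEFDEADBEEFull;

struct FGuardedPoolMap
{
    std::uint64_t HeadCanary = kHeadMagic;      // declared just before the map
    std::unordered_map<const void*, int> Pools; // stand-in for WorldParticleSystemPools
    std::uint64_t TailCanary = kTailMagic;      // declared just after the map

    // Shadow copy of the entries, stored in its own heap allocation.
    std::vector<std::pair<const void*, int>> Shadow;

    // Every mutation goes through here so map and shadow stay in sync.
    void Add(const void* Key, int Count)
    {
        Pools[Key] = Count;
        for (auto& Entry : Shadow)
        {
            if (Entry.first == Key) { Entry.second = Count; return; }
        }
        Shadow.emplace_back(Key, Count);
    }

    // Checks the guards first, then compares the map against the shadow.
    bool Validate() const
    {
        if (HeadCanary != kHeadMagic || TailCanary != kTailMagic) { return false; }
        if (Pools.size() != Shadow.size()) { return false; }
        for (const auto& Entry : Shadow)
        {
            const auto It = Pools.find(Entry.first);
            if (It == Pools.end() || It->second != Entry.second) { return false; }
        }
        return true;
    }
};
```

Calling Validate() after every change and at the start of Cleanup narrows the window in which the stomp happened. The canaries are checked first because walking a map whose header has already been overwritten may itself crash.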

I still did not manage to reproduce the issue on my end, but if you get it to crash in a debugger next time, can you check if your UNiagaraComponentPool object is still valid? If you look at its object flags you should see if it was marked as garbage, which would explain why accessing the map on it is causing a crash.

I also pushed a speculative fix you could try to integrate (CL 48298390, GitHub commit 89d3a3fe66c571baee3dc9385275c3c983001b91).

There were also two recent fixes to the world manager which might be related and fix your crash:

CL 46600645, Github commit 13f4a043c56ba65fe151692c17dc21543d68cdb9

CL 46929543, Github commit af3732edfc77dd557dc075f82a05b7c4bfc53256