Race condition with Mass Processors writing to UWorld::SubsystemCollection::SubsystemArrayMap

There seems to be a race condition introduced in 5.6 with multiple Mass processors concurrently writing to UWorld::SubsystemCollection::SubsystemArrayMap.

The call chain is

UMassProcessor::CallExecute ->

FMassExecutionContext::CacheSubsystemRequirements ->

FMassSubsystemAccess::CacheSubsystemRequirements ->

FMassSubsystemAccess::CacheSubsystem ->

FMassSubsystemAccess::FetchSubsystemInstance ->

UWorld::GetSubsystemBase ->

FObjectSubsystemCollection::GetSubsystem ->

FSubsystemCollectionBase::GetSubsystemInternal ->

FSubsystemCollectionBase::FindAndPopulateSubsystemArrayInternal

FSubsystemCollectionBase::GetSubsystemInternal calls Emplace on SubystemArrayMap (which is mutable), when first looking up a subsystem of a given type, and can cause a realloc to a occur.

Between UE 5.5 and 5.6, FMassProcessingPhase::bRunInParallelMode went from defaulting false and getting set true in FMassProcessingPhaseManager::OnPhaseEnd, to defaulting to true instead (and updating in FMassProcessingPhaseManager::OnPhaseStart). So, in 5.5, effectively all Mass processors using the phase processor would run on game thread on their first execution, which prevented this race condition from happening, since they would end up looking up their required subsystems on that first execution. However, with 5.6, the first execution of the Mass processors that may run in parallel do run in parallel (correctly) -- and that first execution can cause a lookup if the subsystem, and concurrent writes to SubystemArrayMap, and interleaved reallocs, which can lead to memory corruption and crashes (which we’ve seen with our upgrade to 5.6).

As a workaround, we’re locally making this change to FMassProcessingPhaseManager::OnPhaseStart, just before setting `GraphBuildState.bInitialized = true;` which ensures that the first lookup of these subsystems (and hence the Emplace call to SubystemArrayMap ) still occurs on the game thread.

                        // cache subsystem access requirements here, while single-threaded during initialization, 
			// so that any sid eeffect write access to World::SubsystemCollection's FSubsystemCollectionBase::SubsystemArrayMap is safe
			TConstArrayView<UMassProcessor*> ChildProcessors = PhaseProcessor->GetChildProcessorsView();
			for (UMassProcessor* ChildProcessor : ChildProcessors)
			{
				FMassSubsystemAccess SubsystemAccess;
				SubsystemAccess.CacheSubsystemRequirements(ChildProcessor->GetProcessorRequirements());
    			};

This solution seems a bit fragile, though, since it seems like there is a deeper problem with the mutable SubystemArrayMap allowing const access to UWorld from concurrent threads to ultimately result in concurrent write access.

Thanks.

The workaround above only solved the issue for processors directly registered with the phase processor, and this was still an issue for other processors, such as signal processors. Instead of the above change, the workaround I’m now using is the following code, added to UMassProcessor::CallInitialize, just before the `bInitialized = true` line. It must be after the ConfigureQueries call, so that the subsystem requirements are added first.

                // cache subsystm access requirements here, while single-threaded during initialization,
		// so that any side effect write access to World::SubsystemCollection's FSubsystemCollectionBase::SubsystemArrayMap is safe
		FMassSubsystemAccess SubsystemAccess;
		SubsystemAccess.CacheSubsystemRequirements(GetProcessorRequirements());

This is not something we were aware of and certainly needs addressing. Thank you for bringing this to our attention! The team is currently discussing how we want to tackle this which will probably include some MT access detectors around the various data structures for subsystems in Mass.

Have you noticed any particular setup that causes this to trigger more often? A certain number of processors with subsystem requirements? Total number of subsystems required across all processors?

Thanks for looking into it. Like most race conditions it’s hard to get a reliable repro, but we do have a very large project making extensive use of mass processors and subsystems, with well over 100 processors, and many subsystems, several of which are required by some of the processors. I imagine likelihood of hitting it would scale up with those numbers, but we had been hitting it very inconsistently.

Concurrency seems to increase debugging time as much as it speeds up runtime. I always ask if any magic has been found to help us isolate and replicate the issue.

We have some open bug issues right now we have been trying to solve in Mass, and your post about going parallel by default is seeming like it may have caused some of these other issues. So thank you for letting us know your findings! It has likely saved us many headaches. The team is looking into this right now. I unfortunately do not have an ETA for the fix, but it is actively being handled!

Just a quick update that I’m now using this version of the workaround in UMassProcessor::CallInitialize. We were still rarely seeing crashes, due to processors in UMassEntityEditorSubsystem initializing without a world and therefore not adding to the SubsystemArrayMap until the later execute call.

		// cache subsystem access requirements here, while single-threaded during initialization, 
		// so that any side effect write access to World::SubsystemCollection's FSubsystemCollectionBase::SubsystemArrayMap is safe
		UWorld* World = EntityManager->GetWorld();
		if (World != nullptr) 
		{
			FMassSubsystemAccess SubsystemAccess(World);
			SubsystemAccess.CacheSubsystemRequirements(GetProcessorRequirements());
		}
		else
		{
			// we couldn't cache subsystem access here; need to force game thread execution since otherwise we'll have unsafe
			// concurrent access to UWorld::SubsystemCollection::SubsystemArrayMap
			bRequiresGameThreadExecution = true;
		}

Thank you for the update! The team is actively investigating and discussing this issue today. Some MTAccessDetectors were added so we can track this down in our own code base. One item that was brought up was that we would need to re-cache subsystem requirements at runtime whenever a new processor is registered with the phases. Our thought is the original behavior obscured the issue, and that the true fix lies in UWorld rather than Mass. I will share out your updated change with the team as well.

Posting a reply to re-open this question.

Have you found a direction to take on this? I think it makes sense that a UWorld change probably seems like the more appropriate correct solution to this; either some change to the lazy-init caching approach to the subsystem array map, or at least some way to make it threadsafe.

We have a CL currently in review. I believe it should be submitted early this week. The CL is moving subsystem caching from FMassExecutionContext to FMassEntityManager. Additionally, it will populate the subsystem cache on processor initialization instead of on execute. I will try to follow this change to let you know when it is submitted in a public p4 stream.

-James

Just checking in on this -- has the CL for this been submitted, and is it available on a public stream? Thanks.

The team was all offsite last week for a team summit. There has not been any movement on this yet. I apologize for the delay this has added. I’ll send a reminder for our dev who has it shelved.

-James