Blueprint references to level Actors are lost when streaming out and back in the level

Answers.Archive · September 11, 2018, 8:54pm

Hello, sorry for the late response. I’ve made a workaround, but now we probably have a problem of the same nature.

I don’t think I have access to your VCS. Could you upload a patch with your fix?
Thanks.

PS We are not using Compile Manager in our version, which is why this is still relevant for me.

Answers.Archive · September 11, 2018, 8:54pm

Attached a diff generated from the shelved change. This was made against the latest code in Dev-Networking.

Answers.Archive · September 11, 2018, 8:54pm

Thanks for the patch. It works.
But after applying the patch we’ve hit a very strange crash.

The callstack line numbers may differ slightly from your code, so I will show the relevant code after the callstack:

CallStack - OTWD!FDebug::AssertFailed() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\core\private\misc\assertionmacros.cpp:414]
OTWD!UActorChannel::CleanupReplicators() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\datachannel.cpp:1567]
OTWD!UNetDriver::Shutdown() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\networkdriver.cpp:1167]
OTWD!DestroyNamedNetDriver_Local() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9399]
OTWD!UEngine::ShutdownWorldNetDriver() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9231]
OTWD!UEngine::LoadMap() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:10352]
OTWD!UEngine::Browse() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:9971]
OTWD!UEngine::TickWorldTravel() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\unrealengine.cpp:10189]
OTWD!UGameEngine::Tick() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\engine\private\gameengine.cpp:1211]
OTWD!FEngineLoop::Tick() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\launchengineloop.cpp:3301]
OTWD!GuardedMain() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\launch.cpp:166]
OTWD!GuardedMainWrapper() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\windows\launchwindows.cpp:134]
OTWD!WinMain() [c:\jenkins\workspace\otwd-win\ue4\engine\source\runtime\launch\private\windows\launchwindows.cpp:210]
OTWD!__scrt_common_main_seh() [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253]
kernel32
ntdll

The callstack points to the line with the for iterator:

void UActorChannel::CleanupReplicators( const bool bKeepReplicators )
{
	// Cleanup or save replicators
	for ( auto CompIt = ReplicationMap.CreateIterator(); CompIt; ++CompIt )
	{
		if ( bKeepReplicators && CompIt.Value()->GetObject() != nullptr )
		// ...

The crash reason is an out-of-bounds array index.

Is it something in the iterator implementation in this case? I can’t see how the for line could crash with empty arrays. Or could it just be an access to an invalid object, with the wrong line reported because of the optimization level? This bug is very hard to reproduce, and we don’t have the resources to catch it in a build without optimization.

The top-level code leading to the crash is the code from the patch:

		for (auto It = ServerConnection->ActorChannels.CreateIterator(); It; ++It)
		{
			UActorChannel* Channel = It.Value();
			if (Channel)
			{
				Channel->CleanupReplicators();
			}
		}

Maybe I should add some additional checks somewhere?

Answers.Archive · September 11, 2018, 8:54pm

Thanks a lot.

PS I am in the process of restoring Perforce access to avoid these problems in the future.

Answers.Archive · September 11, 2018, 8:54pm

Hello,

Thanks for the fix!
One issue: it’s not enough to call:

RepLayoutMap.Remove(Level->LevelScriptActor->GetClass());

You should also remove all replicated functions/events:

for (UFunction* Func : TFieldRange<UFunction>(LevelScriptActor->GetClass(), EFieldIteratorFlags::ExcludeSuper))
{
	if (Func && Func->HasAnyFunctionFlags(EFunctionFlags::FUNC_Net))
	{
		RepLayoutMap.Remove(Func);
	}
}
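The reason the class key alone is not enough: the layout cache holds one entry for the class and a separate entry for each replicated function, so removing only the class leaves stale RPC layouts behind. Below is a minimal standalone model of that invalidation in plain C++ (not engine code). `Class`, `Function`, `LayoutCache` and `PurgeClassLayouts` are hypothetical names, `bIsNet` stands in for `FUNC_Net`, and `Functions` holds only the class's own functions, similar to `ExcludeSuper`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a function declared directly on a class.
struct Function {
    std::string Name;
    bool bIsNet = false;  // stands in for FUNC_Net
};

// Hypothetical model of a class; Functions excludes inherited ones.
struct Class {
    std::string Name;
    std::vector<Function> Functions;
};

// Hypothetical stand-in for RepLayoutMap: one cached layout per key object.
using LayoutCache = std::unordered_map<const void*, std::string>;

// Remove the class layout and every net function layout cached for it.
void PurgeClassLayouts(LayoutCache& Cache, const Class& Cls) {
    Cache.erase(&Cls);
    for (const Function& Func : Cls.Functions) {
        if (Func.bIsNet) {
            Cache.erase(&Func);
        }
    }
}
```

For example, with a cache holding the class layout plus one RPC layout, erasing only the class key leaves one entry behind, while `PurgeClassLayouts` removes both and leaves unrelated classes alone.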

LMK if it makes sense.
Cheers,
M

Answers.Archive · September 11, 2018, 8:54pm

That does indeed make sense, thanks for catching it! For reference, I’ve entered a bug to get this fixed in a future engine release, UE-60086.

Answers.Archive · September 11, 2018, 8:54pm

Hi Dmitry,

Apologies for the delay. I have not seen this particular crash before while testing the fix, although the test case examples originally used were very simple.

I suspect that you are correct and that code optimization is giving you a misleading crash line; it is more likely that the array assertion is being hit from within one of the cleanup calls. Either that, or the actor channel array is being modified unexpectedly in the midst of the iteration. If you are able to find consistent repro steps that would be helpful for tracking this down, and in the meantime I’ll take a look again in a 4.20 build.
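To illustrate the second possibility (the map being modified mid-iteration), here is a minimal standalone sketch in plain C++, not engine code. `FakeActorChannel`, `ChannelMap` and `CleanupAll` are hypothetical stand-ins for `UActorChannel`, `ActorChannels` and the patch's cleanup loop. If a cleanup call can remove entries from the map being iterated, iterating the live map is unsafe; copying the channel pointers into a local array first keeps the loop itself valid. This assumes removed channels are not freed during the loop; it only protects the iteration, not the channel objects.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for UActorChannel. OnCleanup lets a test simulate
// cleanup code that mutates the owning map (e.g. a channel closing).
struct FakeActorChannel {
    int ActorId = 0;
    int CleanupCount = 0;
    std::function<void(int)> OnCleanup;

    void CleanupReplicators() {
        ++CleanupCount;
        if (OnCleanup) {
            OnCleanup(ActorId);
        }
    }
};

// Hypothetical stand-in for Connection->ActorChannels (actor id -> channel).
using ChannelMap = std::unordered_map<int, FakeActorChannel*>;

// Snapshot the channel pointers first, then clean up. Erasing from Channels
// inside the second loop cannot invalidate the vector being iterated.
void CleanupAll(ChannelMap& Channels) {
    std::vector<FakeActorChannel*> Snapshot;
    Snapshot.reserve(Channels.size());
    for (const auto& Pair : Channels) {
        if (Pair.second != nullptr) {
            Snapshot.push_back(Pair.second);
        }
    }
    for (FakeActorChannel* Channel : Snapshot) {
        Channel->CleanupReplicators();
    }
}
```

If each channel's cleanup erases its own map entry, every channel is still cleaned up exactly once and the map ends up empty; the same erase inside a loop over the live map would invalidate its iterator.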

Are you able to share the engine version you’re currently working with? I would also like to look for interim networking fixes that might explain what’s going on.

Thanks,
Brian

Answers.Archive · September 11, 2018, 8:54pm

Hello Brian, thank you for helping us.

This type of issue is hard to reproduce. I was unable to reproduce it on my machine; it happens sporadically while our QA is finishing the level, so for now there is no 100% repro. I am still looking for one, and as soon as I find something I will post it here, but I can’t guarantee success. I will also analyze the QA reports and add more information.

We are using 4.18 now. Our modifications include very few networking-related changes, and we are not touching anything related to replicators and channels (and as soon as I remove the patch from our build, the crash goes away). Anyway, I think it’s a good idea to provide as much information as I can, so here is the only suspicious change we’ve made in the engine (though it doesn’t look related):

// UNetDriver::InternalProcessRemoteFunction (line 1295)
void UNetDriver::InternalProcessRemoteFunction( ... )
{
	...
	// Get the actor channel.
	UActorChannel* Ch = Connection->ActorChannels.FindRef(Actor);
	if( !Ch )
	{
		if( IsServer )
		{
			//SBZ (line 1359)
			if ( Actor->IsPendingKillPending() || Actor->bTearOff )
			{
				// Don't try opening a channel for me, I am in the process of being destroyed. Ignore my RPCs.
				return;
			}
			// SBZ

If someone tries to send an RPC for a torn-off actor, we don’t create a new channel for it; that’s all the change does. I don’t think it leads to the crash.
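The decision made by that modified branch can be written as a small predicate. This is a plain C++ model, not engine code: `ActorState` and `ShouldIgnoreServerRpc` are hypothetical names, and the two flags stand in for `Actor->IsPendingKillPending()` and `Actor->bTearOff`. It covers only the server path shown in the snippet.

```cpp
#include <cassert>

// Hypothetical actor state; the flags stand in for
// Actor->IsPendingKillPending() and Actor->bTearOff.
struct ActorState {
    bool bPendingKill = false;
    bool bTearOff = false;
};

// Server-side check modeled on the modified branch: when no channel exists
// yet, an RPC from a dying or torn-off actor is ignored instead of causing
// a new channel to be opened.
bool ShouldIgnoreServerRpc(bool bHasChannel, const ActorState& Actor) {
    if (bHasChannel) {
        return false;  // an existing channel is reused; the guard does not apply
    }
    return Actor.bPendingKill || Actor.bTearOff;
}
```

Note that the guard only applies when no channel exists yet: a torn-off actor that still has an open channel is not affected by this check.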

Can we add a workaround for the replicator issue now, such as a critical section or extra if branches? The only functions that remove actors from ActorChannels are:

void UActorChannel::Close()
void UActorChannel::SetClosingFlag()
void UNetDriver::NotifyActorLevelUnloaded( AActor* TheActor )

I think I should add logging to these too, and hope the engine will be able to flush the log before the crash.
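One way to make such logging targeted is to route every removal through a single function that bumps a modification counter, and have the cleanup loop check that counter after each call. The following is a standalone plain C++ sketch of that idea, not engine code; `CountedChannelMap`, `Remove`, `ForEachUntilModified` and the `Caller` strings are hypothetical. When a callback modifies the map, the loop logs who removed an entry and stops instead of touching a possibly invalid iterator.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical actor->channel map that counts modifications so a loop can
// detect (and log) changes made by the code it calls.
class CountedChannelMap {
public:
    void Add(int ActorId, int ChannelIndex) {
        Channels[ActorId] = ChannelIndex;
        ++ModCount;
    }

    // Single choke point for removals; Caller is recorded for logging only.
    void Remove(int ActorId, const std::string& Caller) {
        if (Channels.erase(ActorId) != 0) {
            ++ModCount;
            LastRemover = Caller;
        }
    }

    // Visits entries until a callback modifies the map, then logs and stops,
    // because the live iterator may no longer be valid. Returns true if every
    // entry was visited without modification.
    template <typename Fn>
    bool ForEachUntilModified(Fn&& Visit) {
        const unsigned long Start = ModCount;
        for (auto It = Channels.begin(); It != Channels.end(); ++It) {
            Visit(It->first);
            if (ModCount != Start) {
                Log.push_back("map modified during iteration (last remover: " +
                              LastRemover + ")");
                return false;
            }
        }
        return true;
    }

    std::vector<std::string> Log;

private:
    std::unordered_map<int, int> Channels;
    unsigned long ModCount = 0;
    std::string LastRemover;
};
```

In this model, a callback that calls `Remove(ActorId, "Close")` produces one log line naming `Close`, while a read-only callback visits every entry and logs nothing.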

Answers.Archive · September 11, 2018, 8:54pm

Hi Dmitry,

I have not yet been able to reproduce a crash with the changes. If it really is the iterator or the actor channel map that is crashing, you could try iterating over the open channels list instead (this should generally be safer in case it is a garbage collection problem):

for (UChannel* Channel : ServerConnection->OpenChannels)
{
	UActorChannel* ActorChannel = Cast<UActorChannel>(Channel);
	if (ActorChannel)
	{
		ActorChannel->CleanupReplicators();
	}
}

It may be worth adding logging to the replicator cleanup loop with the contents of UChannel::Describe() to help track down what is or is not being cleaned up before the crash, but it would be a lot of log spam in the normal working scenario.
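A sketch of that loop with gated logging, written as a standalone plain C++ model rather than engine code: `Channel`, `ActorChannel`, `CleanupOpenChannels` and the `bVerbose` flag are hypothetical, `dynamic_cast` plays the role of `Cast<UActorChannel>`, and `Describe()` mirrors the role of `UChannel::Describe()`. In the engine itself this would more naturally be a `UE_LOG` at a verbose level so it can stay off in normal runs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical channel hierarchy mirroring UChannel / UActorChannel.
struct Channel {
    virtual ~Channel() = default;
    virtual std::string Describe() const { return "Channel"; }
};

struct ActorChannel : Channel {
    std::string ActorName;
    bool bCleanedUp = false;

    explicit ActorChannel(std::string Name) : ActorName(std::move(Name)) {}
    std::string Describe() const override { return "ActorChannel: " + ActorName; }
    void CleanupReplicators() { bCleanedUp = true; }
};

// Walks the open-channel list, cleans up actor channels only, and records a
// description of each one first when bVerbose is set (off by default to
// avoid log spam in the normal case).
void CleanupOpenChannels(const std::vector<std::unique_ptr<Channel>>& OpenChannels,
                         bool bVerbose, std::vector<std::string>& LogOut) {
    for (const auto& Base : OpenChannels) {
        auto* Actor = dynamic_cast<ActorChannel*>(Base.get());
        if (Actor == nullptr) {
            continue;  // not an actor channel, like Cast<> returning null
        }
        if (bVerbose) {
            LogOut.push_back("Cleaning " + Actor->Describe());
        }
        Actor->CleanupReplicators();
    }
}
```

Logging the description before each cleanup call means the last line written before a crash identifies the channel being processed, which is what the suggestion above is after.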

Answers.Archive · September 11, 2018, 8:54pm

Thanks.

Two days of testing is not a long time for this crash, but it seems to be fixed.

If I receive new information I will come back to this thread.
For now I would like to say that this solution clearly fixes the problem.
Great job.

NMouzourakis · December 12, 2018, 8:26pm

Hello, we seem to be seeing a similar issue on our end. Is there any update on issue UE-60086? Or perhaps a CL that has gone into a recent engine version (or just the current best fix) that we can integrate into our engine? Our version is 4.17.