crash in iris InternalCompareMember when the object we're comparaing against held in the property in the StateBuffer has been dealloacted

Hi Iris Friends!

I’m getting the very rare crash in this stack:

!ReportAssert() [C:\fbroot\Engine\Source\Runtime\Core\Private\Windows\WindowsPlatformCrashContext.cpp:1851]

!FWindowsErrorOutputDevice::Serialize() [C:\fbroot\Engine\Source\Runtime\Core\Private\Windows\WindowsErrorOutputDevice.cpp:84]

!FOutputDevice::LogfImpl() [C:\fbroot\Engine\Source\Runtime\Core\Private\Misc\OutputDevice.cpp:81]

!AssertFailedImplV() [C:\fbroot\Engine\Source\Runtime\Core\Private\Misc\AssertionMacros.cpp:150]

!FDebug::CheckVerifyFailedImpl2() [C:\fbroot\Engine\Source\Runtime\Core\Private\Misc\AssertionMacros.cpp:669]

!UObjectBase::MarkAsReachable() [C:\fbroot\Engine\Source\Runtime\CoreUObject\Private\UObject\GarbageCollection.cpp:6231]

!FObjectProperty::Identical() [C:\fbroot\Engine\Source\Runtime\CoreUObject\Private\UObject\PropertyObject.cpp:126]

!UE::Net::Private::InternalCompareMember() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationState\InternalPropertyReplicationState.cpp:224]

!UE::Net::FPropertyReplicationState::PollPropertyValue() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationState\PropertyReplicationState.cpp:173]

!UE::Net::FPropertyReplicationState::PollPropertyReplicationState() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationState\PropertyReplicationState.cpp:280]

!UE::Net::FReplicationInstanceOperations::PollAndCopyPropertyData() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\ReplicationOperations.cpp:683]

!UE::Net::Private::FObjectPoller::PushModelPollObject() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\Polling\ObjectPoller.cpp:202]

!UE::Net::Private::FObjectPoller::PollAndCopyObjects() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\Polling\ObjectPoller.cpp:46]

!UObjectReplicationBridge::PollAndCopy() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\ObjectReplicationBridge.cpp:1353]

!UObjectReplicationBridge::PreSendUpdate() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\ObjectReplicationBridge.cpp:1033]

!UReplicationSystem::PreSendUpdate() [C:\fbroot\Engine\Source\Runtime\Experimental\Iris\Core\Private\Iris\ReplicationSystem\ReplicationSystem.cpp:792]

!UNetDriver::TickFlush() [C:\fbroot\Engine\Source\Runtime\Engine\Private\NetDriver.cpp:1024]

it’s happening because the object ptr held in the state buffer has an InternalIndex of -1, indicating it has been deallocated. In this case, I believe there’s nothing else holding onto that previously replicated value, so it seems reasonable for it to be deallocated. Is there some system that’s supposed to keep objects held in replication state buffers from being garbage collected? Or should the code just be more tolerant of those (by storing a weak pointer or something? though that seems inefficient?) If there is a code path that’s supposed to notify the garbage collector about objects in the state buffer so they can be preserved, can you please show me where so I can debug? (Also, I’m on 5.5- maybe this has been fixed already?)

Thanks!

Josh

Steps to Reproduce
no exact repro but I just need some info

Hi,

This issue has been addressed both by the FObjectPtrNetSerializer implementing IsEqual itself and avoiding the FObjectProperty::Identical call and later also the ObjectPtr’s equality operator was modified to avoid marking pointers as garbage. All this will be in 5.7 but you should be able quite easily backport the required changes.

We set this trait in FObjectNetSerializerBase:

static constexpr bool bUseSerializerIsEqual = true;We implement IsEqual in FObjectPtrNetSerializer:

struct FObjectPtrNetSerializer : public Private::FObjectNetSerializerBase<TObjectPtr<UObject>>
{
	static const uint32 Version = 0;
	typedef FObjectPtrNetSerializerConfig ConfigType;
 
	inline static const ConfigType DefaultConfig;
 
	static bool IsEqual(FNetSerializationContext&, const FNetIsEqualArgs&);
};
 
// FObjectPtrNetSerializer implementation
bool FObjectPtrNetSerializer::IsEqual(FNetSerializationContext& Context, const FNetIsEqualArgs& Args)
{
	if (Args.bStateIsQuantized)
	{
		const QuantizedType& Value0 = *reinterpret_cast<const QuantizedType*>(Args.Source0);
		const QuantizedType& Value1 = *reinterpret_cast<const QuantizedType*>(Args.Source1);
 
		return Value0 == Value1;
	}
	else
	{
		const SourceType& Value0 = *reinterpret_cast<const SourceType*>(Args.Source0);
		const SourceType& Value1 = *reinterpret_cast<const SourceType*>(Args.Source1);
 
		// Avoid FObjectPtr operator == which passes by value and may mark objects as reachable. We can compare a formerly valid valid pointer that had its memory deleted or re-used.
		// We don't care about UE_WITH_OBJECT_HANDLE_TYPE_SAFETY being defined or not.
		return Value0.GetHandle() == Value1.GetHandle();
	}
}

Cheers,

Peter

FWIW I’ve realized this only happens when UE::GC::GIsIncrementalReachabilityPending is true, which makes it even harder to repro. That’s necessary for ConditionallyMarkAsReachable to actually try to mutate the object. Otherwise you can happily pass the pointer from the state buffer into the TObjectPtr<> and compare against it and everything works as expected. So very relevant that we’re running the incremental garbage collector

it kinda feels like the right fix is just to not use TObjectPtr in FObjectProperty::Identical. I think you’re using it to make sure the UObject*s are “resolved”, which in this case just means the low bit is not set. So should I fix it by just testing if the low bit is set, and if it’s not, then do a straight compare without TObjectPtr? I think that will give better performance too because I won’t risk a cache miss updating a flag on the object pointer just because I want to compare against the pointer?

thanks very much, Peter! We fixed it locally in ::Identical by making sure not to touch the memory we’re looking up (by the way, implicit casts in ternary statements can make surprising copy constructor calls!). We’ll grab the serializer fix as well, this is awesome.

All the best!

Josh