How to check that TearOff() has completed (Iris replication)

Hello. We have a pooling system for replicated Projectiles:

  • When we are done with an instance and return it to the pool on the server, we call TearOff() so that any replicated client instances get destroyed.
  • When we re-spawn an instance we set bTearOff to true and call BeginReplication().

This seems to work fine.

However, since switching to Iris we have noticed that there is a race where an instance gets respawned before its TearOff() has completed. This is especially noticeable with multicast RPCs, where we hit the ensure() in FReplicationWriter::MarkObjectDirty() because the object's state is WaitOnDestroyConfirmation.

Questions:

  • Is setting bTearOff to true and calling BeginReplication() the right way to restart replication?
  • How do we check that TearOff() has completed on the server and that it is safe to respawn a given instance? Ideally we wouldn't have to iterate over all connections, as we have a lot of instances and connections…

Steps to Reproduce

It seems that we stop getting a valid RefHandle once the TearOff() is complete, so the following check seems to work?

#if UE_WITH_IRIS
	if (const UWorld* MyWorld = Actor->GetWorld())
	{
		if (FWorldContext* const Context = GEngine->GetWorldContextFromWorld(MyWorld))
		{
			for (FNamedNetDriver& Driver : Context->ActiveNetDrivers)
			{
				if (Driver.NetDriver != nullptr && Driver.NetDriver->ShouldReplicateActor(Actor))
				{
					// The driver may not be running Iris, so guard against a null replication system.
					if (const UReplicationSystem* ReplicationSystem = Driver.NetDriver->GetReplicationSystem())
					{
						if (const UObjectReplicationBridge* Bridge = ReplicationSystem->GetReplicationBridgeAs<UObjectReplicationBridge>())
						{
							// If the actor still has a valid ref handle, it is still replicated and/or waiting for the TearOff() to complete.
							if (Bridge->GetReplicatedRefHandle(Actor).IsValid())
							{
								return false;
							}
						}
					}
				}
			}
		}
	}
#endif

Hi,

To get a better idea of the problem, I have a couple questions.

First, you mention that you’re calling TearOff “so that any replicated client instances get destroyed,” but I don’t believe torn off actors should get automatically destroyed on the client (in fact, there’s an open issue for this happening in certain situations). Are you manually destroying these actors on the client after they are torn off, or are you using a custom NetDriver class that overrides ShouldClientDestroyTearOffActors?
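
(For reference, such an override would look roughly like the sketch below; UMyGameNetDriver and the header names are just placeholders, and the exact declaration of ShouldClientDestroyTearOffActors should be checked against NetDriver.h in your engine version.)

#pragma once

#include "IpNetDriver.h"
#include "MyGameNetDriver.generated.h"

UCLASS()
class UMyGameNetDriver : public UIpNetDriver
{
	GENERATED_BODY()

public:
	// Have clients destroy actors that have been torn off, rather than leaving them around.
	virtual bool ShouldClientDestroyTearOffActors() const override
	{
		return true;
	}
};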

Next, when you start replicating the actor again, you said you’re setting bTearOff to true before calling BeginReplication. Just to double check, did you mean that bTearOff is set back to false when restarting replication?

Thanks,

Alex

Thanks for your quick reply.

  • We are manually calling Destroy() on those replicated actors in TornOff(). We might want to do some custom logic on them in the future.
  • TearOff() sets bTearOff to true and marks it dirty. We weren't sure if BeginReplication() automatically sets it back to false, so we set bTearOff = false right before calling BeginReplication(). I will check if BeginReplication() already sets it to false! (Both paths are roughly sketched below.)
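
For clarity, this is roughly what the two paths look like on our side (simplified; AMyProjectile and the pool callbacks are placeholder names, and direct access to bTearOff may differ between engine versions):

// Client side: destroy the torn-off copy ourselves. TornOff() is called on clients
// when bTearOff replicates as true.
void AMyProjectile::TornOff()
{
	Super::TornOff();
	Destroy();
}

// Server side, when the instance is returned to the pool:
void AMyProjectile::OnReturnedToPool()
{
	TearOff(); // sets bTearOff to true and marks it dirty
}

// Server side, when the instance is reused from the pool:
void AMyProjectile::OnAcquiredFromPool()
{
#if UE_WITH_IRIS
	bTearOff = false; // BeginReplication() does not appear to reset this for us
	BeginReplication();
#endif
}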

The issue we were running into is that Projectile actors (whose replicated copies had been torn off) were reused before their tear-off had completed. We have a `bool HasPendingReplication(AActor& Instance)` check that disqualifies a pooled actor from reuse, but that logic only works for traditional non-Iris replication, so we were running into the ensure() in FReplicationWriter::MarkObjectDirty().

Hence we are interested in what an appropriate check for Iris would be, and whether there are other pitfalls with our approach.

It looks like bTearOff is not reset to false on the owning server by BeginReplication() and stays true. We also no longer get fresh replicated actors on the clients if we call BeginReplication() without reverting bTearOff to false first.


Hi,

Thank you for the additional information! After discussing this with the team, I believe we’d recommend a different approach to this sort of pooling.

It is important to note that TearOff is not intended to be reversible. While setting bTearOff back to false and calling BeginReplication is working for your project, this is not a supported operation, and you may run into unexpected behavior and issues when doing so.

Iris does include functionality for starting and stopping replication for an object, but there is no way for the server to know when the object is no longer “active” for a client after it has stopped.

That being said, it’s hard to recommend an alternate approach without more context as to how/why this pooling system is being used. Could you provide some more info on your project’s actor pooling? Is the intention to avoid the cost of spawn operations only on the server?

Thanks,

Alex

The main intent of the pooling system is indeed to avoid spawning cost on the server. Even after reducing the weight of the Actor, using a pooled version is 3x faster than spawning a fresh one, which makes a significant difference for our use case.

We are aware that server-side pooling doesn’t immediately help with spawning cost on the replicating clients, but the server has to (re)spawn every projectile in the world, while the clients only have to spawn projectiles within their replication range. So overall, server-only pooling is still a significant win for us. Given our number of projectiles, projectile types, and players, we want to make sure pooled projectiles don’t incur any replication cost or take up replication handles.

We have an early prototype for a client-side cache that would allow us to reuse a cached Actor when we resume replication or a client comes back into replication range.

Hi,

Thanks for the additional context.

Given that the pooling is only server-side, one approach that could make sense is to use a custom dynamic filter for these actors. When a projectile is no longer active, it could be filtered out for all clients, and you’ll likely need some sort of cooldown period between when an actor becomes inactive and when it is reused. Because we haven’t tried anything like this before, it is difficult to provide more specific advice or guidance.
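
As a very rough, untested sketch of what assigning such a filter to a pooled projectile could look like (this assumes a dynamic filter named "PooledProjectileFilter" has been registered in the replication system's filter definitions, that GetFilterHandle/SetFilter match your engine version, and FilterOutPooledProjectile is just an illustrative name):

#if UE_WITH_IRIS
void FilterOutPooledProjectile(UReplicationSystem* ReplicationSystem, const UObjectReplicationBridge* Bridge, AActor* Projectile)
{
	using namespace UE::Net;

	// Look up the actor's Iris handle, as in your reproduction snippet.
	const FNetRefHandle RefHandle = Bridge->GetReplicatedRefHandle(Projectile);
	if (RefHandle.IsValid())
	{
		// "PooledProjectileFilter" is a hypothetical filter name registered via the Iris filter config.
		const FNetObjectFilterHandle FilterHandle = ReplicationSystem->GetFilterHandle(FName(TEXT("PooledProjectileFilter")));
		ReplicationSystem->SetFilter(RefHandle, FilterHandle);
	}
}
#endif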

Also, related to client-side pooling: one option worth exploring there is creating a custom NetObjectFactory (likely derived from UNetActorFactory) for these actors, as this would give you greater control over how the clients handle instantiating and destroying these actors.

Thanks,

Alex