Reliable RPCs are dropped with Replication Graph if the actor goes dormant shortly before the RPC is called

OK, after some internal investigation, I think this is caused by the code below and by how the replication graph associates actor channels with actors. Normally, without the replication graph, the net driver immediately disassociates an actor channel from its actor when the channel is closed for whatever reason via SetClosingFlag():

void RemoveActorChannel(AActor* Actor)
{
    ActorChannels.Remove(Actor); // <-- Actor & channel pairing removed / disassociated
    if (ReplicationConnectionDriver)
    {
        ReplicationConnectionDriver->NotifyActorChannelRemoved(Actor);
    }
}

However, with a replication graph, there is an additional association inside UNetReplicationGraphConnection, which keeps its own map of actors and channels. A comment there says it waits for the client to ack the channel close before removing the association:

void UNetReplicationGraphConnection::NotifyActorChannelRemoved(AActor* Actor)
{
    // No need to do anything here. This is called when an actor channel is closed, but
    // we're still waiting for the close bunch to be acked. Until then, we can't safely replicate
    // the actor from this channel. See NotifyActorChannelCleanedUp.
}

The association is finally cleaned up in `NotifyActorChannelCleanedUp`. However, in the window between the server closing the channel and receiving the client ack, all reliable RPCs are dropped. This is because the channel still exists in the map and is in the closing state:

for (UNetReplicationGraphConnection* Manager : Connections)
{
    FConnectionReplicationActorInfo& ConnectionActorInfo = Manager->ActorInfoMap.FindOrAdd(Actor);
    UNetConnection* NetConnection = Manager->NetConnection;

    if (NetConnection->IsClosingOrClosed())
    {
        return true;
    }

    // This connection isn't ready yet
    if (NetConnection->ViewTarget == nullptr)
    {
        continue;
    }

    // Streaming level actor that the client doesn't have loaded. Do not send.
    if (ActorStreamingLevelName != NAME_None && NetConnection->ClientVisibleLevelNames.Contains(ActorStreamingLevelName) == false)
    {
        continue;
    }

    // While the channel is closing we cannot send new multicast RPCs
    if (ConnectionActorInfo.Channel && ConnectionActorInfo.Channel->Closing)
    {
        continue;
    }

Therefore, the RPC never gets the opportunity to open a new actor channel. This is totally valid for actors that are falling out of relevancy, as the client will destroy those actors anyway before receiving the RPC. For net dormancy, however, I assume we want the actor to still receive reliable RPCs. Otherwise, whether an RPC is received depends essentially on how long the client takes to ack the close to the server, which makes the behavior very unreliable.
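
To make the gating concrete, here is a minimal standalone sketch in plain C++. It does not use any engine types: `EChannelState`, `ESendResult`, `FToyActorInfo`, and `TrySendMulticast` are hypothetical names made up for illustration. The only thing taken from the excerpt above is the rule itself, namely that a connection whose recorded channel is closing gets skipped. That a missing channel can simply be reopened for the RPC is an assumption of the sketch.

```cpp
#include <cassert>

// Toy stand-in for the channel state recorded per connection (hypothetical, not an engine type).
enum class EChannelState { None, Open, Closing };

// Possible outcomes of trying to send one multicast RPC to one connection.
enum class ESendResult { SentOnExistingChannel, SentOnNewChannel, Dropped };

struct FToyActorInfo
{
    EChannelState Channel = EChannelState::None;
};

// Mirrors the check quoted above: while the recorded channel is closing,
// the connection is skipped, so no new channel is opened and the RPC is lost.
ESendResult TrySendMulticast(FToyActorInfo& Info)
{
    if (Info.Channel == EChannelState::Closing)
    {
        return ESendResult::Dropped;
    }
    if (Info.Channel == EChannelState::None)
    {
        // Assumption: with no channel recorded, a fresh one can be opened for the RPC.
        Info.Channel = EChannelState::Open;
        return ESendResult::SentOnNewChannel;
    }
    return ESendResult::SentOnExistingChannel;
}
```

In this model, the same call succeeds or fails depending only on whether the close has been acked yet, which is the timing dependence described above.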

Our current workaround has been to create our own custom ReplicationGraphConnection class and immediately remove the association when the actor is dormant on that connection:

void UModifiedNetReplicationGraphConnection::NotifyActorChannelRemoved(AActor* Actor)
{
    // The default function does not remove the channel from the actor info map until the channel
    // is ready to be cleaned up. Unfortunately, if a reliable RPC is called before the server has
    // received the client's ack for the closed actor channel, the RPC gets dropped, which is not
    // what we want for dormant actors. So instead, remove the channel from the actor info map
    // immediately so that the replication graph believes it can open a new actor channel for the actor.
    Super::NotifyActorChannelRemoved(Actor);

    FConnectionReplicationActorInfo* InfoMapItem = ActorInfoMap.Find(Actor);
    if (InfoMapItem && InfoMapItem->bDormantOnConnection)
    {
        UActorChannel* Channel = InfoMapItem->Channel;
        PendingCloseFromDormancyActorChannels.AddUnique(Channel);
        InfoMapItem->ResetFrameCounters();

        ActorInfoMap.RemoveChannel(Channel);
    }
}

This feels extremely hacky and possibly buggy, but it has allowed our dormant actors to receive reliable RPCs reliably.
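
The difference between the stock behavior and the workaround can be sketched with a toy model in standalone C++. Everything here is hypothetical and simplified: `FToyChannel`, `FToyInfo`, `FToyConnection`, the integer actor id, and the `bDetachDormantOnClose` switch are illustrative names, not engine API. The model also folds "the channel starts closing" into the notify call, which the engine does separately.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical toy types; none of these are engine classes.
struct FToyChannel
{
    bool bClosing = false;
};

struct FToyInfo
{
    FToyChannel* Channel = nullptr;
    bool bDormantOnConnection = false;
};

struct FToyConnection
{
    std::unordered_map<int, FToyInfo> ActorInfoMap; // keyed by a made-up actor id
    bool bDetachDormantOnClose = false;             // false: stock behavior, true: workaround

    // Called when the server closes the channel, before the client has acked the close.
    void NotifyActorChannelRemoved(int ActorId)
    {
        auto It = ActorInfoMap.find(ActorId);
        if (It == ActorInfoMap.end() || It->second.Channel == nullptr)
        {
            return;
        }
        It->second.Channel->bClosing = true;
        if (bDetachDormantOnClose && It->second.bDormantOnConnection)
        {
            // Workaround: forget the closing channel right away for dormant actors.
            It->second.Channel = nullptr;
        }
    }

    // Same gate as the multicast loop: a recorded closing channel blocks the send.
    bool CanSendMulticast(int ActorId)
    {
        const FToyInfo& Info = ActorInfoMap[ActorId];
        return !(Info.Channel != nullptr && Info.Channel->bClosing);
    }
};
```

With `bDetachDormantOnClose` left at `false`, a dormant actor stays blocked after the notify call. With it set to `true`, the same actor can send again immediately, while non-dormant actors keep the stock behavior.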

Steps to Reproduce
In a project with a basic replication graph implementation, we created an actor that is set to DormantAll and AlwaysRelevant. Each frame, we force it to replicate using ForceNetUpdate (this came from a separate bug we were tackling, but it helped expose this issue). Then, periodically, we send a reliable multicast RPC from this actor and observe that some of the RPC calls never reach some clients, despite the actor still being relevant and the RPC being reliable.

Alright, I reproduced this in a TP_FirstPerson project on a fresh UE 5.7 install. I failed to mention that it does not reproduce unless network emulation is enabled. I think that if the server always receives the client ack almost immediately, there is no chance for the RPC to get dropped. The worse the ping, the more RPCs get dropped. The actor I set up is pretty basic: just make sure it is set to Replicate, is Always Relevant, and has NetDormancy set to Dormant All. Then set your replication driver in DefaultEngine.ini to BasicReplicationGraph.

The actor blueprint is just as follows. I’m using a material with a random color to observe from the client whether the server multicast makes it to the client. The important part is the force net update on tick, which should maximize the chance that the replication graph is still waiting on a client ack before removing the channel. Once again, I know forcing a net update every frame isn’t good practice, but I could see this being a real issue when you have two gameplay systems, one that forces a net update and one that calls a reliable RPC, and they happen to run very close to each other, so this is just for demonstration purposes. [Image Removed] [Image Removed]
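
As a rough illustration of why latency matters in this repro, here is a small frame-based model in standalone C++. It is a sketch under loud assumptions, not a description of engine internals: `CountBlockedFrames` and its parameters are hypothetical, the ack delay is measured in whole frames, and the model assumes that whenever no close is pending, that frame's forced update reopens the channel and the return to dormancy closes it again.

```cpp
#include <cassert>

// Counts the frames in which a reliable multicast RPC would be dropped
// because a dormancy close is still waiting for the client's ack.
// AckDelayFrames is a made-up stand-in for round-trip latency.
int CountBlockedFrames(int TotalFrames, int AckDelayFrames)
{
    int Blocked = 0;
    int FramesUntilAck = 0; // > 0 while a close bunch is still unacked

    for (int Frame = 0; Frame < TotalFrames; ++Frame)
    {
        if (FramesUntilAck > 0)
        {
            // Channel is closing: an RPC issued this frame would be dropped.
            ++Blocked;
            --FramesUntilAck;
        }
        else
        {
            // Assumption: the forced update replicates the actor this frame,
            // it goes dormant again, and a new close starts waiting for its ack.
            FramesUntilAck = AckDelayFrames;
        }
    }
    return Blocked;
}
```

Under these assumptions, `CountBlockedFrames(100, 0)` returns 0, `CountBlockedFrames(100, 3)` returns 75, and `CountBlockedFrames(100, 9)` returns 90. That matches the observation that nothing is dropped without network emulation and that drops increase with ping.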

Hi,

Thank you for bringing this to our attention and for the detailed repro steps!

I’ve been able to reproduce this as well, and I’ve opened a new issue, UE-378758, which should be visible in the public tracker in a day or so.

In the meantime, if you run into any issues with your workaround, please don’t hesitate to reach out.

Thanks,

Alex