Sometimes Pawn Possession does not complete on Client

Hello, I’m having a weird issue that may be Iris related (it doesn’t seem to happen with Iris disabled). Sometimes, during network lag or even on a normal connection, possession does not complete on the Client. On the Server the controller possesses the pawn, but on the Client it does not.

RetryClientRestart doesn’t seem to work at all and the Client Player Controller is stuck in a state with no pawn and no HUD and can’t do anything at all. There seems to be no way to recover from it.

Our game makes use of a lot of Possessing and Unpossessing as you can “Meld” into any pawn in the area at a fairly quick rate. However, sometimes ClientRestart() will get a null pawn and once that happens it can’t recover as it seems to refuse to get a replicated pawn passed to it.

I tried to manually send/request the pawn from the Server. However, the Client always seems to receive a null pawn, even when I pass a valid one so it can run ClientRestart(). Here is my attempted workaround.

This seems like an issue where the Pawn simply doesn’t exist on the Client side and there is no open channel from the Server to the Client for that particular pawn. I’m running a debug session now to try and get more info, but I’ve been trying to figure this out for a few days, so I figured I’d post sooner rather than later. Thank you!

`void ARingPlayerController::SERVER_RetryClientRestart_Implementation()
{
    ReplicateImmediately(GetPawn()); // Tries to force it to replicate and be relevant for clients

    UE_LOG(LogPlayerController, Verbose, TEXT("SERVER_RetryClientRestart_Implementation %s"), *GetNameSafe(GetPawn()));

    ClientRestart(GetPawn());
    CLIENT_PawnMelded(GetPawn());
}

void APlayerController::ClientRestart_Implementation(APawn* NewPawn)
{
    UE_LOG(LogPlayerController, Verbose, TEXT("ClientRestart_Implementation %s"), *GetNameSafe(NewPawn));
    // NewPawn always ends up being null
…`

Steps to Reproduce

Update: I switched it up. When the Client asks the Server to RetryClientRestart, the Server now calls SetPawn(), and once that replicates to the Client I call ClientRestart(). That seems to have fixed my issue. However, I think the actual issue is that the Iris implementation of AddDependentActor may not reliably update the Pawn on the Client side, but that’s just my guess. This doesn’t happen when not using Iris.

Update 2: Never mind, it can still happen, just less frequently I suppose.

`void ARingPlayerController::SERVER_RetryClientRestart_Implementation()
{
    ReplicateImmediately(GetPawn());

    UE_LOG(LogPlayerController, Verbose, TEXT("SERVER_RetryClientRestart_Implementation %s"), *GetNameSafe(GetPawn()));
    SetPawn(GetPawn());
}

void ARingPlayerController::OnRep_Pawn()
{
    if (bFailedInitialRestart)
    {
        if (GetPawn())
        {
            bFailedInitialRestart = false;
            ClientRestart(GetPawn());
            CLIENT_PawnMelded(GetPawn());
        }
    }
}`

Hi,

I don’t believe we’ve run into this issue internally, so I have a few questions to get a better idea of the problem.

First, is the issue that ClientRestart is only ever called with a null value for NewPawn, or is the “Pawn” property on the controller also only null? Are AController::OnRep_Pawn and APawn::OnRep_Controller not eventually being called with the expected values?

Next, could you provide some more info on these pawn actors and how they’re set up? Are these actors placed in the map (possibly as part of a World Partition streaming cell), or are they dynamically spawned?

Do you have Controller.AlwaysNotifyClientOnControllerChange enabled? This should be true by default, but I wanted to double check. If you are using any other non-default settings on your controller/pawn, that would be helpful to know as well.

Finally, your workaround seems reasonable, but is there anything unexpected happening when SetPawn is called the first time on the server (from APlayerController::OnPossess)? I wouldn’t expect this would need to be called again on the server.

Thanks,

Alex

Hello, hope you had a good break!

While we do have some parallel functions (OnMeld / OnUnMeld) for this thing we call “Melding”, they are just there to give us more control over initialization. We don’t actually change any of the Possession and Unpossession flow; it’s all standard.

From my tests, when this happens OnRep_Pawn is still called, but Pawn is nullptr, which leaves the Client Controller in the limbo state. OnRep_Controller does not get called when in this state. This started happening when we upgraded from 5.4 to 5.5.

Controller.AlwaysNotifyClientOnControllerChange is enabled by default and we haven’t edited that CVar. It’s very strange. My easiest repro is in the editor when we have many blueprints open while testing.

I’m not totally sure if it’s an issue of relevancy where the pawn is just not spawned for the client in the time the server tells the client to possess the pawn… though I know RetryClientRestart is supposed to prevent such cases, it just doesn’t seem to be working properly.

Even when I tell the client to ask the server for a Pawn, when the RPC runs the passed-in pawn (from the server) is still null. I’m not sure if the server is having trouble opening an actor channel for the client or if it’s something else. There aren’t any error logs when this happens, unfortunately.

So I think a combination of net.ResetAckStatePostSeamlessTravel=1 and net.AllowClientRemapCacheObject=1, along with some network optimizations, resolved it. (We spawn a lot of actors up front, amortized per frame, in order to pool them.) We noticed that those actors were not initially dormant, so they were flooding the network unnecessarily. We addressed that and cleaned up some RPCs, and so far we haven’t encountered the issue again.

Even though this seems resolved for the time being, I’m attaching the logs you requested, with the Repro and Non Repro cases for comparison, since I feel this may crop up again during intense network conditions.

LogNetPackageMap, LogNetTraffic, and LogNet are included in these. Really appreciate the help on this strange issue!

Hi,

Thanks!

I’m still not sure what could be causing this issue, and I’m unable to reproduce something similar in my own test project. If you’re able to provide a basic sample project reproducing the issue, that would be appreciated, but in the meanwhile, I have some more questions.

Are you doing anything for these pawn actors to affect their relevancy or dormancy? Are they using any sort of filter other than the default spatial filter?

If the pawn is never received and spawned on the client, then it does make sense that the ClientRestart handling wouldn’t work. Like you said, retrying ClientRestart is done to handle the case where the controller receives this RPC before the pawn has been created on the client, but in this case, there seems to be something preventing the pawn from being replicated.
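To make the distinction above concrete, here is a minimal standalone sketch (plain C++, not engine code) of a bounded retry. `RetryRestart`, `ERestartResult`, `PawnExistsOnAttempt`, and `MaxAttempts` are hypothetical names invented for illustration, and the bound on attempts is an assumption rather than something the engine is known to do. It only models the idea that retrying helps when the pawn arrives late, and cannot help when the pawn never arrives.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of the modeled retry loop (not an engine type).
enum class ERestartResult { Possessed, GaveUp };

// Models a client that retries a restart until the pawn exists locally.
// PawnExistsOnAttempt: hypothetical callback, returns true once the pawn
//                      has been received and spawned on the client.
// MaxAttempts:         hypothetical upper bound on retries (an assumption).
// OutAttemptsUsed:     optional, receives the number of attempts made.
inline ERestartResult RetryRestart(const std::function<bool(int)>& PawnExistsOnAttempt,
                                   int MaxAttempts,
                                   int* OutAttemptsUsed = nullptr)
{
    assert(MaxAttempts > 0);
    for (int Attempt = 1; Attempt <= MaxAttempts; ++Attempt)
    {
        if (PawnExistsOnAttempt(Attempt))
        {
            // The pawn arrived late: the retry recovers the possession.
            if (OutAttemptsUsed) { *OutAttemptsUsed = Attempt; }
            return ERestartResult::Possessed;
        }
    }
    // The pawn never arrived: no number of retries can fix this.
    if (OutAttemptsUsed) { *OutAttemptsUsed = MaxAttempts; }
    return ERestartResult::GaveUp;
}
```

In this model, a pawn that shows up on the third attempt is possessed after three tries, while a pawn that is never replicated exhausts every attempt. That second case matches the limbo state described in this thread.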

You mentioned it tends to happen with a lot of blueprints open in the editor. Are you able to reproduce the issue outside of the editor?

For debugging this further, it may help to increase the verbosity of your logs (e.g. LogIris, LogIrisBridge, LogIrisRpc, and LogIrisFiltering). Even though there aren’t any errors/warnings, it could help to get more detailed info on how your pawn is being handled by the replication system.

Thanks,

Alex

Yeah, let me see if I can get an example project with similar settings to our game. This does also happen in a packaged environment (development). Generally, when there are a bunch of people connected, someone possessing a new pawn may be put in the limbo state.

To note, we are using the Havok Branch of UE5.5.4, though I don’t think that would affect pawn possession. Worst case, I’ll see if I can get a packaged version of our game with some debug symbols.

Here are our Iris settings. Most of them are copied from Lyra, and some come from the Iris getting started document. There is no custom filtering; at least, I assume the Lyra settings were the defaults.

`[/Script/Engine.Player]
ConfiguredInternetSpeed=500000
ConfiguredLanSpeed=500000

[/Script/OnlineSubsystemUtils.IpNetDriver]
MaxClientRate=100000
MaxInternetClientRate=100000

[/Script/Engine.GameNetworkManager]
TotalNetBandwidth=4000000
MaxDynamicBandwidth=100000
MinDynamicBandwidth=40000

[/Script/Engine.Engine]
+NetDriverDefinitions=(DefName="GameNetDriver",DriverClassName="OnlineSubsystemSteam.SteamNetDriver",DriverClassNameFallback="OnlineSubsystemUtils.IpNetDriver")
!IrisNetDriverConfigs=ClearArray
+IrisNetDriverConfigs=(NetDriverDefinition="GameNetDriver",bEnableIris=true)

[Voice]
bEnabled=true

[/Script/IrisCore.ObjectReplicationBridgeConfig]
; Filters
DefaultSpatialFilterName=Spatial
; Clear all filters
!FilterConfigs=ClearArray
+FilterConfigs=(ClassName=/Script/Engine.LevelScriptActor, DynamicFilterName=NotRouted) ; Not needed
+FilterConfigs=(ClassName=/Script/Engine.Actor, DynamicFilterName=None))

; Info types aren’t supposed to have physical representation
+FilterConfigs=(ClassName=/Script/Engine.Info, DynamicFilterName=None)
+FilterConfigs=(ClassName=/Script/Engine.PlayerState, DynamicFilterName=None)
; Pawns can be spatially filtered
+FilterConfigs=(ClassName=/Script/Engine.Pawn, DynamicFilterName=Spatial))
+FilterConfigs=(ClassName=/Script/EntityActor.SimObject, DynamicFilterName=None))

[/Script/Engine.NetDriver]
; All Iris replication is handled by various DataStream implementations that are ticked via the DataStreamManager instance in this channel.
+ChannelDefinitions=(ChannelName=DataStream, ClassName=/Script/Engine.DataStreamChannel, StaticChannelIndex=2, bTickOnCreate=true, bServerOpen=true, bClientOpen=true, bInitialServer=true, bInitialClient=true)

[SystemSettings]
; Required for Iris:
net.SubObjects.DefaultUseSubObjectReplicationList=1
net.IsPushModelEnabled=1
net.Iris.UseIrisReplication=1
net.Iris.ReplicationWriterMaxAllowedPacketsIfNotHugeObject = 5;

[/Script/IrisCore.ReplicationStateDescriptorConfig]
+SupportsStructNetSerializerList=(StructName=RingGameplayAbilityTargetData_Projectile)
+SupportsStructNetSerializerList=(StructName=RingGameplayAbilityTargetData_SingleTargetHit)
+SupportsStructNetSerializerList=(StructName=RingAbilityTargetData_Items)`

I’ll get back to you soon with more logging and potentially a project that replicates the issue. Thank you!

So we got some more information via LogNetPackageMap logging, and it seems like the spawned Pawn is not getting mapped correctly, so it doesn’t get replicated to the Client correctly. I attached a snippet.

Line 9 onward is the Client. Anything before that is the Server spawning and serializing the actor and trying to map the components to the pawn, which all fail because it’s unmapped.

I tried LogIris, LogIrisBridge, LogIrisFiltering, and LogIrisRpc at Log, Verbose, and VeryVerbose, but they didn’t seem to log anything.

I’m having trouble reproducing it in a default 5.5.4 project, but I’ll keep working on finding a repro. Hopefully this info sheds more light. Thank you.

Our issue seems somewhat related to this one: [Content removed]

According to the log, NOT_IN_CACHE is not supposed to be possible to hit (PackageMapClient.cpp, line 4117).

I also enabled Verbose Logging and it seems like the BP_SoldierCharacter_C_0 (the dynamically spawned one) is never registered in the Cache for some reason.
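As a rough illustration of why an unregistered actor shows up as a null RPC argument, here is a simplified standalone model (plain C++, not engine code). `FakeActor`, `FakeClientCache`, and `ReceivePawnRpc` are hypothetical names made up for this sketch. The only assumption it encodes is the general one that object references travel as numeric network IDs and are looked up on the receiving side.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a replicated actor; not an engine type.
struct FakeActor
{
    std::string Name;
};

// Simplified model of a client-side ID-to-object cache.
class FakeClientCache
{
public:
    // Called once the client has actually received and spawned the actor.
    void Register(std::uint32_t NetId, FakeActor* Actor)
    {
        assert(Actor != nullptr);
        IdToActor[NetId] = Actor;
    }

    // Resolves an ID received in an RPC payload.
    // Returns nullptr when the ID was never registered on this client.
    FakeActor* Resolve(std::uint32_t NetId) const
    {
        const auto It = IdToActor.find(NetId);
        return It != IdToActor.end() ? It->second : nullptr;
    }

private:
    std::unordered_map<std::uint32_t, FakeActor*> IdToActor;
};

// Models a ClientRestart-style RPC arriving on the client: the server sent
// a valid pawn, but the client only sees whatever the ID resolves to locally.
inline FakeActor* ReceivePawnRpc(const FakeClientCache& Cache, std::uint32_t PawnNetId)
{
    return Cache.Resolve(PawnNetId);
}
```

In this model the server can pass a perfectly valid pawn, yet the client-side argument is null until the matching ID has been registered. That is consistent with ClientRestart always receiving a null NewPawn while the spawned character is missing from the cache.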

Hi,

This is very unexpected, and I’m not sure why the dynamically spawned actor isn’t being registered in this case. In the past, most of the issues like this we’ve seen have occurred after a seamless travel. Is that the case for your repro here? If so, you may want to try enabling net.ResetAckStatePostSeamlessTravel.

Also just to clarify, are the logs here from a repro with or without Iris enabled? Given that it occurs regardless of which system is being used, I don’t think it matters, but I just want to make sure.

If you could provide some more logs with LogNetPackageMap, LogNetTraffic, and LogNet set to VeryVerbose, that would also be helpful. Just to double check, you’re not seeing any other log lines referencing BP_RingSoldierCharacter_C_0 or NetGUID 84?

Finally, are you able to reproduce the issue with net.AllowClientRemapCacheObject enabled?

Thank you for all the information and for your patience as we continue diagnosing the issue.

Thanks,

Alex

Hi,

Great, I’m glad you were able to resolve the issue!

Based on the logs here, one possible cause for the actor not being replicated could be bandwidth saturation. It also looks like the open bunch for the pawn is fairly large and gets split into multiple partial bunches. If one of these gets dropped, they all need to be resent, so with bad network conditions, packet loss and saturation can prevent an actor like this from being received on the client.

On top of optimizing your initial replication as you’ve done, something else to be aware of is the net.PartialBunchReliableThreshold CVar. If a bunch is split into a number of bunches greater than or equal to this CVar’s value, each of those partial bunches is treated as reliable. This allows the engine to resend the individual partial bunches if one is dropped, rather than resending all of them, and it can greatly improve robustness when replicating large actors. If you do experience more issues, it may be worth trying to set this lower to see if it helps (current default is 8).
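The trade-off described above can be sketched with a small standalone model (plain C++, not engine code). All function names here are hypothetical. The model assumes one partial bunch per packet, independent packet loss, and a sender that keeps retrying until delivery, none of which is guaranteed to match the engine's real behavior.

```cpp
#include <cassert>
#include <cmath>

// Rule as stated above: a bunch split into a number of partial bunches
// greater than or equal to the threshold has its partials treated as reliable.
inline bool IsSplitTreatedAsReliable(int NumPartialBunches, int Threshold)
{
    assert(NumPartialBunches > 0 && Threshold > 0);
    return NumPartialBunches >= Threshold;
}

// Probability that every partial bunch arrives in a single attempt,
// assuming independent loss and one partial bunch per packet.
inline double AllPartialsArriveProbability(double PacketLoss, int NumPartialBunches)
{
    assert(PacketLoss >= 0.0 && PacketLoss < 1.0);
    assert(NumPartialBunches > 0);
    return std::pow(1.0 - PacketLoss, NumPartialBunches);
}

// Expected packets sent when losing any partial forces resending all of them.
inline double ExpectedPacketsAllOrNothing(double PacketLoss, int NumPartialBunches)
{
    return NumPartialBunches / AllPartialsArriveProbability(PacketLoss, NumPartialBunches);
}

// Expected packets sent when only the dropped partials are resent.
inline double ExpectedPacketsPerPartialResend(double PacketLoss, int NumPartialBunches)
{
    assert(PacketLoss >= 0.0 && PacketLoss < 1.0);
    assert(NumPartialBunches > 0);
    return NumPartialBunches / (1.0 - PacketLoss);
}
```

Under these assumptions, with 10% packet loss and a bunch split into 6 partials, the all-or-nothing case costs about 11.3 packets on average versus about 6.7 when only dropped partials are resent. With a threshold of 8, a 6-part split would not qualify for the per-partial path, which is why lowering the CVar may help for large actors.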

Thanks,

Alex

We’ll give that a shot, thank you!