Iris Replication with bNetLoadOnClient

Hi,

We are in the process of converting our project to use Iris, but are having problems when bNetLoadOnClient is set.

It would appear that when this is set, and the client spawns within relevancy range, the actor does not replicate. It will replicate correctly under legacy replication, and will replicate correctly when spawning outside relevancy or when bNetLoadOnClient is turned off.

This would imply that Iris is not linking the actor correctly when it is loaded in with the level. My assumption was that it would behave itself after leaving replication range and returning (on the assumption that the deletion and recreation of the object via replication would fix it), but surprisingly this also did not work.

Are we using this incorrectly? Or is Iris not supposed to be used with bNetLoadOnClient?

A video was uploaded that shows the problem in a clean project (the video is no longer available).

In order, it shows:

  1. Working with legacy replication with NetLoadOnClient turned on.
  2. Working with legacy replication with NetLoadOnClient turned off.
  3. Working with Iris replication with NetLoadOnClient turned off.
  4. Not working with Iris replication with NetLoadOnClient turned on when spawning inside of relevancy range.
  5. Working with Iris replication with NetLoadOnClient turned on when spawning outside of relevancy range.

Thanks,

Steve

[Attachment Removed]

I should also note that when using a listen server from PIE (but not when running a dedicated server and client), I get the following error for every actor with bNetLoadOnClient set:

LogIris: Error: RepBridge(0)::Client cannot replicate object

This implies a timing issue to me, like the level is loaded at a bad time for Iris trying to set up the bridge or something.

[Attachment Removed]

Hi,

I’m not aware of any known issues with Iris and bNetLoadOnClient, and I was not able to reproduce this issue in my own test project.

Are you able to provide a minimal project reproducing the issue?

If this isn’t possible, could you provide some more details on how these actors are set up, such as any non-default flags being set on them or what filter and/or prioritizer they’re configured to use?

Thanks,

Alex

[Attachment Removed]

Hey Alex,

Thanks for the quick response. I’ve realised now that I’ve not been clear in my explanation of our issue; there’s a subtlety I missed.

It would seem to work correctly when an actor has a replicated variable.

The case that does not work correctly is when an actor has a replicated UObject (with a replicated fragment set) that is created on server using NewObject in BeginPlay, set using MarkReplicatedSubobject, then marked dirty.

It seems to be a timing issue. It’s more likely to fail with multiple actors in the level, and more likely to fail when running under separate processes. Perhaps this is a result of creating the UObjects before the client loads or something like that?

Perhaps this is an abuse of bNetLoadOnClient, in that it assumes there won’t be dynamically added or removed replicated subobjects.

I’ll try and get you a simple example project if it doesn’t repro for you.

Cheers,

Steve

[Attachment Removed]

Hi Alex,

Apologies for the late reply, something came up, but we’ve done some additional testing.

What appears to be happening is a race condition between Iris and World when calling BeginPlay on clients, so this problem only appears when BeginPlay does some work that is necessary for the actor to behave correctly. In a simple, empty test level with 20 replicating cubes, and a log on BeginPlay in the actor blueprint, I was seeing ~12 of them hit BeginPlay correctly.

Inside AActor::DispatchBeginPlay there is a flag bActorIsPendingPostNetInit, which will prevent BeginPlay from being dispatched. This flag appears to be set true by Iris, and then set to false again by Iris in UNetActorFactory::PostInit.

The problem occurs when bNetLoadOnClient is true because PostInit is not called. As such, timing matters:

  1. If the world calls DispatchBeginPlay before Iris has set bActorIsPendingPostNetInit on that actor, then BeginPlay is called.
  2. If Iris sets bActorIsPendingPostNetInit before the world calls DispatchBeginPlay, then the call will fail due to the flag.

The reason this appears to work fine when bNetLoadOnClient is false is because the dynamically created (rather than level placed) flow from Iris calls PostNetInit, which itself calls DispatchBeginPlay.
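The ordering described above can be sketched as a small standalone C++ model. This is not engine code: `FModelActor`, `ModelDispatchBeginPlay`, `ModelIrisMarksPending`, `ModelPostNetInit` and the two `Simulate...` helpers are hypothetical stand-ins, and the model assumes (as described in this post) that nothing clears the flag for a level-placed actor.

```cpp
#include <cassert>

// Simplified stand-in for an actor. NOT the engine's AActor.
struct FModelActor
{
    bool bPendingPostNetInit = false; // models bActorIsPendingPostNetInit
    bool bHasBegunPlay = false;       // set once BeginPlay has run
};

// Models the early-out in DispatchBeginPlay: BeginPlay is skipped while the flag is set.
void ModelDispatchBeginPlay(FModelActor& Actor)
{
    if (Actor.bPendingPostNetInit)
    {
        return;
    }
    Actor.bHasBegunPlay = true;
}

// Models Iris marking the actor as waiting for its initial network state.
void ModelIrisMarksPending(FModelActor& Actor)
{
    Actor.bPendingPostNetInit = true;
}

// Models the dynamically spawned path: the flag is cleared and BeginPlay is dispatched.
void ModelPostNetInit(FModelActor& Actor)
{
    Actor.bPendingPostNetInit = false;
    ModelDispatchBeginPlay(Actor);
}

enum class EOrder { WorldFirst, IrisFirst };

// Level-placed actor: the clearing step is never reached (assumption taken from this post).
bool SimulateStartupActor(EOrder Order)
{
    FModelActor Actor;
    if (Order == EOrder::WorldFirst)
    {
        ModelDispatchBeginPlay(Actor); // case 1: the world wins the race
        ModelIrisMarksPending(Actor);
    }
    else
    {
        ModelIrisMarksPending(Actor);  // case 2: Iris wins the race
        ModelDispatchBeginPlay(Actor);
    }
    return Actor.bHasBegunPlay;
}

// Dynamically spawned actor: the flag is set, then the post-init step dispatches BeginPlay.
bool SimulateDynamicActor()
{
    FModelActor Actor;
    ModelIrisMarksPending(Actor);
    ModelDispatchBeginPlay(Actor); // skipped while pending
    ModelPostNetInit(Actor);       // clears the flag and dispatches
    return Actor.bHasBegunPlay;
}
```

In this model, `SimulateStartupActor(EOrder::WorldFirst)` returns true and `SimulateStartupActor(EOrder::IrisFirst)` returns false, which matches only some of the 20 cubes reaching BeginPlay; `SimulateDynamicActor()` always returns true.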

We have only been able to replicate this issue when RunUnderOneProcess is turned off.

Hopefully I’ve explained this okay, it took me a little bit to get my head round what was happening when it was explained to me by someone on my team. :wink:

I’ve uploaded a very simple scenario where I can see this happen - I think you should just be able to stick these files into a blank uproject and compile it (just enabling Iris and associated logging), but LMK if that’s not how you normally receive minimal examples! I should note that replication works in this example, but you should see BeginPlay called on all actors for the server but only some of them on the client.

Thanks,

Steve

[Attachment Removed]

Hey Alex,

Thought I’d mention this as it’s related and I can reproduce it immediately in that test case I provided earlier.

If you run this as a listen server and have a client connect to it, every actor that has bNetLoadOnClient set (and WorldSettings) will produce this error on the server:

Error LogIris RepBridge(0)::Client cannot replicate object: RootObject BP_ReplicationTest_C_12 (InternalIndex: 15) (NetRefHandle (Id=35):(RepSystemId=0)). Reported by client: ConnectionId:1 ViewTarget: DefaultPawn_1 Named: [UNetConnection] RemoteAddr: 127.0.0.1:57619, Name: IpConnection_2, Driver: Name:GameNetDriver Def:GameNetDriver IpNetDriver_2, IsServer: YES, PC: PlayerController_1, Owner: PlayerController_1, UniqueId: NULL:Steve-PC-359E51124BA4D0C6FA7C95B388C3D425

Looking at the client logs, the errors seem to boil down to ‘could not find static actor’, which sounds like another ordering issue, as if Iris is trying to initialise before the world is ready.

LogIris: Error: UNetActorFactory::InstantiateNetObjectFromHeader Failed to resolve ObjectReference: [NetRefHandle (Id=9):(RepSystemId=?)](/Game/UEDPIE_0_TestLevel).[NetRefHandle (Id=7):(RepSystemId=?)](TestLevel).[NetRefHandle (Id=5):(RepSystemId=?)](PersistentLevel).[NetRefHandle (Id=17):(RepSystemId=?)](BP_ReplicationTest_C_3) . Could not find static actor.
 
LogIris: Warning: RepBridge(0)::CreateNetRefHandleFromRemote: Failed to instantiate RootObject NetHandle: NetRefHandle (Id=17):(RepSystemId=?) using header:
        FStaticActorNetCreationHeader (ProtocolId:0xb5ed5a26):
        ObjectReference=NetRefHandle (Id=17):(RepSystemId=?)
        CustomCreationData=0 bits
 
LogIris: Error: FReplicationReader::ReadObject Unable to create handle for NetRefHandle (Id=17):(RepSystemId=?).
 
LogIris: Error: FReplicationReader::ReadObject Failed to read replicated object with Handle: NetRefHandle (Id=17):(RepSystemId=?). Error 'Broken NetHandle'. ErrorContext:
0: - BitOffset: 42:ReadObjectBatch
NetObject None (InternalIndex: None) (NetRefHandle (Id=19):(RepSystemId=?))
1: - BitOffset: 78:ReadCreationInfo
NetObject None (InternalIndex: None) (NetRefHandle (Id=0):(RepSystemId=?))
2: - BitOffset: 928:ReadObjectBatch
NetObject None (InternalIndex: None) (NetRefHandle (Id=33):(RepSystemId=?))
3: - BitOffset: 964:ReadCreationInfo
NetObject None (InternalIndex: None) (NetRefHandle (Id=0):(RepSystemId=?))
4: - BitOffset: 1318:ReadObjectBatch
NetObject None (InternalIndex: None) (NetRefHandle (Id=45):(RepSystemId=?))
5: - BitOffset: 1354:ReadCreationInfo
NetObject None (InternalIndex: None) (NetRefHandle (Id=0):(RepSystemId=?))
6: - BitOffset: 1708:ReadObjectBatch
NetObject None (InternalIndex: None) (NetRefHandle (Id=31):(RepSystemId=?))
7: - BitOffset: 1744:ReadCreationInfo
[2026.02.04-12.11.41:062][378]LogIris: Error: FReplicationReader::ReadObject Failed to read object batch handle: NetRefHandle (Id=17):(RepSystemId=?) skipping batch data

Steve

[Attachment Removed]

Hi,

Thank you for the additional information, but unfortunately, I’m still not able to reproduce this locally. If you could provide a repro project, that would be greatly appreciated.

Thanks,

Alex

[Attachment Removed]

Hi,

Thank you for the additional information and the repro!

I believe you’re running into this known issue: https://issues.unrealengine.com/issue/UE-357736

This was originally found when testing with low-priority actors, but it is good to know that this timing issue can occur with “Run Under One Process” disabled. I’ve added the info you’ve provided here to our internal tracker for the bug.

In the meantime, you may be able to work around the issue by adding a check for !IsNetStartupActor() in AActor::DispatchBeginPlay:

void AActor::DispatchBeginPlay(bool bFromLevelStreaming)
{
	// If we are spawned from networking, the actor is not ready for begin play until the initial state has been applied.
	if (bActorIsPendingPostNetInit && !IsNetStartupActor())
	{
		if (UE::Net::FReplicationSystemUtil::GetReplicationSystem(this))
		{
			return;
		}
	}
...

This will ensure that actors marked as bNetLoadOnClient won’t have their call to BeginPlay skipped if DispatchBeginPlay is called after bActorIsPendingPostNetInit is set to true.

Testing this change in the repro project did seem to fix the issue, but I haven’t been able to do more thorough testing on it. If you do try out this fix, please let me know if you run into any problems.
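As a rough illustration of what the extra check changes, here is a standalone C++ model of the guard before and after the workaround. It is a sketch only: `FModelActor` and the `Model...` functions are hypothetical stand-ins rather than engine code, and the "original" guard is assumed to be the same condition without the !IsNetStartupActor() term.

```cpp
#include <cassert>

// Simplified stand-in for the relevant actor state. NOT the engine's AActor.
struct FModelActor
{
    bool bPendingPostNetInit = false;  // stands in for bActorIsPendingPostNetInit
    bool bNetStartupActor = false;     // stands in for IsNetStartupActor()
    bool bHasReplicationSystem = true; // stands in for GetReplicationSystem(this) != nullptr
    bool bHasBegunPlay = false;
};

// Assumed original guard: any pending actor with a replication system is skipped.
void ModelDispatchBeginPlayOriginal(FModelActor& Actor)
{
    if (Actor.bPendingPostNetInit && Actor.bHasReplicationSystem)
    {
        return;
    }
    Actor.bHasBegunPlay = true;
}

// Guard with the workaround: net startup (level-placed) actors are never skipped.
void ModelDispatchBeginPlayPatched(FModelActor& Actor)
{
    if (Actor.bPendingPostNetInit && !Actor.bNetStartupActor && Actor.bHasReplicationSystem)
    {
        return;
    }
    Actor.bHasBegunPlay = true;
}

// Runs one dispatch on an actor whose pending flag has already been set.
bool BeginsPlayWhilePending(bool bPatched, bool bNetStartupActor)
{
    FModelActor Actor;
    Actor.bPendingPostNetInit = true;
    Actor.bNetStartupActor = bNetStartupActor;
    if (bPatched)
    {
        ModelDispatchBeginPlayPatched(Actor);
    }
    else
    {
        ModelDispatchBeginPlayOriginal(Actor);
    }
    return Actor.bHasBegunPlay;
}
```

In this model a level-placed actor with the flag set is skipped by the assumed original guard but begins play with the patched one, while a dynamically spawned actor is still deferred in both versions.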

Thanks,

Alex

[Attachment Removed]

Your proposed fix did indeed fix the test case I provided. Replication still doesn’t work correctly with bNetLoadOnClient in the actual project though, so I need to do some more digging…

Cheers

Steve

[Attachment Removed]

Hi,

Thank you for bringing this to our attention! After digging into this some more, I believe I’ve found what may be the cause of this issue.

When we create the object reference handle, the path name should get renamed depending on whether the instance is/isn’t a PIE instance (FObjectReferenceCache::CreateObjectReferenceInternal -> FObjectReferenceCache::RenamePathForPie). However, this isn’t remapping the path as expected in the case where the listen server is a PIE instance and the client is a standalone editor instance.

The first issue is that, on the standalone client, FObjectReferenceCache::RenamePathForPie will skip calling UEngineReplicationBridge::RemapPathForPIE, since it doesn’t have a play in editor ID. This means that any paths received from the server won’t be remapped.

The second issue is that on the server, these object references are created before the client has connected, so UEngineReplicationBridge::RemapPathForPIE will call UEditorEngine::NetworkRemapPath with a nullptr for the UNetConnection, which causes UEditorEngine::NetworkRemapPath to skip calling NetworkRemapPath_Local. This is normally what would handle stripping the PIE prefix from the path, so any object reference cache entries created on the server before the client connects will still have the PIE prefix.

In my own testing, I observed that the level path specifically would fail to have its PIE prefix stripped on the standalone editor client, causing issues with actors placed in the level.

I’ve opened a new issue for this, UE-363862, which should be visible in the public tracker in a day or so.

In the meanwhile, it’s hard to say what the best way to fix this may be. I believe you could modify UEngineReplicationBridge::RemapPathForPIE so that in the case where ConnectionId == UE::Net::InvalidConnectionId and bReading==false, it just calls UWorld::RemovePIEPrefix rather than NetworkRemapPath. This would make sure the PIE prefix is stripped from any paths the server sends, but I’m not sure what side effects it could have.
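To make the idea concrete, the following standalone C++ sketch models the suggested branch. It is not engine code: `StripPiePrefixModel`, `RemapPathForPieModel` and `InvalidConnectionIdModel` are hypothetical stand-ins, and the prefix format `UEDPIE_<number>_` is taken from the paths in the client log earlier in this thread (for example /Game/UEDPIE_0_TestLevel).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for UE::Net::InvalidConnectionId.
constexpr unsigned InvalidConnectionIdModel = 0;

// Removes the first "UEDPIE_<digits>_" marker from a path, if one is present.
// Paths without a well-formed marker are returned unchanged.
std::string StripPiePrefixModel(const std::string& Path)
{
    static const std::string Marker = "UEDPIE_";
    const std::size_t Start = Path.find(Marker);
    if (Start == std::string::npos)
    {
        return Path;
    }

    std::size_t Pos = Start + Marker.size();
    const std::size_t DigitsBegin = Pos;
    while (Pos < Path.size() && std::isdigit(static_cast<unsigned char>(Path[Pos])))
    {
        ++Pos;
    }

    // Require at least one digit followed by an underscore.
    if (Pos == DigitsBegin || Pos >= Path.size() || Path[Pos] != '_')
    {
        return Path;
    }
    return Path.substr(0, Start) + Path.substr(Pos + 1);
}

// Models only the suggested special case: when writing with no valid connection,
// strip the prefix directly. Every other case is left untouched in this model;
// in the engine it would go through the normal remapping path instead.
std::string RemapPathForPieModel(const std::string& Path, unsigned ConnectionId, bool bReading)
{
    if (ConnectionId == InvalidConnectionIdModel && !bReading)
    {
        return StripPiePrefixModel(Path);
    }
    return Path;
}
```

With this model, `RemapPathForPieModel("/Game/UEDPIE_0_TestLevel", InvalidConnectionIdModel, false)` yields "/Game/TestLevel", while the same path with a valid connection id or with bReading set to true is returned unchanged.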

That being said, I believe you should be able to run the client/server instances either all in single process PIE or all as separate, standalone editor processes in order to work around the issue.

Thanks,

Alex

[Attachment Removed]

Thanks for looking into this. I tried your speculative fix, but it didn’t seem to work for me (the entries coming into RemapPathForPIE looked like they had already been stripped).

Certainly running in a single process for PIE doesn’t have this problem, which in most cases is likely to be an acceptable workaround for us as this is only an issue during testing.

As an aside, am I correct in thinking NetLoadOnClient isn’t used for actors loaded via World Partition? In which case it could be safely turned off in most of our use cases, as very little is in the persistent world.

[Attachment Removed]

Hi,

This can be turned off to have these actors essentially treated as though they were dynamically spawned on the server, but it is still highly recommended for projects to set bNetLoadOnClient=true for any replicated actors placed in the map.

Thanks,

Alex

[Attachment Removed]