Super bad network lag in online game

Hello all!

I’m working on an online shooter, and I recently tested it with a friend in America (I live in Ireland). In our testing though we found extremely bad lag that cause the player to teleport around the map, their shots are ignored, pawns are standing around frozen, it’s a completely unplayable mess. A read of the logs revealed some interesting information.

There are a few interesting warnings that filled the log file during gameplay


[2016.03.09-22.57.14:211][479]LogNetTraffic:Warning:       Received unreliable bunch before open (Channel 30 Current Sequence 0)
// This following warning was most of the log file
[2016.03.09-22.57.14:211][479]LogSecurity:Warning: 76561198043423694:7777: Invalid_Data: Received a packet with an Acked PacketID
[2016.03.09-22.57.14:208][479]LogNetTraffic:Warning: High single frame packet loss: 43

This is evidently extremely bad. I ran the game locally and induced lag and packet loss with the defaultEngine.ini settings. What I found was interesting. The above symptoms only occur when lag alone is set to 200 with a variance of 50. I would not have thought that as lag so bad to be causing such issues. When playing like this the experience is unplayable for less than a minute, and then the player is kicked from the game.

I tried to run the network profiler and found the UI is broken and I couldn’t properly analyse the data. It looks like this:

From the limited data I got from the netprofiler, movement is taking up the majority of time and bandwidth of the network. And that’s code I didn’t write it’s part of the engine. This leaves me frustrated and in despair and my inability to do anything about this.

Do more experienced network programmers have some insight as to why this is so bad? And more importantly what I can do to fix it? Surely UT has encountered issues like this, what did you do Epic?

Thanks!

Hi Constan7ine,

The security warning is okay, it’s telling you there’s been a duplicate RPC-packet. It’s being catched to prevent replay-attacks.

I can’t reproduce the problems you’re describing with the settings above. Not even with 2000 ping. Shame the Network Profiler is broken, it’s a nice tool.

Why would there be duplicate RPC-packets when lag is so high, surely packets just arrive very late, and not in doubles?

Also may I ask in what situation did you test this with 2000ms latency? If I can get good results with a template project or demo I could compare differences and find a problem.

I thought the log was from your play session with your friend. It does show up in my simulated session though, I guess because the packet is resent when it hasn’t been ACKed yet because of the delay. This results in two or more packets arriving at the destination.

I tested this in ShooterGame.

Hmm, so does anyone know from experience how I might go about rectifying this? Will reducing network usage and slimming down usage of replication and RPC’s improve the lag? Will it improve the engine’s behaviour of high lag situations.

After some quick testing my games does definitely use more network bandwidth than Shootergame, so there is room for optimization, somehow. Would it fix this?

try narrowing down what gets replicated to only whats truly needed. this helped in a project i worked on that horrid network usage because everything in it was replicated and the packet sizes were massive

Can you show screenshots of the Actors, Properties and RPC’s tabs or upload the .nprof file?

Also which engine version are you using? 4.10+ fixed a few problems with the Network profiler

Sorry for the massive delay in replying. I just recently started slimming down and optimizing the back sizes and replicated variables. It’s not yet tested in a real environment, but I’m able to get about 40% reduction in total data sent (note, this is from napkin math).

Here’s the nprof file with no optimization done on it: https://dl.dropboxusercontent.com/u/53792213/BeforeOptimisationServerProfile.nprof

And yes, I’m currently using 4.10, but a source distribution. If that matters.

Currently I’m quite nervous about testing the optimizations, I’m unsure how reducing the network usage will reduce lag, and more important prevent the insane repercussions I observed with such lag. If anyone has insights I’d love them.

So I’ve done a bunch of optimization on replication properties and RPC’s. I’ve drastically reduced the size of properties and RPCs, and in some cases completely removed properties, favoring other solutions. If anyone’s interested I’ll do a post detailing the things I did to reduce network usage. Here’s a new profiling file as well. I’m quite proud of one RPC that used to require 396 bits of data. It’s been reduced to 73 bits.

The good news is that the game behaves better than it did. You can now play at 200ms lag. It’s still got problems but it’s an improvement. It starts to get unplayable again at about 500ms of lag. But until then it’s playable and improved.

I have another problem that I’d like advice on though. Starting at around about 300ms of lag. And getting worse as the lag increases, clients are often just disconnected for no apparent reason. At 300ms the game is playable, and at 500 it’s quite laggy, but just barely playable. But it doesn’t matter if the engine just kicks the client.

Why is this happening and how can I prevent it?

For reference. Here’s a log of a server during the kick.


[2016.03.21-14.33.34:456][676]LogNet:Warning: UNetConnection::TrackLogsPerSecond instant FAILED. LogsPerSecond: 71.996994, RemoteAddr: 192.168.0.21
[2016.03.21-14.33.34:456][676]LogNet: UNetConnection::Close: Name: IpConnection_0, Driver: GameNetDriver IpNetDriver_0, PC: BoxmanPlayerControllerKing_C_1, Owner: BoxmanPlayerControllerKing_C_1, Channels: 127, RemoteAddr: 192.168.0.21:50549, Time: 2016.03.21-14.33.34
[2016.03.21-14.33.34:456][676]LogNet: UChannel::Close: Sending CloseBunch. ChIndex == 0. Name: ControlChannel_0
[2016.03.21-14.33.34:456][676]LogRep:Error: Rejected too many unwanted functions ServerMoveOld in BBCharacterMovementComponent /Game/BoxmanBegins/DataDrive/Maps/Ambush.Ambush:PersistentLevel.BP_Trooper_C_4.CharMoveComp
[2016.03.21-14.33.34:456][676]LogNet:Error: UActorChannel::ProcessBunch: Replicator.ReceivedBunch failed.  Closing connection. RepObj: BBCharacterMovementComponent /Game/BoxmanBegins/DataDrive/Maps/Ambush.Ambush:PersistentLevel.BP_Trooper_C_4.CharMoveComp, Channel: 109
[2016.03.21-14.33.34:477][677]LogNet: NotifyAcceptingConnection: Server Ambush accept
[2016.03.21-14.33.34:477][677]LogNet: Open Ambush 03/21/16 14:33:34 192.168.0.21
[2016.03.21-14.33.34:477][677]LogNet: Added client connection.  Remote address = 192.168.0.21:50549
[2016.03.21-14.33.34:478][677]LogNetTraffic:Warning: High single frame packet loss: 1139
[2016.03.21-14.33.34:478][677]LogSecurity:Warning: 192.168.0.21:50549: Malformed_Packet: UNetConnection::ReceivedPacket: Received control channel close before open
[2016.03.21-14.33.34:478][677]LogSecurity:Warning: 192.168.0.21:50549: Closed: Connection closed
[2016.03.21-14.33.34:478][677]LogNet: UNetConnection::Close: Name: IpConnection_1, Driver: GameNetDriver IpNetDriver_0, PC: NULL, Owner: NULL, Channels: 2, RemoteAddr: 192.168.0.21:50549, Time: 2016.03.21-14.33.34
[2016.03.21-14.33.34:478][677]LogNet: UChannel::Close: Sending CloseBunch. ChIndex == 0. Name: ControlChannel_1
[2016.03.21-14.33.34:490][678]LogNet: NotifyAcceptingConnection: Server Ambush accept
[2016.03.21-14.33.34:490][678]LogNet: Open Ambush 03/21/16 14:33:34 192.168.0.21
[2016.03.21-14.33.34:490][678]LogNet: Added client connection.  Remote address = 192.168.0.21:50549
[2016.03.21-14.33.34:490][678]LogNetTraffic:Warning: High single frame packet loss: 1140
[2016.03.21-14.33.35:040][711]LogNet: UNetConnection::Cleanup: Closing open connection. Name: IpConnection_2, RemoteAddr: 192.168.0.21:50549 Driver: GameNetDriver, PC: NoPC, Owner: No Owner
[2016.03.21-14.33.35:040][711]LogNet: UNetConnection::Close: Name: IpConnection_2, Driver: GameNetDriver IpNetDriver_0, PC: NULL, Owner: NULL, Channels: 1, RemoteAddr: 192.168.0.21:50549, Time: 2016.03.21-14.33.35

This is the log from the client


[2016.03.21-14.45.20:327][902]LogNetTraffic:Warning: High single frame packet loss: 18
[2016.03.21-14.45.20:327][902]LogNet: UChannel::ReceivedSequencedBunch: Bunch.bClose == true. ChIndex == 0. Calling ConditionalCleanUp.
[2016.03.21-14.45.20:327][902]LogNet: UChannel::CleanUp: [GameNetDriver] [BoxmanPlayerControllerKing_C_0] [BoxmanPlayerControllerKing_C_0]. ChIndex == 0. Closing connection.
[2016.03.21-14.45.20:327][902]LogNet: UNetConnection::Close: Name: IpConnection_0, Driver: GameNetDriver IpNetDriver_0, PC: BoxmanPlayerControllerKing_C_0, Owner: BoxmanPlayerControllerKing_C_0, Channels: 73, RemoteAddr: 192.168.0.21:7777, Time: 2016.03.21-14.45.20
[2016.03.21-14.45.20:327][902]LogNet: UChannel::Close: Sending CloseBunch. ChIndex == 0. Name: ControlChannel_0
[2016.03.21-14.45.20:327][902]LogSecurity:Warning: 192.168.0.21:7777: Invalid_Data: Received a packet with an Acked PacketID
[2016.03.21-14.45.20:327][902]LogNet:Warning: Network Failure: GameNetDriver[ConnectionLost]: Your connection to the host has been lost.
[2016.03.21-14.45.20:327][902]LogNet: NetworkFailure: ConnectionLost, Error: 'Your connection to the host has been lost.'

After dealing with some other issues my performance updates were tested in live multiplayer with someone in American, and someone in europe. The results were disappointing to say the least, and I’d like to ask for more help.

Generally gameplay was improved noticeably by the reduction in network usage from my optimization. However there were still two prevalent issues that made the game largely unplayable. One of which is still the game kicking players every so often.

When your lag is about 150-200, the game just kicks you, for no apparent reason. Why would the engine do this. The logs are the same as is shown above and don’t give much useful information.

Secondly, despite movement being smoother on clients, every few seconds the game teleports the player to some other location on the map. One that makes no sense with regards to the momentum of the player before the teleportation. It is almost as though the server is trying to correct the players movements, but it’s getting it extremely long, and so clients just end up teleporting all over the map, every few seconds, again, making the game completely unplayable.

Thoughts?

You are either sending too large data through the network or sending too many single values too often.
Or maybe both.

You should orgizine your logic to send one single (not too big) struct with th relevant value updates, never send individual properties; even worse if you send individual values every frame (tick), that would be absurd thing to do.
Plan your game around that main replicated Struct and keep it as small as possible.
Whenever possible, have everything not related to the main Struct, controlled/executed by Events.
Exceptions are things that you can’t really control, such as character position replication and player input.
Let the Engine replicate only what it needs replicated and everything else try to build Event based as much as possible.

And the dc, server will disconnect player if a function can’t be validated for whatever reason.
Networking is no joy, you’ll have to be very patient and get used to rework things constantly.

Thank you very much for this advice. I will revisit it. As I mentioned above the network profile doesn’t work, and it’s been difficult to get useful information on it. Is there some way I can get access to it’s source code? Or get a fixed version?

With regards to the disconnect, I assume I need to check all validated RPC’s and find one with a failing Validate call?

I checked all the RPC’s and every single one returns true in the validate call. Am I interpreting you correctly as to the source of this “validation”, or is there some other way a function could become invalid?

The high packet loss is the main issue; it invalidates your data and connection is closed.
That’s why is important to form a nice pack of data (struct) and send it at once, causes much less overhead than sending individual properties. This is not just for UE games btw;
I think UE4 packs properties that it replicates by default, but in your game logic you may be sending too many single values from lots of classes and making the net driver going nuts.

Btw, can you log what the actual value of UPlayer::CurrentNetSpeed is…