Error: FBitWriter overflowed! (and Client Disconnected)

After some period of play (sometimes 30 minutes, sometimes 3 hours), the server will stop populating its log file and the clients will be disconnected with the following error:

[2020.03.31-11.22.32:013][679]LogNetSerialization: Error: FBitWriter overflowed! (WriteLen: -1, Remaining: 8110, Max: 8110)
[2020.03.31-11.22.32:015][679]LogNet: Warning: Closing connection. Can't send function 'ServerAim' on 'KPWeaponController /Game/Developer/Maps/BigTree/BigTree.BigTree:PersistentLevel.KPTruck_678.WeaponController': Reliable buffer overflow. FieldCache->FieldNetIndex: 7 Max 11. Ch MaxPacket: 1024.

I’ve been running some soak tests with bots on a dedicated (Linux) server. My game is a vehicle physics game, and I was previously encountering a lot of “Ensure Condition failed” messages, resulting in strange behavior (like universal overlaps due to -inf/inf transform values) and occasional crashes. In an effort to fix some of these issues, I’ve made the following changes:

  1. Cap physics body linear and angular velocities for vehicles and projectiles
  2. Enable Enhanced Determinism
  3. Reduce max depenetration velocity

When a vehicle or projectile exceeds its speed limit, its velocity is immediately clamped to the max. If the transform contains a NaN, it’s reverted to the last known good value; if that happens more than twice consecutively, the object is destroyed. For what it’s worth, I haven’t seen a NaN in over 20 hours of soak tests since I enabled Enhanced Determinism.
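
For context, here is roughly the kind of clamping I do on the vehicle’s root body each physics tick. It’s only a sketch; the class name and the MaxLinearSpeed, MaxAngularSpeedDeg, BadTransformCount, and LastGoodTransform members are illustrative names from my own code, not engine API:

	void AKPVehicle::ClampPhysicsState(UPrimitiveComponent* Body)
	{
		// Clamp linear velocity (cm/s) to the configured maximum.
		const FVector LinVel = Body->GetPhysicsLinearVelocity();
		if (LinVel.SizeSquared() > FMath::Square(MaxLinearSpeed))
		{
			Body->SetPhysicsLinearVelocity(LinVel.GetClampedToMaxSize(MaxLinearSpeed));
		}

		// Clamp angular velocity (deg/s) the same way.
		const FVector AngVel = Body->GetPhysicsAngularVelocityInDegrees();
		if (AngVel.SizeSquared() > FMath::Square(MaxAngularSpeedDeg))
		{
			Body->SetPhysicsAngularVelocityInDegrees(AngVel.GetClampedToMaxSize(MaxAngularSpeedDeg));
		}

		// If the transform went bad, revert to the last known good value;
		// destroy the actor if it keeps happening.
		if (GetActorTransform().ContainsNaN())
		{
			if (++BadTransformCount > 2)
			{
				Destroy();
				return;
			}
			SetActorTransform(LastGoodTransform, false, nullptr, ETeleportType::TeleportPhysics);
		}
		else
		{
			BadTransformCount = 0;
			LastGoodTransform = GetActorTransform();
		}
	}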

I’ve been looking at the call site where SetOverflowed(-1) is called, and it’s in DataBunch.cpp:

	// Reserve channel and set bunch info.
	if( Channel->NumOutRec >= RELIABLE_BUFFER-1+bClose )
	{
		SetOverflowed(-1);
		return;
	}

I’m not yet sure what would cause the packets to pile up such that NumOutRec exceeds that buffer size, and I haven’t found a way to debug it to get some insight. I’m also wondering whether Enhanced Determinism could somehow contribute to this, or substepping, or some other physics setting, or whether I’m just exposing a deeper issue now that a shallower one is resolved.
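
For visibility, something like the following could be run periodically on the server to log any channel whose reliable output buffer is getting deep. This is only a rough sketch; WarnThreshold is an arbitrary early-warning number I picked, well below RELIABLE_BUFFER:

	#include "Engine/World.h"
	#include "Engine/NetDriver.h"
	#include "Engine/NetConnection.h"
	#include "Engine/Channel.h"

	// Diagnostic sketch: log channels with many unacked reliable bunches queued.
	static void LogDeepReliableBuffers(UWorld* World)
	{
		const int32 WarnThreshold = 64; // arbitrary early-warning value

		UNetDriver* Driver = World ? World->GetNetDriver() : nullptr;
		if (!Driver)
		{
			return;
		}

		for (UNetConnection* Conn : Driver->ClientConnections)
		{
			for (UChannel* Channel : Conn->OpenChannels)
			{
				if (Channel && Channel->NumOutRec > WarnThreshold)
				{
					UE_LOG(LogNet, Warning, TEXT("Channel %d to %s has %d unacked reliable bunches queued"),
						Channel->ChIndex, *Conn->LowLevelGetRemoteAddress(), Channel->NumOutRec);
				}
			}
		}
	}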

The network payloads are not optimized yet, but at the moment clients send about 5-6kb/s to the server and receive 8-20kb/s (depending on how populated the server is, of course).

In any case, any help would be appreciated. Thank you!

This error never occurred again for me after modifying UNetConnection::InitSendBuffer() to look like this:

void UNetConnection::InitSendBuffer()
{
	check(MaxPacket > 0);

	int32 FinalBufferSize = (MaxPacket * 8) - MaxPacketHandlerBits;

	// Initialize the one outgoing buffer.
	if (FinalBufferSize == SendBuffer.GetMaxBits())
	{
		// Reset all of our values to their initial state without a malloc/free
		SendBuffer.Reset();
	}
	else
	{
		// First time initialization needs to allocate the buffer
		SendBuffer = FBitWriter(FinalBufferSize, true);   // <---- MOD: allow resize 
	}

	ResetPacketBitCounts();

	ValidateSendBuffer();
}

But it would be better to know the real cause :confused:

I resolved this almost immediately after posting this question. Sorry for the delay in posting the answer to anyone who might have been watching!

It occurred to me from reading the code that an overflow would be the result of requests piling up. In this case, the client was sending too many reliable RPCs to the server too often. The vehicles in my game predict their movement locally so they’re sending payloads to the server frequently. These do not need to be reliable RPCs.

I made a couple of changes across the entire codebase:

  1. Evaluate every RPC and ensure that only the ones that genuinely need guaranteed delivery are marked Reliable (see the sketch after this list).
  2. Any RPC that runs frequently (like on Tick or on a frequent timer) should be unreliable and throttled.
  3. Reduce payload size, because that’s never going to hurt.
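
As an illustration of (1) and (2): the high-frequency movement/aim style RPCs went from Reliable to Unreliable. This is a made-up declaration, not my actual ServerAim signature:

	// Before: every call is queued in the channel's reliable buffer until acked,
	// so a burst of calls can overflow RELIABLE_BUFFER and close the connection.
	UFUNCTION(Server, Reliable, WithValidation)
	void ServerUpdateAim(FVector_NetQuantizeNormal AimDir);

	// After: a lost packet is simply dropped; the next update supersedes it anyway.
	UFUNCTION(Server, Unreliable, WithValidation)
	void ServerUpdateAim(FVector_NetQuantizeNormal AimDir);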

To prioritize my work I used the network profiler (Network Profiler | Unreal Engine Documentation) to find the biggest offenders. Conveniently, you can run this on both the client and the server.

Regarding (2), for any RPCs previously being sent on Tick or almost as often, I took two main approaches:

  1. Enforce some cooldown period between sends, because coupling the network send rate to the game’s tick rate is not necessary. This is for RPCs that must be sent all the time (sketched below).
  2. Only send Server/Multicast/Client RPCs when the object is truly relevant. For example, a flag can broadcast its state less frequently when at its stand than when it’s bouncing around as a projectile in the field.
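
For the cooldown in (1), the pattern is just to keep updating the local state every tick but only fire the RPC on an interval. A minimal sketch; SendInterval, TimeSinceLastSend, PendingAim, and ComputeLocalAimDirection() are illustrative members of my weapon controller component, not engine API:

	void UKPWeaponController::TickComponent(float DeltaTime, ELevelTick TickType,
	                                        FActorComponentTickFunction* ThisTickFunction)
	{
		Super::TickComponent(DeltaTime, TickType, ThisTickFunction);

		// Always keep the latest local state...
		PendingAim = ComputeLocalAimDirection();

		// ...but only push it to the server every SendInterval seconds
		// (e.g. 0.1s = 10 Hz) instead of once per tick.
		TimeSinceLastSend += DeltaTime;
		if (TimeSinceLastSend >= SendInterval)
		{
			TimeSinceLastSend = 0.0f;
			ServerUpdateAim(PendingAim); // the unreliable RPC from the earlier sketch
		}
	}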

I think that’s everything. I haven’t seen this issue now in over 100 hours of soak tests with live players and/or bots on a Linux dedicated server, running both locally and on cloud VMs.

@djchase Have a look at my answer to see if that helps you discover the root cause.

@piicone Good to hear you found a solution.

I had some issues with a lot of unnecessary RPCs as well.

But I wonder why the engine exits with an error rather than trying to grow “some buffer” and firing a warning instead, so that this bottleneck could be eased from both sides.

If you only hit the limit occasionally or rarely, for example, the game would at least continue.