DemoNetDriver: TickDemoRecordFrame not called during Checkpoint Saving, ultimately resulting in missing network traffic

Hello! We’re using the DemoNetDriver to record network traffic on the server, which is later replayed on the client to drive killcam playback of the last few seconds before a client player died in-game. We’re doing this with a modified version of the InMemoryNetworkReplayStreamer. However, since we noticed that checkpoint saving is relatively expensive (>40ms), we’ve tweaked some cvars to spread the cost over multiple frames. You can see which settings we’ve modified below:

demo.QueueCheckpointChannels=0
demo.JumpToEndOfLiveReplay=0
demo.CheckpointUploadDelayInSeconds=10
demo.CheckpointSaveMaxMSPerFrameOverride=3
demo.RecordHz=30
demo.RecordUnicastRPCs=1

CheckpointUploadDelayInSeconds was decreased in order to reduce the amount of data needed for each killcam, and CheckpointSaveMaxMSPerFrameOverride was changed to amortize the checkpoint cost.

One issue with amortizing the checkpoint cost over several frames is that the DemoNetDriver stops calling TickDemoRecordFrame in UDemoNetDriver::TickDemoRecord while a checkpoint is being saved. Our relatively high checkpoint frequency, combined with the amortized checkpoints (it usually takes around 14-16 frames, or about 0.5s, to record a full checkpoint), results in very frequent gaps in our recorded network data.

Ultimately this ends up as stuttering in our killcam playback whenever the killcam period crosses a checkpoint boundary, since we’re simply missing half a second of data from that period.

Do you have any suggestions on how we could best alleviate or solve this issue? We have a few ideas:

  • Call TickDemoRecordFrame during checkpoint saving. So far we’ve seen a few issues with our custom net serializers here, possibly because we’re essentially network-serializing actors twice per frame now?
  • Enable demo.WithDeltaCheckpoints and make it work with the InMemoryNetworkReplayStreamer. The theory here is that this would make checkpoint saving faster, potentially allowing us to avoid amortizing the cost over multiple frames, or at the very least requiring fewer frames per checkpoint.
  • Have checkpoints and regular data recording use the same data, so there’s no need to serialize actors twice for networking. I.e. rather than having both QueuedCheckpointPackets and QueuedDemoPackets, we’d have one array used by both.
  • Create two replay connections on the server, using one for checkpoints and one for regular data recording. While an initial prototype didn’t immediately crash, trying to combine data from two separate connections sounds like another can of worms.
  • Somehow optimize checkpoint saving on our end, and briefly allocate more budget to checkpoint saving so we can finish it faster and use fewer frames.

Hi,

Using demo.WithDeltaCheckpoints may be the easiest way to optimize your checkpoints. However, it’s worth noting that this has not seen much internal testing recently, so it’s possible there are some bugs with delta checkpoints we’re not aware of.

That said, I haven’t been able to find a definitive reason why TickDemoRecordFrame isn’t called while recording a checkpoint, other than to save the time that would be spent recording the demo and leave more time for the checkpoint. I’m not sure why your custom serializers are having issues, but if you have the frame time for it, nothing immediately stands out as a problem with recording both during the same tick. This isn’t something we’ve tried or tested ourselves, though.

Thanks,

Alex

Hi Alex!

We actually managed to make our custom serializers work even when calling TickDemoRecordFrame during checkpoints (they’re heavily optimized and dependent on previous state), so it appears that might be the best path forward for us; it seems it was only our own code causing issues and nothing inherent to Unreal. We’re investigating delta checkpoints in parallel, but for now the “allow recording during checkpoint generation” approach appears to work.

Somewhat of a tangent, but we’ve also been seeing some ensures from Unreal’s replication layer during replay playback. Any leads you can give us as to the cause of the ensures below? The first one is by far the most common, with the other two still triggering a few times per playtest:

  1. Ensure condition failed: OpenPacketId.First == INDEX_NONE && OpenPacketId.Last == INDEX_NONE [File:.\Runtime/Engine/Private/DataChannel.cpp] [Line: 980]
  2. Ensure condition failed: NetFieldExport.CompatibleChecksum != 0 [File:.\Runtime/Engine/Private/DataChannel.cpp] [Line: 5003]
  3. Ensure condition failed: Checksum != 0 [File:.\Runtime/Engine/Private/RepLayout.cpp] [Line: 3954]

Fair to say we have quite a lot of custom tech and optimizations around networking, not to mention replays themselves, so it is very likely we’ve broken some assumption the engine makes. But it would help to have a little more background on what the above ensures actually mean.

Hi,

The first ensure is caused by the data channel receiving a bOpen bunch after it has already been opened locally. In your logs, you can look for the following error message in the LogNetTraffic category, which should include information as to which channel is seeing the error:

Received channel open command for channel that was already opened locally...

The other two ensures are related to the property checksum. These checksums are used by replays to determine backwards compatibility, such as checking whether a property still exists and whether its data type is the same, so I believe these ensures are related to some change that is breaking the backwards compatibility of your replay.

Thanks,

Alex