How does nDisplay (Master Node) handle packet loss?

I’m currently debugging my nDisplay cluster (1 Master PC with 4 Slave PCs + 4 projectors) and keep running into seemingly unrelated FPS spikes (always ~300 ms).

First I checked the performance via “stat StartFile” in Session Frontend and with Unreal Insights. The average frame time is around 4 ms on all PCs. Looking at the spikes, all computing threads still average around 4 ms; only Frame->GameThread->Self jumps to 300 ms. I hope I didn’t miss anything here, but it looks to me like the issue is unrelated to the Game and Render threads.

Next I checked the network communication with Wireshark. Comparing the timing of the spikes with Wireshark’s data, the spikes always occur exactly when a TCP packet gets lost, and they last until it is retransmitted (Wireshark flags these as tcp.analysis.retransmission).

So, as I currently understand it, nDisplay communication works like this:
All slave nodes request data from the master node (dt, frame time, frame start, frame end, sync groups, …)
The master node responds

In my case, a TCP packet carrying a slave node’s request doesn’t arrive at the master node.
The master node waits until the retransmitted request arrives, and until then it blocks the whole game (the 300 ms spike): the master node doesn’t respond to any slave node, and no other slave node sends a new request either.
Once the retransmitted packet arrives, the game continues normally. (For what it’s worth, 300 ms matches the typical minimum TCP retransmission timeout on Windows, which would explain why the spikes are always the same length.)
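To illustrate the behavior I think I’m seeing, here is a minimal Python sketch (purely hypothetical, not nDisplay’s actual code) of a barrier-style protocol: the primary waits for a request from *every* node before replying to *any* of them, so a single request delayed ~300 ms by a retransmission stalls the whole frame for everyone:

```python
import threading
import time
import queue

NUM_NODES = 4  # hypothetical cluster size: 4 render nodes

requests = queue.Queue()
replies = [queue.Queue() for _ in range(NUM_NODES)]

def primary():
    # The primary cannot answer anyone until every node's request for this
    # frame has arrived -- one late packet therefore stalls the whole cluster.
    waiting = set(range(NUM_NODES))
    while waiting:
        waiting.discard(requests.get())
    for q in replies:
        q.put("frame_data")  # dt, frame time, sync groups, ...

def node(node_id, delay):
    time.sleep(delay)        # delay simulates a TCP retransmission timeout
    requests.put(node_id)
    replies[node_id].get()   # block until the primary responds

delays = [0.0, 0.0, 0.0, 0.3]  # node 3's request is "lost" for 300 ms
start = time.time()
threads = [threading.Thread(target=primary)]
threads += [threading.Thread(target=node, args=(i, d)) for i, d in enumerate(delays)]
for t in threads:
    t.start()
for t in threads:
    t.join()
frame_time = time.time() - start
print(f"frame took {frame_time * 1000:.0f} ms")  # ~300 ms: everyone waited on node 3
```

If nDisplay’s frame synchronization works anything like this barrier, it would match what I see: one lost request packet, and every node’s frame time spikes by the retransmission delay.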

My config’s swap sync policy is set to 0 (no sync), but even with sync enabled (which caps my FPS to 60) the stall still occurs and is clearly visible.


Did you or anyone else figure this out? We’ve been having this issue for years, and so far our only lead is that it seems to correlate with overall network load. But we’ve tried three different switches, and we still have the issue on both of our clusters.