Lack of performance with nDisplay

We have an nDisplay configuration with 3 PCs, each rendering a FHD viewport with a 90° rotation: FV (front view), LV (left view), RV (right view). Without nDisplay, performance is satisfactory. With nDisplay, performance is at 45 Hz on a simple level, below the required 60 Hz. We are unable to achieve smooth rendering.

1/ We have captured 3 Unreal Insights traces for analysis and would appreciate your feedback regarding the various latencies. Could you share your insights? Our first analysis is:

  • VBlanks are regularly skipped, which is to be expected, but this results in regular and substantial overhead. In particular, most of the time we end up with around 4 ms spent on the “skip VBlank” trace. Is this to be expected? Can this overhead be reduced in any way?
  • Is it possible to avoid the VSync Ethernet sync while still keeping a sync for game thread frame start/end? While counterintuitive, it is something we would like to test to see whether a game loop sync is enough for our use case, allowing us to defer the work on frame sync optimization to later developments.
  • On a broader topic, we observe noticeable time spent on communication barriers between nodes. While it is relatively optimized for network communications, it is still a tight fit given the per-frame budget. Are there any guidelines or network parameters that we should be aware of?

2/ Regarding performance, could you confirm that the 90° rotation can be handled within the nDisplay cluster configuration rather than applying the rotation directly to the camera?

3/ Are there any guidelines or recommendations for optimal network settings (NIC parameters, etc.)?

4/ Note that the bandwidth used is about 1.2 Mbit/s on the primary. On our side, the payload of our data is 80 bytes/frame.

5/ When quitting the primary node, the secondary keeps running forever. I thought there was a handshake that handles the quitting of each node of the cluster.

You can download the Insights traces and the nDisplay configuration file here: https://share.corys.fr/index.php/s/miFpp9mCFoJJkjH

[Attachment Removed]

Hello Stephane, let me ask a few questions to help you out.

3 PCs, each rendering a FHD viewport with a 90° rotation: FV (front view), LV (left view), RV (right view). Without nDisplay, performance is satisfactory. With nDisplay, performance is at 45 Hz on a simple level, below the required 60 Hz.

What GPU and display hardware do you have? Are you using Quadro GPUs with Quadro Sync, or consumer GeForce GPUs?

VBlanks are regularly skipped, which is to be expected, but this results in regular and substantial overhead. In particular, most of the time we end up with around 4 ms spent on the “skip VBlank” trace. Is this to be expected? Can this overhead be reduced in any way?

If your system is not genlocked and you are running 3 monitors on GeForce GPUs, such behavior is possible due to a mismatch in the monitors' refresh rate phase.
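To illustrate the phase mismatch, here is a toy model in Python. It is only a sketch, not nDisplay's actual scheduler: it assumes every node needs the same work time per frame, meets the other nodes at a barrier, and then presents on the next VBlank of its own display. The phase offsets and the 10 ms work time are made-up values.

```python
import math

REFRESH_HZ = 60.0
PERIOD_MS = 1000.0 / REFRESH_HZ  # about 16.67 ms between VBlanks


def next_vblank(t_ms, phase_ms, period_ms=PERIOD_MS):
    """Return the first VBlank time >= t_ms for a display with the given phase."""
    k = math.ceil((t_ms - phase_ms) / period_ms)
    return phase_ms + k * period_ms


def simulate_cluster_rate(phases_ms, work_ms, frames=600):
    """Average cluster frame rate (Hz) when nodes wait for each other every frame.

    Assumption: the next frame starts only after the slowest node has presented.
    """
    t = 0.0
    for _ in range(frames):
        ready = t + work_ms  # all nodes finish their work and pass the barrier
        presents = [next_vblank(ready, p) for p in phases_ms]
        t = max(presents)    # the cluster advances with the last node to present
    return frames / (t / 1000.0)


aligned = simulate_cluster_rate([0.0, 0.0, 0.0], work_ms=10.0)
mismatched = simulate_cluster_rate([0.0, 5.0, 11.0], work_ms=10.0)
print(f"aligned phases:    {aligned:.1f} Hz")
print(f"mismatched phases: {mismatched:.1f} Hz")
```

In this model, aligned phases run at the display refresh rate, while offset phases make the cluster regularly wait for a later VBlank on one of the nodes, so the average rate drops below the refresh rate even though each node's work fits in the frame budget.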

Is it possible to avoid the VSync Ethernet sync while still keeping a sync for game thread frame start/end? While counterintuitive, it is something we would like to test to see whether a game loop sync is enough for our use case, allowing us to defer the work on frame sync optimization to later developments.

No, it is not, and will result in a frame mismatch.

On a broader topic, we observe noticeable time spent on communication barriers between nodes and while it is relatively optimized for network communications, it is still a tight fit given the per-frame budget.

Barriers are subject to VSync delay if the system is not genlocked.

Try switching to the sync policy None, and it will give you the raw network performance to analyze.

Regarding performance, could you confirm that the 90° rotation can be handled within the nDisplay cluster configuration rather than applying the rotation directly to the camera?

Yes, absolutely. You can rotate the screen in the configuration, and that’s it.

Are there any guidelines or recommendations for optimal network settings (NIC parameters, etc.)?

I don’t recall any particular set of special parameters, since we are not facing issues (except for cheap hardware sometimes adding latency and hitches).

Note that the bandwidth used is about 1.2 Mbit/s on the primary. On our side, the payload of our data is 80 bytes/frame.

This is a very small amount of data; we have systems with much higher loads.
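For scale, here is a quick back-of-the-envelope calculation in Python using the figures from the question (1.2 Mbit/s measured, 80 bytes/frame of payload). The 60 fps rate is the stated target, and the 1 Gbit/s link speed is an assumption for illustration only.

```python
def bytes_per_frame(bits_per_second, frames_per_second):
    """Average number of bytes on the wire per rendered frame."""
    return bits_per_second / 8 / frames_per_second


def payload_bits_per_second(payload_bytes, frames_per_second):
    """Bitrate generated by the application payload alone."""
    return payload_bytes * 8 * frames_per_second


measured_bps = 1.2e6  # bandwidth reported on the primary
fps = 60              # target frame rate from the question
link_bps = 1e9        # assumption: gigabit Ethernet link

print(f"bytes per frame on the wire: {bytes_per_frame(measured_bps, fps):.0f}")
print(f"application payload bitrate: {payload_bits_per_second(80, fps)} bit/s")
print(f"share of link capacity:      {measured_bps / link_bps:.2%}")
```

Under these assumptions the measured traffic is a small fraction of a percent of the link, so raw bandwidth is unlikely to be the limiting factor; the rest of the per-frame traffic beyond the 80-byte payload would be cluster synchronization and protocol overhead.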

5/ When quitting the primary node, the secondary keeps running forever. I thought there was a handshake that handles the quitting of each node of the cluster.

Could you please explain how you quit the primary?

May I ask for traces with sync policy None, too, please?

Thank you!

[Attachment Removed]

We do not use genlock, and we use consumer GeForce GPUs.

Let me rephrase one of our questions: is it possible to disable VSync but still keep the game thread frame number synchronized? For our needs, we may not require accurate VSync between all the hosts of the cluster. The most important thing is to sync what happens within the same frame number.

Please find the trace with the sync policy set to none (empty string for the parameter “renderSyncPolicy.type” in the *.display file to be accurate)

[Attachment Removed]

Stephane, sync policy None will do the job. It will only keep GT synced.
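As a sketch of how the policy could be switched across several configuration files, the Python helper below sets the “type” field of every “renderSyncPolicy” object it finds in a parsed JSON configuration. The sample fragment and its nesting are hypothetical and only stand in for a real configuration file. Note that the thread mentions both “None” and an empty string for this field, so check which value your engine version expects.

```python
import json


def set_render_sync_policy(node, policy_type):
    """Set "type" on every "renderSyncPolicy" object; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "renderSyncPolicy" and isinstance(value, dict):
                value["type"] = policy_type
                changed += 1
            else:
                changed += set_render_sync_policy(value, policy_type)
    elif isinstance(node, list):
        for item in node:
            changed += set_render_sync_policy(item, policy_type)
    return changed


# Hypothetical fragment; a real configuration file contains many more fields.
config = {
    "cluster": {
        "sync": {
            "renderSyncPolicy": {"type": "Ethernet", "parameters": {}}
        }
    }
}

count = set_render_sync_policy(config, "None")
print(count, json.dumps(config))
```

For a file on disk, the same helper could be applied between `json.load` and `json.dump`, keeping a backup of the original file.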

Btw, the traces archive is corrupted and I was unable to unzip it. Could you please re-upload it?

[Attachment Removed]