Hi there,
When running two machines in a multi-process, multi-GPU configuration, with any sync mode (None, NVIDIA, etc.) set in Switchboard, there is a 200 ms hitch in DisplayClusterSync every few seconds, usually in one of these functions:
FDisplayClusterClusterSyncClient::GetObjectsData
FDisplayClusterClusterSyncClient::GetEventsData
FDisplayClusterClusterSyncClient::WaitForFrameStart
Or any function that contains a request like this:
TRACE_CPUPROFILER_EVENT_SCOPE(CLN_CS::WaitForFrameStart);
Response = SendRecvPacket(Request);
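To narrow down which request stalls, one option would be to time each blocking call and keep only the slow ones. Below is a minimal sketch of that idea in plain C++. HitchLog and ScopedHitchTimer are hypothetical helper names, not part of nDisplay or the engine, and the 100 ms threshold used in the usage note is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper types (not engine API): collect calls that block
// longer than a chosen threshold.
struct HitchRecord {
    std::string label;  // name of the timed call
    double millis;      // measured duration in milliseconds
};

class HitchLog {
public:
    explicit HitchLog(double thresholdMs) : thresholdMs_(thresholdMs) {
        assert(thresholdMs >= 0.0);
    }

    // Count every sample, but store only those above the threshold.
    void Record(const std::string& label, double millis) {
        ++samples_;
        if (millis > thresholdMs_) {
            hitches_.push_back({label, millis});
        }
    }

    std::size_t SampleCount() const { return samples_; }
    const std::vector<HitchRecord>& Hitches() const { return hitches_; }

private:
    double thresholdMs_;
    std::size_t samples_ = 0;
    std::vector<HitchRecord> hitches_;
};

// RAII timer: measures the lifetime of the enclosing scope with a
// monotonic clock and reports it to the log on destruction.
class ScopedHitchTimer {
public:
    ScopedHitchTimer(HitchLog& log, std::string label)
        : log_(log),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedHitchTimer() {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        log_.Record(label_, elapsed.count());
    }

    ScopedHitchTimer(const ScopedHitchTimer&) = delete;
    ScopedHitchTimer& operator=(const ScopedHitchTimer&) = delete;

private:
    HitchLog& log_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

In use, a `HitchLog log(100.0);` would live for the session, and a `ScopedHitchTimer timer(log, "WaitForFrameStart");` would be placed in the same scope as the blocking request, so that `log.Hitches()` afterwards lists only the calls that exceeded the threshold.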
When a single render machine is used in multi-GPU, multi-process mode, it runs fine; adding the second render node seems to cause the problem.
The render machines have two networks: a 1 Gb/s control network and a 10 Gb/s project network.
I have, however, run an isolation test with only the 10 Gb/s network connected, and the issue is the same.
I have tried launching with different sync options set in Switchboard (None, NVIDIA, Ethernet, etc.); it does not seem to make a difference.
Are there any thoughts on what could be going wrong, or things I should eliminate?
Attached are the two render node Insights traces (_00 and _01) and the editor trace.
I have also attached a copy of the project that reproduces the issues.
And DxDiag for the 2 render nodes.
Some Machine Stats:
NVIDIA driver version: 573.24 Studio
GPUs: RTX 6000 Ada, in multi-process mode
Sync cards: Quadro Sync 2, taking in a genlock signal, with a Cat 7 Ethernet cable connecting the two sync cards (I have also tried changing the Ethernet cable)
Latest Quadro Firmware
Network NIC: Intel X710, 10 Gb/s
Render Node 1:
3 x GPUs, but only 2 are used.
Render Node 2:
1 x GPU, which is used.
Unreal Version is 5.6
Windows 11
Threadripper PRO 7975WX, 32 cores
256 GB RAM
ASUS WRX90E-SAGE SE motherboard
Thanks,